cluster:218 [2022/08/09 15:25] hmeij07
  * August 2022 is designated **migration** period
  * Queues ''hp12'' and ''mwgpu'' (centos6) will be serviced by Openlava, not Slurm

==== Quick Start Slurm Guide ====

Jump to the **Rocky8/CentOS7 script templates** listed in the menu of this page, top right.

There is also detailed information on Amber20/Amber22 on this page, with script examples.

  * [[cluster:214|Tada]] new head node

==== Basic Commands ====

<code>
# sorta like bhosts -l
scontrol show node n78

# sorta like bstop/bresume
scontrol suspend job 1000001
scontrol resume job 1000001

# sorta like bhist -l
</code>

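For users coming from Openlava, a rough command map may help. This is a sketch: ''squeue'', ''scancel'', and ''sacct'' are standard Slurm commands, and ''1000001'' is a placeholder job id.

```shell
# Rough Openlava/LSF -> Slurm equivalents (a sketch; 1000001 is a placeholder job id)
declare -A slurm_equiv=(
  [bjobs]="squeue -u \$USER"        # list your jobs in the queue
  [bsub]="sbatch run.sh"            # submit a batch script
  [bkill]="scancel 1000001"         # cancel a job
  [bhist]="sacct -j 1000001"        # job accounting/history
  [bpeek]="tail slurm-1000001.out"  # peek at a job's default output file
)
for lsf in bjobs bsub bkill bhist bpeek; do
  printf '%-5s -> %s\n' "$lsf" "${slurm_equiv[$lsf]}"
done
```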
You must request **resources**, for example the number of cpu cores or which gpu model to use. **If you do not request resources, Slurm will assume you need all the node's resources** and thus prevent other jobs from running on that node.

Details:

  * https://slurm.schedmd.com/cons_res_share.html

Some common examples are:

<code>
#SBATCH -n 8                 # tasks=S*C*T
#SBATCH -B 2:4:1             # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH --mem=250            # needed to override oversubscribe
#SBATCH --ntasks-per-node=1  # needed to override oversubscribe
#SBATCH --cpus-per-task=1    # needed to override oversubscribe
</code>

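As a sketch, the directives above combine into a minimal batch script like the following; the job name, task counts, and the echo payload are placeholders, adjust them for your job.

```shell
# Write a minimal CPU batch script (a sketch; names and counts are placeholders)
cat > run_cpu.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cpu_test
#SBATCH -n 8                 # tasks
#SBATCH --mem=250            # needed to override oversubscribe
#SBATCH --cpus-per-task=1    # needed to override oversubscribe
echo "running on $(hostname) with $SLURM_NTASKS tasks"
EOF
# submit with: sbatch run_cpu.sh
```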
GPU control:

<code>
#SBATCH --cpus-per-gpu=1                  # needed to override oversubscribe
#SBATCH --mem-per-gpu=7168                # needed to override oversubscribe
#SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n[78], amber128
#SBATCH --gres=gpu:geforce_rtx_2080_s:1   # n[79-90], exx96
</code>

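A minimal GPU batch script using the gres syntax above might look like this (a sketch; the gres model string must match what the queue's nodes actually report, and the echo payload is a placeholder):

```shell
# Write a minimal GPU batch script (a sketch; gres string must match the nodes)
cat > run_gpu.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-gpu=1                 # needed to override oversubscribe
#SBATCH --mem-per-gpu=7168               # needed to override oversubscribe
#SBATCH --gres=gpu:geforce_rtx_2080_s:1  # n[79-90], exx96
echo "allocated gpu(s): $CUDA_VISIBLE_DEVICES"
EOF
# submit with: sbatch run_gpu.sh
```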
  * ''/zfshomes/hmeij/slurm/run.rocky'' for tinymem, mw128, amber128, test queues

<code>
cd $MYLOCALSCRATCH

### AMBER20 works via slurm's imaged nodes, test and amber128 queues
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
</code>

Note also that we run mwgpu's K20 cuda version 9.2 on the exx96 queue (default cuda version 10.2). Not proper, but it works, hence this script runs on both queues. The reason is that amber16 was compiled with cuda 9.2 drivers, which are supported in cuda 10.x but not in cuda 11.x. So Amber 16, if needed, would have to be compiled in the Rocky8 environment (and may then work like the amber20 module).

  * ''/zfshomes/hmeij/slurm/run.centos'' for mwgpu, exx96 queues

<code>

###source /usr/local/amber16/amber.sh  # works via slurm's mwgpu
source /usr/local/amber20/amber.sh     # works via slurm's exx96

# stage the data
cp -r ~/sharptail/* .