  * August 2022 is designated **migration** period
  * Queues ''

==== Quick Start Slurm Guide ====

Jump to the **Rocky8/

There is also detailed information on Amber20/

  * [[cluster:
==== Basic Commands ====
# sorta like bhosts -l

# sorta like bstop/bresume
scontrol suspend job 1000001
scontrol resume job 1000001
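
# check why a job sits pending (a sketch, standard squeue options; the job id is an example)
squeue -j 1000001 --start          # estimated start time, if slurm can compute one
squeue -j 1000001 -o "%i %T %r"    # %r shows the pending reason, e.g. Resources or Priority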
# sorta like bhist -l
You must request **resources**,

Details:

https://

Some common examples are:
#SBATCH -n 8       # tasks=S*C*T
#SBATCH -B 2:4:1   # S:C:T = sockets:cores:threads
#SBATCH --mem=250
#SBATCH --ntasks-per-node=1   # perhaps needed to override oversubscribe
#SBATCH --cpus-per-task=1

GPU control
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#SBATCH --gres=gpu:
#SBATCH --gres=gpu:
</code>
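
To put these options together, below is a minimal sketch of a complete CPU batch script. The job name, partition, memory, and time values are placeholders chosen for illustration, not site defaults; pick a real queue from ''sinfo -l'' and adjust the numbers to your job.

<code>
#!/bin/bash
#SBATCH --job-name=myjob           # placeholder name
#SBATCH --partition=mypartition    # placeholder, use a queue listed by "sinfo -l"
#SBATCH -N 1                       # nodes
#SBATCH -n 8                       # tasks
#SBATCH --mem=1024                 # in MB; always request memory (see Pending Jobs below)
#SBATCH --time=01:00:00            # walltime hh:mm:ss
#SBATCH --output=%x_%j.out         # jobname_jobid.out

echo "running on $(hostname) with $SLURM_NTASKS tasks"
srun hostname
</code>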

** Pending Jobs **

I keep having to inform users that with ''-n 1'' and ''--cpus-per-task=1'' your jobs can still end up pending, because the user forgot to reserve memory, so silly slurm assumes the job needs all of the node's memory. Here is my template reply:

<code>

FirstName, your jobs are pending because you did not request memory,
and when you do not, slurm assumes you need all of the node's memory, silly.
Command "scontrol show job 1062052" shows:

JobId=1062052 JobName=3a_avgHbond_CPU

I looked (command "ssh n?? top -u username -b -n 1", look for the VIRT value)
and you need less than 1G per job, so with --mem=1024 and n=1 and cpu=1
you should be able to load 48 jobs onto n100.
Consult the output of command "sinfo -lN"

</code>
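
In short, an explicit memory request along these lines (a sketch; the 1 GB figure is just the estimate from the note above, measure your own jobs with ''top'' first) lets many small serial jobs share a node instead of pending:

<code>
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1024    # in MB; omit this and slurm may reserve all of the node's memory
</code>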

==== MPI ====
#SBATCH --nodelist=n88

# may or may not be needed, centos7 login env
source $HOME/
which ifort   # should be the parallel studio 2016 version

# unique job scratch dirs
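# for example, something along these lines (a sketch; the scratch path below is an
# assumption for illustration, use whatever local filesystem your site provides):
#   export MYSCRATCH=/localscratch/$SLURM_JOB_ID
#   mkdir -p $MYSCRATCH && cd $MYSCRATCH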