cluster:218 [2022/07/18 14:11] hmeij07 [Basic Commands]
cluster:218 [2023/09/27 08:51] hmeij07 [Resources]
  * August 2022 is designated the **migration** period
  * Queues ''

==== Quick Start Slurm Guide ====

Jump to the **Rocky8/

There is also detailed information on Amber20/

  * [[cluster:
==== Basic Commands ====
# sorta like LSF's bstop/bresume
scontrol suspend 1000001
scontrol resume 1000001
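# a few more rough LSF equivalents (a sketch, not from the original page;
# these are standard Slurm commands, job id 1000001 is a placeholder)
# bjobs  -> squeue -u $USER
# bkill  -> scancel 1000001
# sorta like bstop/bresume for a job that is still *pending*
scontrol hold 1000001
scontrol release 1000001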
You must request **resources**,

Details

  * https://

Some common examples are:
#SBATCH -n 8                # tasks=S*C*T
#SBATCH -B 2:4:1            # S:C:T (sockets:cores:threads)
#SBATCH --mem=250           # in MB by default
#SBATCH --ntasks-per-node=1 # perhaps needed to override oversubscribe
#SBATCH --cpus-per-task=1

GPU control

#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#SBATCH --gres=gpu:
#SBATCH --gres=gpu:
</code>
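
Pulling a few of these options together, a minimal CPU job script might look like the sketch below (not from the original page; the job name, task count, and memory value are placeholders to adapt):

<code bash>
#!/bin/bash
#SBATCH --job-name=test       # placeholder name
#SBATCH -n 8                  # 8 tasks
#SBATCH --mem=8192            # 8G total for the job, in MB
#SBATCH --output=slurm-%j.out # %j expands to the job id

echo "running on $HOSTNAME with $SLURM_NTASKS tasks"
</code>

Submit it with ''sbatch script.sh'' and watch it with ''squeue -u $USER''.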

** Pending Jobs **

I keep having to inform users that even with ''-n 1'' and ''-c 1'' your jobs can still go into the pending state, because when you forget to reserve memory silly Slurm assumes your jobs need all of the node's memory. Here is my template reply then

<code>

FirstName, your jobs are pending because you did not request memory, and if you do not then Slurm assumes you need all of a node's memory, silly. Command "

JobId=1062052 JobName=3a_avgHbond_CPU

I looked (command "ssh n?? top -u username -b -n 1", look for the VIRT value) and you need less than 1G per job, so with --mem=1024, -n 1 and -c 1 you should be able to load 48 jobs onto n100. Consult the output of command "sinfo -lN".

</code>
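
The arithmetic behind that advice can be sketched as a quick check (the node memory figure below is a placeholder; take the real value from the MEMORY column of ''sinfo -lN''):

<code bash>
# placeholder: total memory reported for the node, in MB (check "sinfo -lN")
node_mem_mb=49152
# what each job reserves with --mem=1024
job_mem_mb=1024
# jobs that fit on the node by memory alone
echo $(( node_mem_mb / job_mem_mb ))   # -> 48
</code>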
==== MPI ====
#SBATCH --nodelist=n88

# may or may not be needed, centos7 login env
source $HOME/
which ifort # should be the parallel studio 2016 version

# unique job scratch dirs