This shows you the differences between two versions of the page.
cluster:134 [2014/08/19 18:05] hmeij [High Throughput] |
cluster:134 [2014/08/22 13:05] (current) hmeij [MPI] |
||
---|---|---|---|
Line 101: | Line 101: | ||
[[https:// | [[https:// | ||
+ | |||
+ | Vanilla, out of the box, with these changes: | ||
* MaxJobCount=120000 | * MaxJobCount=120000 | ||
Line 119: | Line 121: | ||
</ | </ | ||
- | After fixing that. | + | After fixing that. Hmmm. |
- | ^NrJobs^N^hh:mm^N^hh:mm^ | + | ^NrJobs^N^hh: |
- | | 1,000|8|00:??| | + | | 1,000|8|00:02| |
- | |10, | + | |10, |
- | |25,000|8|00:??| | + | |15,000|8|00:31| |
- | |50,000|8|00:??| | + | |20,000|8|00:41| |
- | Debug Level is 3. Maybe go to 1. | + | Debug Level is 3 above. Falling back to proctrack/ |
- | (I also added a proplog/ | + | ^NrJobs^N^hh: |
+ | | 1, | ||
+ | |10, | ||
+ | |25, | ||
+ | |50, | ||
+ | |75, | ||
+ | |100, | ||
+ | |||
+ | | ||
+ | / | ||
+ | |||
+ | < | ||
+ | # | ||
+ | / | ||
+ | |||
+ | #SBATCH --job-name=" | ||
+ | #SBATCH --output=" | ||
+ | #SBATCH --begin=10: | ||
+ | |||
+ | # unique job scratch dir | ||
+ | export MYLOCALSCRATCH=/ | ||
+ | cd $MYLOCALSCRATCH | ||
+ | pwd | ||
+ | |||
+ | echo " | ||
+ | date >> foo | ||
+ | cat foo | ||
+ | |||
+ | / | ||
+ | </ | ||
+ | |||
+ | |||
+ | ^NrJobs^N^hh: | ||
+ | | 1, | ||
+ | | 5, | ||
+ | |10, | ||
+ | |25, | ||
+ | |||
+ | |||
+ | ==== MPI ==== | ||
+ | |||
+ | With '' | ||
+ | |||
+ | < | ||
+ | |||
+ | # | ||
+ | #/ | ||
+ | |||
+ | #SBATCH --job-name=" | ||
+ | #SBATCH --ntasks=8 | ||
+ | #SBATCH --begin=now | ||
+ | |||
+ | # unique job scratch dir | ||
+ | #export MYLOCALSCRATCH=/ | ||
+ | #cd $MYLOCALSCRATCH | ||
+ | |||
+ | echo " | ||
+ | |||
+ | rm -rf err out logfile mdout restrt mdinfo | ||
+ | |||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | which mpirun | ||
+ | |||
+ | mpirun / | ||
+ | -i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 \ | ||
+ | -ref 1g6r.cd.randions.crd.1 | ||
+ | |||
+ | #/ | ||
+ | |||
+ | </ | ||
+ | |||
+ | When the job is submitted, the queue shows: | ||
+ | |||
+ | < | ||
+ | |||
+ | JOBID PARTITION | ||
+ | 902246 | ||
+ | |||
+ | </ | ||
+ | |||
+ | Dumping the job's environment shows some key Slurm parameters: | ||
+ | |||
+ | < | ||
+ | |||
+ | SLURM_NODELIST=v[1-8] | ||
+ | SLURM_JOB_NAME=MPI | ||
+ | SLURMD_NODENAME=v1 | ||
+ | SLURM_NNODES=8 | ||
+ | SLURM_NTASKS=8 | ||
+ | SLURM_TASKS_PER_NODE=1(x8) | ||
+ | SLURM_NPROCS=8 | ||
+ | SLURM_CPUS_ON_NODE=1 | ||
+ | SLURM_JOB_NODELIST=v[1-8] | ||
+ | SLURM_JOB_CPUS_PER_NODE=1(x8) | ||
+ | SLURM_JOB_NUM_NODES=8 | ||
+ | |||
+ | </ | ||
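Slurm writes per-node counts such as `SLURM_TASKS_PER_NODE=1(x8)` in a compressed form, where `N(xM)` means M consecutive nodes with N tasks each. The following is a minimal sketch of how a job script might expand that value into one number per node. The function name `expand_tasks_per_node` is a hypothetical helper, not part of Slurm.

```shell
# Expand a compressed Slurm count list such as "1(x8)" or "2(x3),1"
# into one number per node, separated by single spaces.
# NOTE: expand_tasks_per_node is a hypothetical helper name, not a Slurm command.
expand_tasks_per_node() {
    spec=$1
    out=""
    old_ifs=$IFS
    IFS=','                      # entries in the list are comma separated
    for item in $spec; do
        case $item in
            *"(x"*")")
                # "N(xM)" form: N tasks on each of M nodes
                count=${item%%"("*}
                reps=${item#*"(x"}
                reps=${reps%")"}
                ;;
            *)
                # plain "N" form: a single node
                count=$item
                reps=1
                ;;
        esac
        i=0
        while [ "$i" -lt "$reps" ]; do
            out="${out:+$out }$count"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    echo "$out"
}
```

Inside a job, `expand_tasks_per_node "$SLURM_TASKS_PER_NODE"` would print `1 1 1 1 1 1 1 1` for the value shown above, which can then be paired with the expanded node list when building a machine file by hand.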
+ | |||
+ | And the slurmjob.log file contains an entry like this: | ||
+ | |||
+ | < | ||
+ | |||
+ | JobId=902245 UserId=hmeij(8216) GroupId=its(623) \ | ||
+ | Name=MPI JobState=COMPLETED Partition=test TimeLimit=UNLIMITED \ | ||
+ | StartTime=2014-08-21T15: | ||
+ | NodeList=v[1-8] NodeCnt=8 ProcCnt=8 WorkDir=/ | ||
+ | |||
+ | </ | ||
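The log entry above is a flat list of `Key=Value` pairs, so individual fields can be pulled out with plain shell tools. This is a rough sketch only: `job_field` is a hypothetical helper, and it assumes that each record is passed in as a single string and that values contain no spaces.

```shell
# Print the value of one key from a "Key=Value Key=Value ..." job record.
# NOTE: job_field is a hypothetical helper; it assumes values contain no spaces.
job_field() {
    key=$1
    record=$2
    # One pair per line, so bracket expressions such as v[1-8] are never globbed.
    printf '%s\n' "$record" | tr ' ' '\n' | while IFS= read -r pair; do
        case $pair in
            "$key"=*) printf '%s\n' "${pair#*=}" ;;
        esac
    done
}
```

For example, with the record stored in a variable `rec`, `job_field JobState "$rec"` prints `COMPLETED` and `job_field NodeList "$rec"` prints `v[1-8]`.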
\\ | \\ | ||
**[[cluster: | **[[cluster: |