==== High Throughput ====
  
[[https://computing.llnl.gov/linux/slurm/high_throughput.html]]

Vanilla, out of the box, with these changes:
  
  * MaxJobCount=120000
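In ''slurm.conf'' terms that change is simply the following (a sketch; the guide linked above lists further high-throughput settings that this excerpt elides):

<code>
MaxJobCount=120000   # allow far more jobs in the queue than the default
</code>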
  
After fixing that, the first round of results. Hmmm.
  
^NrJobs^Nodes^hh:mm^
|  1,000|8|00:02|
| 10,000|8|00:22|
| 15,000|8|00:31|
| 20,000|8|00:41|
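
That works out to only about 8 jobs per second (20,000 jobs in 2,460 seconds).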
  
The debug level was 3 for the runs above. Falling back to proctrack/pgid, dropping the debug level to 1, and setting SchedulerType=sched/builtin (removing the backfill scheduler) yields proper throughput, with just 8 KVM nodes handling the jobs.
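
For reference, the relevant ''slurm.conf'' lines would look roughly like this (a sketch assembled from the settings just named, not a copy of the actual config):

<code>
ProctrackType=proctrack/pgid   # simpler process tracking
SchedulerType=sched/builtin    # plain FIFO scheduling, no backfill
SlurmctldDebug=1               # was 3 in the runs above
SlurmdDebug=1
</code>
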
^NrJobs^Nodes^hh:mm:ss^
|   1,000|8|00:00:34|
|  10,000|8|00:05:57|
|  25,000|8|00:15:07|
|  50,000|8|00:29:55|
|  75,000|8|00:44:15|
| 100,000|8|00:58:16|
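
That is a sustained rate of roughly 28 jobs per second (100,000 jobs in 3,496 seconds), scaling almost linearly with queue size.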

Next I will add a prolog/epilog script to my submit job script which will create /localscratch/$SLURM_JOB_ID, echo the date into file foo, cat foo to standard out, and finish by removing the scratch dir. These prolog/epilog actions need to be done by slurmd, but so far that errors for me. Running them from the job script does slow things down a bit. Same conditions as above.

<code>
#!/bin/bash
# #SBATCH directives must come before any executable line,
# or sbatch silently ignores them
#SBATCH --job-name="NUMBER"
#SBATCH --output="tmp/outNUMBER"
#SBATCH --begin=10:00:00

# prolog creates the unique job scratch dir
/share/apps/lsf/slurm_prolog.pl

export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
cd $MYLOCALSCRATCH
pwd

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID" >> foo
date >> foo
cat foo

# epilog removes the scratch dir
/share/apps/lsf/slurm_epilog.pl
</code>
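
For reference, a minimal sketch of what such a prolog/epilog pair might do (hypothetical; the actual ''slurm_prolog.pl'' and ''slurm_epilog.pl'' are not shown on this page):

<code>
#!/bin/bash
# prolog sketch: create the per-job scratch dir
mkdir -p /localscratch/$SLURM_JOB_ID
</code>

<code>
#!/bin/bash
# epilog sketch: remove the per-job scratch dir
rm -rf /localscratch/$SLURM_JOB_ID
</code>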

^NrJobs^Nodes^hh:mm:ss^
|  1,000|8|00:05:00|
|  5,000|8|00:23:43|
| 10,000|8|00:47:12|
| 25,000|8|00:58:01|
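
With the prolog/epilog work in the job script, throughput drops to roughly 3.5 jobs per second for the smaller runs (10,000 jobs in 2,832 seconds), well below the roughly 28 jobs per second achieved without.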

==== MPI ====

With ''sbatch'' there is no need for a wrapper script; Slurm figures it all out.

<code>
#!/bin/bash
#/share/apps/lsf/slurm_prolog.pl

#SBATCH --job-name="MPI"
#SBATCH --ntasks=8
#SBATCH --begin=now

# unique job scratch dir (disabled for this test)
#export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
#cd $MYLOCALSCRATCH

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID"

# remove output files from any previous Amber run
rm -rf err out logfile mdout restrt mdinfo

export PATH=/share/apps/openmpi/1.2+intel-9/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/1.2+intel-9/lib:$LD_LIBRARY_PATH
which mpirun

# mpirun picks up the Slurm allocation; no -np or hostfile needed
mpirun /share/apps/amber/9+openmpi-1.2+intel-9/exe/pmemd -O \
-i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 \
-ref 1g6r.cd.randions.crd.1

#/share/apps/lsf/slurm_epilog.pl
</code>
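
Submitting is then simply (assuming the script is saved as ''mpi.sh'', a hypothetical name):

<code>
sbatch mpi.sh
</code>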

When submitted, ''squeue'' shows:

<code>
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            902246      test      MPI    hmeij  R       0:05      8 v[1-8]
</code>

Dumping the environment (e.g. with ''env | grep SLURM'') we observe some key parameters:

<code>
SLURM_NODELIST=v[1-8]
SLURM_JOB_NAME=MPI
SLURMD_NODENAME=v1
SLURM_NNODES=8
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=1(x8)
SLURM_NPROCS=8
SLURM_CPUS_ON_NODE=1
SLURM_JOB_NODELIST=v[1-8]
SLURM_JOB_CPUS_PER_NODE=1(x8)
SLURM_JOB_NUM_NODES=8
</code>

And in the slurmjob.log file:

<code>
JobId=902245 UserId=hmeij(8216) GroupId=its(623) \
Name=MPI JobState=COMPLETED Partition=test TimeLimit=UNLIMITED \
StartTime=2014-08-21T15:55:06 EndTime=2014-08-21T15:57:04 \
NodeList=v[1-8] NodeCnt=8 ProcCnt=8 WorkDir=/home/hmeij/1g6r/cd
</code>
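
Records like this are written by Slurm's job completion plugin; the relevant ''slurm.conf'' lines would look something like the following (a sketch; the exact log location is an assumption):

<code>
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/slurmjob.log
</code>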
  
  
\\
**[[cluster:0|Back]]**