  
[[https://computing.llnl.gov/linux/slurm/high_throughput.html]]

Vanilla out of the box, with these changes:

  * MaxJobCount=120000
After fixing that. Hmmm.
  
^NrJobs^Nodes^hh:mm^
| 1,000|8|00:02| 
|10,000|8|00:22| 
  
  
Debug Level is 3 above. Falling back to proctrack/pgid while setting debug to level 1, and also setting SchedulerType=sched/builtin (removing the backfill). This is throughput all right, with just 8 KVM nodes handling the jobs.
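
For reference, the corresponding ''slurm.conf'' entries would look roughly like the sketch below. This is not the actual config of this cluster; in particular the page only says "debug level", so showing both SlurmctldDebug and SlurmdDebug is an assumption.

<code>
# slurm.conf sketch of the settings discussed above
MaxJobCount=120000            # raised for the high-throughput test
ProctrackType=proctrack/pgid  # fall back to pgid process tracking
SchedulerType=sched/builtin   # plain FIFO scheduling, backfill removed
SlurmctldDebug=1              # debug level dropped from 3 to 1
SlurmdDebug=1
</code>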
  
-^NrJobs^N^hh:mm^N^hh:mm+^NrJobs^N^hh:mm:ss
-| 1,000|8|00:??|  +| 1,000|8|00:00:34|  
-|10,000|8|00:??|  +|10,000|8|00:05:57|  
-|25,000|8|00:??|  +|25,000|8|00:15:07|  
-|50,000|8|00:??|+|50,000|8|00:29:55| 
 +|75,000|8|00:44:15| 
 +|100,000|8|00:58:16|
  
Next I will add a prolog/epilog script to my submit job script which will create /localscratch/$SLURM_JOB_ID, echo the date into file foo, then cat foo to standard out, and finish by removing the scratch dir. These prolog/epilog actions need to be done by slurmd, but so far that errors for me. It does slow things down a bit. Same conditions as above.

<code>
#!/bin/bash
# NUMBER is a placeholder, replaced per generated job
# (#SBATCH directives must precede the first command or sbatch ignores them)
#SBATCH --job-name="NUMBER"
#SBATCH --output="tmp/outNUMBER"
#SBATCH --begin=10:00:00

# prolog: creates /localscratch/$SLURM_JOB_ID
/share/apps/lsf/slurm_prolog.pl

# unique job scratch dir
export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
cd $MYLOCALSCRATCH
pwd

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID" >> foo
date >> foo
cat foo

# epilog: removes the scratch dir again
/share/apps/lsf/slurm_epilog.pl
</code>
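
The NUMBER string above is a placeholder, so the numbered job scripts were presumably generated and submitted in a loop. That loop is not shown on this page; a minimal sketch, assuming the template above is saved as ''run.tmpl'' (a made-up name), could look like this:

<code>
#!/bin/bash
# hypothetical submit loop, not from the original page:
# stamp each copy of the template with its number and hand it to sbatch
mkdir -p tmp
for i in $(seq 1 1000); do
   sed "s/NUMBER/$i/g" run.tmpl > tmp/run.$i
   sbatch tmp/run.$i
done
</code>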

With the prolog/epilog actions in the script, the timings become

^NrJobs^Nodes^hh:mm:ss^
| 1,000|8|00:05:00| 
| 5,000|8|00:23:43| 
|10,000|8|00:47:12| 
|25,000|8|00:58:01|

==== MPI ====

With ''sbatch'' there is no need for a wrapper script; slurm figures it all out.

<code>
#!/bin/bash
#/share/apps/lsf/slurm_prolog.pl

#SBATCH --job-name="MPI"
#SBATCH --ntasks=8
#SBATCH --begin=now

# unique job scratch dir
#export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
#cd $MYLOCALSCRATCH

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID"

rm -rf err out logfile mdout restrt mdinfo

export PATH=/share/apps/openmpi/1.2+intel-9/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/1.2+intel-9/lib:$LD_LIBRARY_PATH
which mpirun

# under SLURM, mpirun picks up the 8-task allocation from the
# environment; no -np or machinefile is needed
mpirun /share/apps/amber/9+openmpi-1.2+intel-9/exe/pmemd -O \
-i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 \
-ref 1g6r.cd.randions.crd.1

#/share/apps/lsf/slurm_epilog.pl
</code>
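
Submitting is then just a matter of handing the script to ''sbatch'' (the file name here is made up):

<code>
sbatch run.mpi
</code>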

When submitted, ''squeue'' shows

<code>
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            902246      test      MPI    hmeij  R       0:05      8 v[1-8]
</code>

Dumping the environment we observe some key parameters
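
One way to do that is to add a line like the following to the job script (a sketch; the script above does not include it):

<code>
# print the SLURM-provided environment, sorted
env | grep ^SLURM | sort
</code>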

<code>
SLURM_NODELIST=v[1-8]
SLURM_JOB_NAME=MPI
SLURMD_NODENAME=v1
SLURM_NNODES=8
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=1(x8)
SLURM_NPROCS=8
SLURM_CPUS_ON_NODE=1
SLURM_JOB_NODELIST=v[1-8]
SLURM_JOB_CPUS_PER_NODE=1(x8)
SLURM_JOB_NUM_NODES=8
</code>

And in the ''slurmjob.log'' file we find

<code>
JobId=902245 UserId=hmeij(8216) GroupId=its(623) \
Name=MPI JobState=COMPLETED Partition=test TimeLimit=UNLIMITED \
StartTime=2014-08-21T15:55:06 EndTime=2014-08-21T15:57:04 \
NodeList=v[1-8] NodeCnt=8 ProcCnt=8 WorkDir=/home/hmeij/1g6r/cd
</code>
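
That record comes from SLURM's job completion logging, which is configured in ''slurm.conf''. A sketch of what that likely looks like (the log file path here is an assumption):

<code>
# slurm.conf sketch: append one summary line per finished job to a text file
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/slurmjob.log
</code>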
  
  
\\
**[[cluster:0|Back]]**