  
[[https://computing.llnl.gov/linux/slurm/high_throughput.html]]

Vanilla, out of the box, with only these changes (see the ''slurm.conf'' sketch after the list):
  
  * MaxJobCount=120000
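
A minimal sketch of where such a change lives; only ''MaxJobCount'' is taken from the list above, while the file path and the other value are assumptions to be tuned per the LLNL guide:

<code>
# assumed location: /etc/slurm/slurm.conf (site paths differ)

# keep a very deep queue in memory for the throughput test
MaxJobCount=120000

# flush finished job records sooner so the job table stays small
# (value is an assumption, see the LLNL guide above)
MinJobAge=60

# apply edits without restarting the daemons:
#   scontrol reconfigure
</code>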

Next I will add a prolog/epilog script to my submit job script which will create /localscratch/$SLURM_JOB_ID, echo the date into file foo, then cat foo to standard out, and finish by removing the scratch dir. These prolog/epilog actions should really be done by slurmd, but so far that errors for me, so the job script calls them directly. It does slow things down a bit. Same conditions as above.

<code>
#!/bin/bash
# #SBATCH directives must precede the first executable line or sbatch ignores them
#SBATCH --job-name="NUMBER"
#SBATCH --output="tmp/outNUMBER"
#SBATCH --begin=10:00:00

# prolog: creates /localscratch/$SLURM_JOB_ID
/share/apps/lsf/slurm_prolog.pl

# unique job scratch dir
export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
cd $MYLOCALSCRATCH
pwd

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID" >> foo
date >> foo
cat foo

# epilog: removes the scratch dir
/share/apps/lsf/slurm_epilog.pl
</code>
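
For the throughput runs below, many copies of this template get submitted. A minimal sketch of one way to generate and submit such a batch, assuming the template is saved as the hypothetical file ''template.slurm'' with ''NUMBER'' as the per-job placeholder:

<code>
#!/bin/bash
# hypothetical driver script, not part of the original test
N=1000
mkdir -p tmp jobs
for i in $(seq 1 $N); do
    # stamp the job index into the NUMBER placeholder
    sed "s/NUMBER/$i/g" template.slurm > jobs/job$i.slurm
    sbatch jobs/job$i.slurm
done
</code>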

^NrJobs^N^hh:mm:ss^
| 1,000|8|00:05:00|
| 5,000|8|00:23:43|
|10,000|8|00:47:12|
|25,000|8|00:58:01|

==== MPI ====

With ''sbatch'' there is no need for a wrapper script; slurm figures it all out.

<code>
#!/bin/bash
# prolog/epilog not needed for this run, left commented out
#/share/apps/lsf/slurm_prolog.pl

#SBATCH --job-name="MPI"
#SBATCH --ntasks=8
#SBATCH --begin=now

# unique job scratch dir (not used here)
#export MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
#cd $MYLOCALSCRATCH

echo "$SLURMD_NODENAME JOB_PID=$SLURM_JOB_ID"

# clean up output files from any previous run
rm -rf err out logfile mdout restrt mdinfo

# Open MPI built with slurm support picks up the allocation,
# so no -np or hostfile arguments are needed
export PATH=/share/apps/openmpi/1.2+intel-9/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/1.2+intel-9/lib:$LD_LIBRARY_PATH
which mpirun

mpirun /share/apps/amber/9+openmpi-1.2+intel-9/exe/pmemd -O \
-i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 \
-ref 1g6r.cd.randions.crd.1

#/share/apps/lsf/slurm_epilog.pl
</code>

When submitted we see

<code>
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            902246      test      MPI    hmeij  R       0:05      v[1-8]
</code>

Dumping the environment, we observe some key parameters

<code>
SLURM_NODELIST=v[1-8]
SLURM_JOB_NAME=MPI
SLURMD_NODENAME=v1
SLURM_NNODES=8
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=1(x8)
SLURM_NPROCS=8
SLURM_CPUS_ON_NODE=1
SLURM_JOB_NODELIST=v[1-8]
SLURM_JOB_CPUS_PER_NODE=1(x8)
SLURM_JOB_NUM_NODES=8
</code>
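
These are what a slurm-aware ''mpirun'' reads to size and place the job. If the MPI stack were not built with slurm support, the same variables could be fed to it by hand; a rough sketch, with ''./my_mpi_program'' as a stand-in:

<code>
# only needed when mpirun is not slurm-aware (sketch, not from the original page)
# expand slurm's compact node list into one hostname per line
scontrol show hostnames $SLURM_JOB_NODELIST > hosts.$SLURM_JOB_ID

# launch with exactly the task count slurm allocated
mpirun -np $SLURM_NTASKS --hostfile hosts.$SLURM_JOB_ID ./my_mpi_program

rm -f hosts.$SLURM_JOB_ID
</code>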

And in the slurmjob.log file

<code>
JobId=902245 UserId=hmeij(8216) GroupId=its(623) \
Name=MPI JobState=COMPLETED Partition=test TimeLimit=UNLIMITED \
StartTime=2014-08-21T15:55:06 EndTime=2014-08-21T15:57:04 \
NodeList=v[1-8] NodeCnt=8 ProcCnt=8 WorkDir=/home/hmeij/1g6r/cd
</code>
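
To pull such a record back later, one can grep the completion log or, if accounting is enabled, ask ''sacct''; a minimal sketch (the log path is an assumption, it is whatever ''JobCompLoc'' points at):

<code>
# grep the file-based job completion log (path is an assumption)
grep "JobId=902245" /var/log/slurm/slurmjob.log

# or query the accounting records, if accounting is configured
sacct -j 902245 --format=JobID,JobName,Partition,State,Start,End,NNodes,NodeList
</code>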
  
  
\\
**[[cluster:0|Back]]**