  
===== Step 5 =====

Fun step.

  * make a backup copy of /etc/lava/conf/lsbatch/lava/configdir/lsb.queues
  * edit the file and delete everything but the queue 'normal' definition
  * (if you rename queue normal you also need to edit lsb.params and define the default queue)
  * in the remaining queue definition set the following
    * QJOB_LIMIT = 4 (assuming you have 2 nodes in the cluster, 6 if you have 3; in other words, #nodes * #cores)
    * UJOB_LIMIT = 1000 (users like to write scripts and submit jobs; this protects against runaway scripts)
    * INTERACTIVE = no (only batch is allowed)
    * EXCLUSIVE = Y (allow the bsub -x flag)
    * PRE_EXEC = /home/apps/lava/pre_exec  (these two create/remove the scratch dirs)
    * POST_EXEC = /home/apps/lava/post_exec
  * make the directory /home/apps (for compiled software)
  * make the directories /home/lava and /home/sanscratch
  * be sure /localscratch and /home/sanscratch have permissions like /tmp on all blades
  * create the pre/post exec files (post_exec does an rm -rf against the created directories)
  * for example, pre_exec:
<code>
#!/bin/bash
if [ "X$LSB_JOBID" != "X" ]; then
    mkdir -p /home/sanscratch/$LSB_JOBID /localscratch/$LSB_JOBID
    sleep 5; exit 0
else
    echo "LSB_JOBID NOT SET!"
    exit 111
fi
</code>
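The matching post_exec is essentially the same test with rm -rf in place of mkdir -p. A self-contained sketch of that cleanup logic follows; BASE and the hardcoded LSB_JOBID are stand-ins so the sketch can be run safely anywhere, while the real script would operate on /home/sanscratch and /localscratch with the scheduler-provided $LSB_JOBID:

```shell
#!/bin/bash
# Sketch of the post_exec cleanup. BASE is a throwaway stand-in for the
# real scratch roots (/home/sanscratch and /localscratch), and LSB_JOBID
# is hardcoded only for this demo; the scheduler sets it for real jobs.
BASE=$(mktemp -d)
LSB_JOBID=12345

# what pre_exec would have created for this job
mkdir -p "$BASE/sanscratch/$LSB_JOBID" "$BASE/localscratch/$LSB_JOBID"

# the post_exec logic: remove the per-job scratch directories
if [ "X$LSB_JOBID" != "X" ]; then
    rm -rf "$BASE/sanscratch/$LSB_JOBID" "$BASE/localscratch/$LSB_JOBID"
    echo "removed scratch dirs for job $LSB_JOBID"
else
    echo "LSB_JOBID NOT SET!"
fi
```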

  * 'badmin reconfig'
  * 'bqueues' should now show the new configuration
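For reference, a sketch of what the trimmed lsb.queues stanza might look like after these edits, using the standard Begin Queue/End Queue layout; the PRIORITY and DESCRIPTION lines are illustrative assumptions, not values from this page:

```
Begin Queue
QUEUE_NAME   = normal
PRIORITY     = 30
QJOB_LIMIT   = 4
UJOB_LIMIT   = 1000
INTERACTIVE  = no
EXCLUSIVE    = Y
PRE_EXEC     = /home/apps/lava/pre_exec
POST_EXEC    = /home/apps/lava/post_exec
DESCRIPTION  = default queue
End Queue
```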

Now we're ready to submit a serial job.  As a non-privileged user, create two files:

  * run

<code>
#!/bin/bash

#BSUB -q normal
#BSUB -J test
#BSUB -n 1
#BSUB -e err
#BSUB -o out

rm -f out err job3.out

export MYSANSCRATCH=/home/sanscratch/$LSB_JOBID
export MYLOCALSCRATCH=/localscratch/$LSB_JOBID

cd $MYLOCALSCRATCH
pwd
cp ~/job.sh .
time ./job.sh > job.out

cd $MYSANSCRATCH
pwd
cp $MYLOCALSCRATCH/job.out job2.out

cd
pwd
cp $MYSANSCRATCH/job2.out job3.out
</code>

  * job.sh

<code>
#!/bin/bash

sleep 10
echo Done sleeping.

for i in `seq 1 100`
do
      date
done
</code>

  * 'bsub < run' (submits)
  * 'bjobs' (check dispatch)
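When the job finishes, job3.out should land in the home directory holding job.sh's output: one 'Done sleeping.' line plus 100 date lines, 101 lines in all. A quick self-contained sketch of that sanity check (the stand-in file is generated locally here, since the real job3.out only exists after a cluster run):

```shell
#!/bin/bash
# Generate a stand-in for job.sh's output so the check runs anywhere:
# one "Done sleeping." line followed by 100 date lines.
out=$(mktemp)
echo "Done sleeping." > "$out"
for i in `seq 1 100`; do date >> "$out"; done

# the same count you would run against ~/job3.out after the job completes
lines=$(wc -l < "$out")
echo "output has $lines lines"
```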


===== Step 6 =====

More fun. Parallel jobs can be submitted over ethernet interconnects, but of course they will not achieve the performance of Infiniband interconnects.  OpenMPI is a nice MPI flavor because software compiled with it automatically detects whether the host has an HCA card and loads the appropriate libraries. So in order to compile some OpenMPI examples we need the following:

  * yum install libibverbs
  * yum install gcc-c++
  * export PATH=/opt/openmpi/gnu/bin:$PATH
  * export LD_LIBRARY_PATH=/opt/openmpi/gnu/lib:$LD_LIBRARY_PATH
  * cd /opt/openmpi/gnu/examples; make
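To try the compiled examples under the scheduler, a parallel submit script can follow the same pattern as the serial run file above. This is a sketch, not from the original page: the hello_c example path, the job name, and the four-slot request are assumptions, and depending on how OpenMPI was built you may need to pass mpirun an explicit host list instead of relying on it to pick up the scheduler's allocation.

```shell
#!/bin/bash

#BSUB -q normal
#BSUB -J mpitest
#BSUB -n 4
#BSUB -e err
#BSUB -o out

# OpenMPI from /opt/openmpi/gnu, as set up above
export PATH=/opt/openmpi/gnu/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/gnu/lib:$LD_LIBRARY_PATH

# run the compiled hello_c example on the requested slots
mpirun -np 4 /opt/openmpi/gnu/examples/hello_c
```

Submit it the same way as the serial job ('bsub < file') and watch 'bjobs'.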


\\
**[[cluster:0|Back]]**
cluster/88.txt · Last modified: 2010/08/17 19:56 by hmeij