cluster:88

Differences

This shows you the differences between two versions of the page.

cluster:88 [2010/08/10 18:42]
hmeij
cluster:88 [2010/08/11 14:25]
hmeij
Line 173: Line 173:
   * 'pdsh uptime' should now list the hosts with short name   * 'pdsh uptime' should now list the hosts with short name
   * 'bhosts' should in a little while now show the hosts as available   * 'bhosts' should in a little while now show the hosts as available
-  * 'lsload' should do same+  * 'lsload' should do the same 
 + 
 +Now reboot the entire cluster and confirm that the changes are permanent. Sidebar: for Pace, you can now assign eth1 on the installer node a pace.edu IP and have the necessary changes made to the ProCurve switch, so your users can log into the installer/head node.  You still only have 50 GB or so of home directory space, but users can play around.   
 + 
 + 
 +===== Step 5 ===== 
 + 
 +Fun step.
  
   * make a backup copy of /etc/lava/conf/lsbatch/lava/configdir/lsb.queues   * make a backup copy of /etc/lava/conf/lsbatch/lava/configdir/lsb.queues
   * edit file, delete everything but queue 'normal' definition   * edit file, delete everything but queue 'normal' definition
-  * (if you rename queue normal+  * (if you rename queue normal you also need to edit lsb.params and define default queue) 
 +  * remove most queue definitions and set the following 
 +    * QJOB_LIMIT = 4 (assuming you have 2 nodes in the cluster, 6 if you have 3; in other words, #nodes * #cores) 
 +    * UJOB_LIMIT = 1000 (users like to write scripts and submit jobs, this protects against runaway scripts) 
 +    * INTERACTIVE = no (only batch is allowed) 
 +    * EXCLUSIVE = Y (allow the bsub -x flag) 
 +    * PRE_EXEC = /home/apps/lava/pre_exec  (these two will create/remove the scratch dirs) 
 +    * POST_EXEC = /home/apps/lava/post_exec 
 +  * make the directory /home/apps (for compiled software) 
 +  * make the directories /home/lava and /home/sanscratch 
 +  * be sure /localscratch and /home/sanscratch have permissions like /tmp on all blades 
 +  * create the pre/post exec files (post does an rm -rf against the created directories) 
 +  * for example: 
 +<code> 
 +#!/bin/bash 
 +if [ "X$LSB_JOBID" != "X" ]; then 
 +    mkdir -p /home/sanscratch/$LSB_JOBID /localscratch/$LSB_JOBID 
 +    sleep 5; exit 0 
 +else 
 +    echo "LSB_JOBID NOT SET!" 
 +    exit 111 
 +fi 
 +</code> 
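A matching post_exec could mirror the pre_exec above. The following is only a sketch: the `SAN_BASE` and `LOCAL_BASE` variables, the `cleanup_scratch` function name, and the numeric check on the job ID are assumptions added here for safety and are not part of the original setup.

```shell
#!/bin/bash
# Sketch of a post_exec counterpart to the pre_exec shown above.
# SAN_BASE and LOCAL_BASE are hypothetical overrides; the defaults are
# the scratch locations used elsewhere on this page.
SAN_BASE="${SAN_BASE:-/home/sanscratch}"
LOCAL_BASE="${LOCAL_BASE:-/localscratch}"

# cleanup_scratch JOBID: remove the per-job scratch directories.
# Returns 111 (the same code pre_exec uses) if JOBID is empty or not
# purely numeric, so rm -rf is never run against a base directory.
cleanup_scratch() {
    local jobid="$1"
    case "$jobid" in
        ''|*[!0-9]*)
            echo "LSB_JOBID NOT SET OR NOT NUMERIC!" >&2
            return 111
            ;;
    esac
    # ${VAR:?} aborts if a base path is somehow empty.
    rm -rf "${SAN_BASE:?}/$jobid" "${LOCAL_BASE:?}/$jobid"
}
```

In the real post_exec file, the last line would be `cleanup_scratch "$LSB_JOBID"`, so that the script exits with the function's status.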
 + 
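The scratch directory bullets above can be sketched as a small helper. `make_scratch_dir` is a hypothetical name, and mode 1777 (world-writable with the sticky bit set) is how "permissions like /tmp" is interpreted here, since that is what /tmp normally carries on Linux.

```shell
#!/bin/bash
# make_scratch_dir DIR: create DIR (if needed) and give it /tmp-style
# permissions: rwx for everyone plus the sticky bit, so users can only
# delete their own files.
make_scratch_dir() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# Intended use, as root, wherever these directories live (not run here):
#   make_scratch_dir /home/sanscratch
#   make_scratch_dir /localscratch
```

Running `ls -ld` on the result should show `drwxrwxrwt`, the same as /tmp.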
 +  * 'badmin reconfig' 
 +  * 'bqueues' should now show new configuration 
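Put together, the 'normal' queue definition might look roughly like the stanza below. The sketch writes it to a scratch file rather than to the live lsb.queues, so it can be reviewed and merged by hand. The `nodes` and `cores_per_node` values, the variable names, and the scratch file location are assumptions for a cluster with 2 nodes of 2 cores each.

```shell
#!/bin/bash
# Assumed cluster size; adjust to your own hardware.
nodes=2
cores_per_node=2
qjob_limit=$((nodes * cores_per_node))   # 2 * 2 = 4 job slots

# Scratch file for review only; the live lsb.queues is not touched.
stanza_file="$(mktemp /tmp/lsb.queues.sketch.XXXXXX)"

cat > "$stanza_file" <<EOF
Begin Queue
QUEUE_NAME = normal
QJOB_LIMIT = $qjob_limit
UJOB_LIMIT = 1000
INTERACTIVE = no
EXCLUSIVE = Y
PRE_EXEC = /home/apps/lava/pre_exec
POST_EXEC = /home/apps/lava/post_exec
End Queue
EOF

echo "Candidate stanza written to $stanza_file"
```

Computing QJOB_LIMIT from the node and core counts keeps the value in step with the "#nodes * #cores" rule when blades are added.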
 + 
 +Now we're ready to submit a serial job.  As a non-privileged user, create two files: 
 + 
 +  * run 
 + 
 +<code> 
 +#!/bin/bash 
 + 
 +rm -f out err job3.out 
 + 
 +#BSUB -q normal 
 +#BSUB -J test 
 +#BSUB -n 1 
 +#BSUB -e err 
 +#BSUB -o out 
 + 
 +export MYSANSCRATCH=/home/sanscratch/$LSB_JOBID 
 +export MYLOCALSCRATCH=/localscratch/$LSB_JOBID 
 + 
 +cd $MYLOCALSCRATCH 
 +pwd 
 +cp ~/job.sh . 
 +time ./job.sh > job.out 
 + 
 +cd $MYSANSCRATCH 
 +pwd 
 +cp $MYLOCALSCRATCH/job.out job2.out 
 + 
 +cd 
 +pwd 
 +cp $MYSANSCRATCH/job2.out job3.out 
 +</code> 
 + 
 +  * job.sh 
 + 
 +<code> 
 +#!/bin/bash 
 + 
 +sleep 10 
 +echo Done sleeping. 
 + 
 +for i in `seq 1 100` 
 +do 
 +      date 
 +done 
 + 
 +</code> 
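For reference, the run script passes the output of job.sh through job.out and job2.out into job3.out in the home directory, so a finished job should leave a file containing "Done sleeping." followed by 100 date lines (101 lines in total). A small check, as a sketch only; `check_job_output` is a hypothetical helper name:

```shell
#!/bin/bash
# check_job_output FILE: succeed only if FILE looks like the output of
# job.sh: first line "Done sleeping.", then 100 date lines (101 total).
check_job_output() {
    local f="$1"
    [ -f "$f" ] || return 1
    [ "$(head -n 1 "$f")" = "Done sleeping." ] || return 1
    [ "$(grep -c '' "$f")" -eq 101 ]
}

# Intended use once the job has finished (not run here):
#   check_job_output ~/job3.out && echo "job output looks complete"
```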
 + 
 +  * 'bsub < run' (submits) 
 +  * 'bjobs' (check dispatch) 
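When scripting around these two commands, it can help to capture the job ID at submission time. The sketch below assumes that Lava's bsub, like LSF's, prints a line of the form `Job <N> is submitted to queue <normal>.` on success; `parse_jobid` is a hypothetical helper name.

```shell
#!/bin/bash
# parse_jobid: read bsub's output on stdin and print only the numeric
# job ID. Prints nothing if no "Job <N> is submitted" line is found.
parse_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Intended use on the head node (not run here):
#   jobid=$(bsub < run | parse_jobid)
#   bjobs "$jobid"
```

With the ID in hand, `bjobs` can be pointed at that one job instead of listing everything the user has queued.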
 + 
 + 
 +===== Step 6 ===== 
 + 
 +More fun. 
  
  
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/88.txt · Last modified: 2010/08/17 19:56 by hmeij