cluster:88 [2010/08/10 18:04] hmeij
cluster:88 [2010/08/10 21:02] hmeij
===== Step 4 =====

Ugly step. If you look at /etc/hosts you'll see what we mean. All blade host names should be unique, so we're going to fix some files.

  * first installer '
  * copy /etc/hosts to /
  * put the installer lines together, and put the 192.168 and 10.10 lines together, for an easier read
  * for 10.10, remove all short host names like '
  * for 192.168, add '
  * leave all the other host names intact (*.kusu101, *-eth0, etc.)
  * copy hosts-good across the hosts file
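The regrouping above can be scripted. A minimal sketch on sample data rather than the live file (the host names here are invented; the hosts-good working-copy name comes from the bullets):

```shell
#!/bin/bash
# Sketch of the /etc/hosts regrouping: loopback/comment lines first,
# then all 192.168 lines together, then all 10.10 lines together.
cd "$(mktemp -d)"
cat > hosts.sample <<'EOF'
127.0.0.1 localhost
10.10.1.10 node1.kusu101
192.168.1.10 node1
10.10.1.11 node2.kusu101
192.168.1.11 node2
EOF
{
  grep -Ev '^(192\.168\.|10\.10\.)' hosts.sample   # loopback/comments first
  grep -E  '^192\.168\.'            hosts.sample   # 192.168 lines together
  grep -E  '^10\.10\.'              hosts.sample   # 10.10 lines together
} > hosts-good
cat hosts-good
```

Once the grouped copy looks right (diff it against the original first), it replaces /etc/hosts as the next bullets describe.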

  * next do the same for hosts.pdsh, but only use short host names, one name per node

  * next do the same for /

  * next edit /
  * cp /
  * cp /
  * cp /

  * in /
  * link in all the *-good files at the appropriate locations
  * make the rc.d directory at the appropriate level and link in rc.local
  * run '
  * on the installer node run '/
  * 'pdsh uptime'
  * '
  * '
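One way to confirm the fixed-up files really are identical cluster-wide is to fan a checksum out with pdsh and count the distinct sums: one distinct sum per file means every node agrees. A sketch (the FANOUT override and the file list are assumptions; substitute whichever files you touched):

```shell
#!/bin/bash
# Count distinct md5 sums of a file across the cluster; a count of 1
# means the file is identical everywhere. FANOUT defaults to pdsh over
# all nodes but can be overridden (e.g. FANOUT=eval for a local dry run).
FANOUT=${FANOUT:-pdsh -a}
check() {
  # pdsh prefixes each line with "node:", so the sum is the next-to-last field
  distinct=$($FANOUT md5sum "$1" 2>/dev/null | awk '{print $(NF-1)}' | sort -u | wc -l | tr -d ' ')
  echo "$1: $distinct distinct checksum(s)"
}
check /etc/hosts
check /etc/hosts.pdsh
```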

Now reboot the entire cluster and observe that the changes are permanent. Sidebar: for Pace, you can now assign eth1 on the installer node a pace.edu IP and have the necessary changes made to the ProCurve switch, so your users can log into the installer/

===== Step 5 =====

Fun step.
  * make a backup copy of /
  * edit the file, delete everything but queue '
  * (if you rename queue normal you also need to edit lsb.params)
  * remove most queue definitions and set the following
  * QJOBLIMIT = 4 (assuming you have 2 nodes in the cluster, 6 if you have 3; iow #nodes * 2)
  * UJOBLIMIT = 1000 (users like to write scripts and submit jobs; this protects against runaway scripts)
  * INTERACTIVE = no (only batch is allowed)
  * EXCLUSIVE = Y (allow the bsub -x flag)
  * PRE_EXEC = /
  * POST_EXEC = /
  * make the directory /home/apps (for compiled software)
  * make the directory /home/lava
  * be sure /
  * create the pre/post exec files (post does an rm -rf against the created directories)
+ | < | ||
+ | #!/bin/bash | ||
+ | if [" | ||
+ | mkdir -p / | ||
+ | sleep 5; exit 0 | ||
+ | else | ||
+ | echo " | ||
+ | exit 111 | ||
+ | fi | ||
+ | </ | ||

  * '
  * '
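Pulled together, the normal queue stanza in lsb.queues would look roughly like this. A sketch only: stock LSF/Lava spells the limits QJOB_LIMIT/UJOB_LIMIT, and the pre/post exec paths are placeholders standing in for the truncated ones above:

```
Begin Queue
QUEUE_NAME   = normal
QJOB_LIMIT   = 4         # nodes * 2 job slots
UJOB_LIMIT   = 1000      # per-user cap against runaway submit scripts
INTERACTIVE  = NO        # batch only
EXCLUSIVE    = Y         # allow bsub -x
PRE_EXEC     = /home/lava/pre_exec.sh     # placeholder path
POST_EXEC    = /home/lava/post_exec.sh    # placeholder path
DESCRIPTION  = default queue
End Queue
```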

Now we're ready to submit jobs. As a non-privileged user, create two files:
  * run
<code>
#!/bin/bash
#BSUB -q normal
#BSUB -J test
#BSUB -n 1
#BSUB -e err
#BSUB -o out
# bsub stops reading #BSUB directives at the first non-comment line,
# so the cleanup has to come after them
rm -f out err job*.out

export MYSANSCRATCH=/
export MYLOCALSCRATCH=/

cd $MYLOCALSCRATCH
pwd
cp ~/job.sh .
time ./job.sh > job.out

cd $MYSANSCRATCH
pwd
cp $MYLOCALSCRATCH/

cd
pwd
cp $MYSANSCRATCH/
</code>

  * job.sh
<code>
#!/bin/bash

sleep 10
echo Done sleeping.

for i in `seq 1 100`
do
  date
done
</code>

  * 'bsub < run' (submits)
  * '