This shows you the differences between two versions of the page.
cluster:88 [2010/08/09 20:31] hmeij |
cluster:88 [2010/08/10 20:25] hmeij |
||
---|---|---|---|
Line 94: | Line 94: | ||
* 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). | * 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). | ||
- | Upon reboot check some command output: hostname, route, ifconfig, bhosts, bqueues | + | Upon reboot |
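The post-reboot check above lists five commands. As a minimal sketch, the helper below only confirms that each command is on the PATH; it does not run them or interpret their output. The function name `check_cmds` is hypothetical and not part of the original page.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named command is on PATH.
# It does not execute the commands or inspect their output.
check_cmds() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "MISSING: $cmd"
        fi
    done
}

# The commands named in the step above.
check_cmds hostname route ifconfig bhosts bqueues
```

A missing `bhosts` or `bqueues` would suggest that the scheduler environment was not set up for the current shell; the output of each command still has to be inspected by hand.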
===== Step 3 ===== | ===== Step 3 ===== | ||
Line 131: | Line 131: | ||
* / | * / | ||
* 'mysql -u root' should work | * 'mysql -u root' should work | ||
- | * and/or python is missing a drive | + | * and/or python is missing a driver |
* yum install MySQL-python | * yum install MySQL-python | ||
* when addhost starts, select the *_BSS nodegroup created, and eth0 interface | * when addhost starts, select the *_BSS nodegroup created, and eth0 interface | ||
Line 137: | Line 137: | ||
* if you know the blade will boot off the network, let it go; otherwise press F2, enter the BIOS, and set the Boot menu to network first | * if you know the blade will boot off the network, let it go; otherwise press F2, enter the BIOS, and set the Boot menu to network first | ||
* once blade sends its eth0 IP over and receives kickstart file, move on to next blade | * once blade sends its eth0 IP over and receives kickstart file, move on to next blade | ||
- | * do 2-3 baldes | + | * do 2-3 blades |
- | * once the first blade is rebooted, enter BIOS, set boot menu to hard disk first | + | * once the first blade is rebooted, enter BIOS, set boot menu to hard disk |
- | * there' | + | * there' |
* once the last blade has fully booted off the hard disk, quit addhost on the installer node | * once the last blade has fully booted off the hard disk, quit addhost on the installer node | ||
* addhost will now push new files to all the members of the cluster using cfmsync | * addhost will now push new files to all the members of the cluster using cfmsync | ||
- | issue ' | + | Issue ' |
+ | |||
+ | ===== Step 4 ===== | ||
+ | |||
+ | Ugly step. If you look at /etc/hosts you'll see what we mean. All blade host names should be unique, so we're going to fix some files. | ||
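To see the problem described above, one possible sketch is to list every host name or alias that occurs more than once in a hosts file. The function name `dup_hostnames` is hypothetical; the approach (awk plus `sort | uniq -d`) is an illustration, not something the original page prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print each host name or alias that appears more
# than once in a hosts-format file. Field 1 (the IP address) is ignored,
# and anything after a '#' is treated as a comment.
dup_hostnames() {
    awk '{ sub(/#.*/, ""); for (i = 2; i <= NF; i++) print $i }' "$1" \
        | sort | uniq -d
}

# Usage: dup_hostnames /etc/hosts
```

An empty result means every name in the file is unique.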
+ | |||
+ | * first installer ' | ||
+ | * copy /etc/hosts to / | ||
+ | * put installer lines together, put 192.168 and 10.10 lines together for an easier read | ||
+ | * for 10.10 remove all short host names like ' | ||
+ | * for 192.168 add ' | ||
+ | * leave all the other host names intact (*.kusu101, *-eth0, etc) | ||
+ | * copy hosts-good across hosts file | ||
+ | |||
+ | * next do the same for hosts.pdsh but only use short host names, one name per node | ||
+ | |||
+ | * next do the same for / | ||
+ | |||
+ | * next edit / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | |||
+ | * in / | ||
+ | * link in all the *-good files at appropriate locations | ||
+ | * make the rc.d directory at appropriate level and link in rc.local | ||
+ | * run ' | ||
+ | * on installer node run '/ | ||
+ | * 'pdsh uptime' | ||
+ | * ' | ||
+ | * ' | ||
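The regrouping of the hosts file described in this step can be sketched roughly as follows. This only reorders lines by address prefix; it does not rename any hosts. The function name `group_hosts` and the output order (all other lines first, then 192.168, then 10.10) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a hosts-format file with its lines grouped
# by network prefix. Order is an assumption: everything else first
# (localhost, installer, comments), then 192.168.*, then 10.10.*.
group_hosts() {
    src="$1"
    grep -Ev '^(192\.168\.|10\.10\.)' "$src"
    grep -E  '^192\.168\.' "$src"
    grep -E  '^10\.10\.' "$src"
}

# Usage: group_hosts /etc/hosts > some-working-copy
```

Writing to a working copy first, then reviewing it before copying it over the live file, matches the copy-edit-copy flow of the step above.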
+ | |||
+ | Now reboot the entire cluster and verify that the changes are permanent. Sidebar: for Pace, you can now assign eth1 on the installer node a pace.edu IP and have the necessary changes made to the ProCurve switch, so your users can log into the installer/ | |
+ | |||
+ | |||
+ | ===== Step 5 ===== | ||
+ | |||
+ | Fun step. | ||
+ | |||
+ | * make a backup copy of / | ||
+ | * edit file, delete everything but queue ' | ||
+ | * (if you rename queue normal you also need to edit lsb.params) | ||
+ | * remove most queue definitions and set the following | ||
+ | * QJOBLIMIT = 4 (assuming you have 2 nodes in the cluster, 6 if you have 3; in other words, #nodes * 2) | |
+ | * UJOBLIMIT = 1000 (users like to write scripts that submit jobs; this protects against runaway scripts) | |
+ | * INTERACTIVE = no (only batch is allowed) | ||
+ | * EXCLUSIVE = Y (allow the bsub -x flag) | ||
+ | * PRE_EXEC = / | ||
+ | * POST_EXEC = / | ||
+ | * make the directories /home/apps and / | ||
+ | * create the pre/post exec files (post does an rm -rf against the created directories) | ||
+ | < | ||
+ | # | ||
+ | if [ " | |
+ | mkdir -p / | ||
+ | sleep 5; exit 0 | ||
+ | else | ||
+ | echo " | ||
+ | exit 111 | ||
+ | fi | ||
+ | </ | ||
+ | |||
+ | * ' | ||
+ | * ' | ||
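The queue arithmetic and the pre/post exec lifecycle in this step can be sketched as below. This is not the page's actual script, which is truncated here. The function names, the base directory argument, and the checks performed are assumptions; in a real pre-exec the job id would presumably come from LSF's `LSB_JOBID` environment variable. The 111 exit code mirrors the one visible in the script above.

```shell
#!/bin/sh
# QJOBLIMIT as described above: two job slots per node.
qjoblimit() {
    echo $(( $1 * 2 ))
}

# Hypothetical pre-exec body: create a per-job directory under a base
# directory. Returns 111 on failure.
make_job_dir() {
    base="$1"; jobid="$2"
    if [ -n "$base" ] && [ -n "$jobid" ] && mkdir -p "$base/$jobid"; then
        return 0
    fi
    echo "cannot create job directory" >&2
    return 111
}

# Hypothetical post-exec body: remove the directory created above.
# Refuses to run with an empty base or job id so rm -rf stays scoped.
remove_job_dir() {
    base="$1"; jobid="$2"
    if [ -z "$base" ] || [ -z "$jobid" ]; then
        return 111
    fi
    rm -rf "$base/$jobid"
}
```

For example, `qjoblimit 3` prints 6, matching the rule of thumb in the list. Guarding against empty arguments before `rm -rf` is a design choice of this sketch, since a post-exec that runs with an unset job id would otherwise target the base directory itself.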
\\ | \\ | ||
**[[cluster: | **[[cluster: |