This shows you the differences between two versions of the page.
cluster:88 [2010/08/09 20:03] hmeij |
cluster:88 [2010/08/10 20:14] hmeij |
||
---|---|---|---|
Line 94: | Line 94: | ||
* 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). | * 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). | ||
- | Upon reboot check some command output: hostname, route, ifconfig, bhosts, bqueues | + | Upon reboot |
===== Step 3 ===== | ===== Step 3 ===== | ||
Line 124: | Line 124: | ||
* cfmsync: update, no | * cfmsync: update, no | ||
* ' | * ' | ||
+ | |||
+ | * now we're ready to add compute nodes. type ' | ||
+ | * if you receive an error about MySQLDB not found in 10-cacti.py, it is one of two situations we have encountered: | ||
+ | * mysql was not installed; install it and initialize the database | ||
+ | * grep mysql / | ||
+ | * / | ||
+ | * 'mysql -u root' should work | ||
+ | * and/or python is missing a driver | ||
+ | * yum install MySQL-python | ||
+ | * when addhost starts, select the *_BSS nodegroup created, and eth0 interface | ||
+ | * make sure blades have purple cable in bottom interface, turn blade on | ||
+ | * if you know the blade will boot off the network, let it go; else press F2, enter BIOS, and set the Boot menu to network first | ||
+ | * once blade sends its eth0 IP over and receives kickstart file, move on to next blade | ||
+ | * do 2-3 blades this way | ||
+ | * once the first blade is rebooted, enter BIOS, set boot menu to hard disk | ||
+ | * there' | ||
+ | * once the last blade has fully booted off the hard disk, quit addhost on the installer node | ||
+ | * addhost will now push new files to all the members of the cluster using cfmsync | ||
+ | |||
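The "python is missing a driver" case above can be checked before running addhost. This is a minimal sketch, not part of the original procedure: it assumes the driver is importable under the module name `MySQLdb` (the name the MySQL-python package provides), and the helper `has_module` is a hypothetical name introduced here for illustration.

```python
def has_module(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: MySQL-python installs a module named "MySQLdb".
    if has_module("MySQLdb"):
        print("MySQLdb driver found")
    else:
        print("MySQLdb driver missing; try: yum install MySQL-python")
```

Run it with the same python interpreter that the installer tools use, since a driver installed for a different interpreter would not help.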
+ | Issue ' | ||
+ | |||
+ | ===== Step 4 ===== | ||
+ | |||
+ | Ugly step. If you look at /etc/hosts you'll see what we mean. All blade host names should be unique, so we're going to fix some files. | ||
+ | |||
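To see which names are not unique before editing anything, a small script can list every host name that maps to more than one IP address. This is a sketch under assumptions: it treats the input as ordinary hosts-file text (IP first, then names, `#` starting a comment), and the sample host names in the usage note are made up.

```python
from collections import defaultdict


def parse_hosts(text):
    """Yield (ip, [names]) for each non-comment, non-empty hosts entry."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1:]


def duplicate_names(text):
    """Return {name: sorted IPs} for every name that maps to more than one IP."""
    seen = defaultdict(set)
    for ip, names in parse_hosts(text):
        for name in names:
            seen[name].add(ip)
    return {name: sorted(ips) for name, ips in seen.items() if len(ips) > 1}


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for name, ips in sorted(duplicate_names(handle.read()).items()):
            print(name, "->", ", ".join(ips))
```

For example, with the hypothetical entries `192.168.1.1 node1 node1-eth0` and `10.10.1.1 node1 node1-eth1`, only `node1` is reported, because it appears on both lines.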
+ | * first installer ' | ||
+ | * copy /etc/hosts to / | ||
+ | * put installer lines together, put 192.168 and 10.10 lines together for an easier read | ||
+ | * for 10.10 remove all short host names like ' | ||
+ | * for 192.168 add ' | ||
+ | * leave all the other host names intact (*.kusu101, *-eth0, etc) | ||
+ | * copy hosts-good over the hosts file | ||
+ | |||
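The regrouping part of the list above (192.168 and 10.10 lines kept together) can be roughed out with a script before the manual edits. This is only a sketch: the function name `group_hosts` is hypothetical, it sorts lines purely by their leading IP prefix, and it does not know which lines belong to the installer, so those still need to be moved by hand. It also does not add or remove any host names.

```python
def group_hosts(lines):
    """Split hosts lines into (other, net_192_168, net_10_10), preserving order."""
    other, net_192_168, net_10_10 = [], [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("192.168."):
            net_192_168.append(line)
        elif stripped.startswith("10.10."):
            net_10_10.append(line)
        else:
            # localhost entries, comments, blank lines, anything else
            other.append(line)
    return other, net_192_168, net_10_10


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        groups = group_hosts(handle.read().splitlines())
    # Print the regrouped file to stdout; review it before saving it anywhere.
    for group in groups:
        for line in group:
            print(line)
        print("")
```

Printing to stdout rather than writing a file keeps the sketch harmless; redirect the output to a scratch file and compare it against the original before using it.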
+ | * next do the same for hosts.pdsh but only use short host names, one name per node | ||
+ | |||
+ | * next do the same for / | ||
+ | |||
+ | * next edit / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | |||
+ | * in / | ||
+ | * link in all the *-good files at appropriate locations | ||
+ | * make the rc.d directory at appropriate level and link in rc.local | ||
+ | * run ' | ||
+ | * on installer node run '/ | ||
+ | * 'pdsh uptime' | ||
+ | * ' | ||
+ | * ' | ||
+ | |||
+ | Now reboot the entire cluster and verify that the changes are permanent. Sidebar: for Pace, you can now assign eth1 on the installer node a pace.edu IP and have the necessary changes made to the ProCurve switch, so your users can log into the installer/ | ||
+ | |||
+ | |||
+ | ===== Step 5 ===== | ||
+ | |||
+ | * make a backup copy of / | ||
+ | * edit file, delete everything but queue ' | ||
+ | * (if you rename queue normal | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |