This shows you the differences between two versions of the page.
cluster:88 [2010/08/04 20:16] hmeij |
cluster:88 [2010/08/11 18:11] hmeij |
||
---|---|---|---|
Line 67: | Line 67: | ||
===== Step 2 ===== | ===== Step 2 ===== | ||
- | * Select an installer node, insert Kusu Installer into CD/DVD, and connect via USB ports. | + | * Select an installer node, insert Kusu Installer into CD/DVD, and connect |
- | * Installer node (and the compute nodes we'll create) need eth0 (bottom port) connected | + | * Installer node, and 2-3 compute nodes, must have the purple cable connecting |
- | * Boot, hit F2 to Enter BIOS, traverse to menu tab Boot and make sure both CDROM and Removable Device are listed before any other options like hard disk and network cards, hit F10 save changes and exit. | + | * Boot installer node, hit F2 to Enter BIOS, traverse to menu tab Boot and make sure both CDROM and Removable Device are listed before any other options like hard disk and network cards, hit F10, save changes and exit/reboot. |
- | * Next you should see the Project Kusu splash page with the orange turtle; when prompted type ' | + | * Next you should see the Project Kusu splash page with the orange |
- | * Navigation | + | * Navigation |
* Next come the informational screens, in order | * Next come the informational screens, in order | ||
- | | + | * 1 - language: English |
- | - keyboard: us | + | * 2 - keyboard: us |
- | - network, configure each interface, edit and configure two private networks (for Pace we'll reset eth1 on installer node later on for public access), this is so that the cluster is not accessible from outside and we could separate provision from private (NFS data/MPI) traffic. | + | * 3 - network, configure each interface, edit and configure two private networks (for Pace we'll reset eth1 on installer node later on for public access), this is so that the cluster is not accessible from outside and we could separate provision from private (NFS data/MPI) traffic. |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | - gateway & dns: gateway 192.168.101.0 (is not used but required field), dns server 192.168.101.254 (installer node) | + | * 4 - gateway & dns: gateway 192.168.101.0 (is not used but required field), dns server 192.168.101.254 (installer node) |
- | - host: FQDN kusu101, PCD kusu101 (basically we will not provide internet accessible names) | + | * 5 - host: FQDN kusu101, PCD kusu101 (basically we will not provide internet accessible names) |
- | - time: American/ | + | * 6 - time: American/ |
- | - root password: password (keep simple for now, change later) | + | * 7 - root password: password (keep simple for now, change later) |
- | - partition: select 'Use Default' | + | * 8 - disk partitions: select 'Use Default' |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | - confirm: accept (at this point the disk gets reformatted) | + | * 9 - confirm: accept (at this point the disk gets reformatted) |
- | - kits: select Add, cycle through disks by kit, then Finish (node reboots). | + | * 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). |
+ | |||
+ | Upon reboot (enter BIOS and reset boot to hard disk first) check some command output: hostname, route, ifconfig, bhosts, bqueues | ||
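The post-reboot check above can be wrapped in a small helper. This is only a sketch: `run_checks` is a hypothetical name, and it assumes nothing beyond the commands being on the PATH. It runs each command it can find and counts the ones that are missing.

```shell
# Run each command if it exists; return the number of missing commands.
run_checks() {
    rc_missing=0
    for rc_cmd in "$@"; do
        if command -v "$rc_cmd" >/dev/null 2>&1; then
            echo "== $rc_cmd =="
            "$rc_cmd" || echo "($rc_cmd exited non-zero)"
        else
            echo "== $rc_cmd: not found =="
            rc_missing=$((rc_missing + 1))
        fi
    done
    return "$rc_missing"
}
```

On the installer node you would call it as `run_checks hostname route ifconfig bhosts bqueues`; a non-zero return code suggests the Lava commands are not yet on the PATH.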
+ | |||
+ | ===== Step 3 ===== | ||
+ | |||
+ | * first create network interfaces for nodes, different from installer network interfaces | ||
+ | * type ' | ||
+ | * ' | ||
+ | * network: 192.168.0.0 | ||
+ | * subnet: 255.255.0.0 | ||
+ | * gateway: 192.168.101.0 | ||
+ | * device: eth0 | ||
+ | * starting IP: 192.168.101.250 | ||
+ | * suffix: -eth0 | ||
+ | * increment: -1 (that' | ||
+ | * options: | ||
+ | * description: | ||
+ | * ' | ||
+ | * ' | ||
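To see what a starting IP of 192.168.101.250 with an increment of -1 hands out, here is a rough sketch. It assumes addresses are assigned by stepping only the last octet, which may not be exactly how Kusu does it; `ip_sequence` is a hypothetical helper, not a Kusu tool.

```shell
# Print COUNT addresses starting at START, stepping the last octet by STEP.
ip_sequence() {
    is_start="$1"; is_step="$2"; is_count="$3"
    is_prefix="${is_start%.*}"     # first three octets
    is_last="${is_start##*.}"      # last octet
    is_i=0
    while [ "$is_i" -lt "$is_count" ]; do
        is_octet=$((is_last + is_i * is_step))
        if [ "$is_octet" -lt 1 ] || [ "$is_octet" -gt 254 ]; then
            echo "octet out of range: $is_octet" >&2
            return 1
        fi
        echo "$is_prefix.$is_octet"
        is_i=$((is_i + 1))
    done
}

# First three compute nodes on eth0, counting down from .250
ip_sequence 192.168.101.250 -1 3
```

Counting down from .250 keeps the node addresses clear of the installer at 192.168.101.254.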
+ | |||
+ | * next we are going to create our nodegroup template for the compute nodes, type ' | ||
+ | * use ' | ||
+ | * general: change name Copy 1 to _BSS with format node#NN (we don't care about rack and like short names) | ||
+ | * repository: there is only one, select it | ||
+ | * boot time: | ||
+ | * components: (check that non-server/ | ||
+ | * networks: here select only the interfaces you create: nodeprov eth0 and nodepriv eth1 | ||
+ | * optional: do select vim* and emacs* packages (annoying) | ||
+ | * partition: resize /data to 1024 and add partition / | ||
+ | * cfmsync: update, no | ||
+ | * ' | ||
+ | |||
+ | * now we're ready to add compute nodes. type ' | ||
+ | * if you receive an error about MySQLDB not found in 10-cacti.py, we have encountered two possible causes | ||
+ | * mysql was not installed, add and initialize the database | ||
+ | * grep mysql / | ||
+ | * / | ||
+ | * 'mysql -u root' should work | ||
+ | * and/or python is missing a driver | ||
+ | * yum install MySQL-python | ||
+ | * when addhost starts, select the *_BSS nodegroup created, and eth0 interface | ||
+ | * make sure blades have purple cable in bottom interface, turn blade on | ||
+ | * if you know the blade will boot off the network, let it go; otherwise hit F2, enter BIOS, and set the Boot menu to network first | ||
+ | * once blade sends its eth0 IP over and receives kickstart file, move on to next blade | ||
+ | * do 2-3 blades this way | ||
+ | * once the first blade is rebooted, enter BIOS, set boot menu to hard disk | ||
+ | * there' | ||
+ | * once the last blade has fully booted off the hard disk, quit addhost on the installer node | ||
+ | * addhost will now push new files to all the members of the cluster using cfmsync | ||
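A quick way to tell the two MySQLDB causes apart before starting addhost is sketched below. The function names are hypothetical, and the check only looks for the `mysql` client binary and for the `MySQLdb` python module (the module that the MySQL-python package provides); it does not verify that the database is initialized.

```shell
# Return 0 if INTERPRETER (default: python) can import MODULE.
python_has_module() {
    pm_module="$1"
    pm_interp="${2:-python}"
    "$pm_interp" -c "import $pm_module" >/dev/null 2>&1
}

# Print a hint for each of the two causes that seems to apply.
diagnose_mysqldb() {
    if ! command -v mysql >/dev/null 2>&1; then
        echo "mysql client not found: install and initialize mysql first"
    fi
    if ! python_has_module MySQLdb "${1:-python}"; then
        echo "python cannot import MySQLdb: try 'yum install MySQL-python'"
    fi
}
```

Run `diagnose_mysqldb` on the installer node; no output means neither cause was detected.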
+ | |||
+ | Issue ' | ||
+ | |||
+ | ===== Step 4 ===== | ||
+ | |||
+ | Ugly step. If you look at /etc/hosts you'll see what we mean. All blade host names should be unique, so we're going to fix some files. | ||
+ | |||
+ | * first installer ' | ||
+ | * copy /etc/hosts to / | ||
+ | * put installer lines together, put 192.168 and 10.10 lines together for an easier read | ||
+ | * for 10.10 remove all short host names like ' | ||
+ | * for 192.168 add ' | ||
+ | * leave all the other host names intact (*.kusu101, *-eth0, etc) | ||
+ | * copy hosts-good across hosts file | ||
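Grouping the 192.168 and 10.10 lines can be done by hand, or with a small awk pass like the sketch below. `group_hosts` is a hypothetical helper; it only reorders lines and leaves the host-name edits described above to you.

```shell
# Print all other lines first, then the 192.168.* lines, then the 10.10.* lines.
group_hosts() {
    awk '
        /^192\.168\./ { a[++na] = $0; next }   # provision network
        /^10\.10\./   { b[++nb] = $0; next }   # private network
        { print }                              # localhost, comments, etc.
        END {
            for (i = 1; i <= na; i++) print a[i]
            for (i = 1; i <= nb; i++) print b[i]
        }
    ' "$1"
}
```

For example, `group_hosts /etc/hosts > hosts-good` in a scratch directory gives you a file to edit before copying it into place.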
+ | |||
+ | * next do the same for hosts.pdsh but only use short host names, one name per node | ||
+ | |||
+ | * next do the same for / | ||
+ | |||
+ | * next edit / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | * cp / | ||
+ | |||
+ | * in / | ||
+ | * link in all the *-good files at appropriate locations | ||
+ | * make the rc.d directory at appropriate level and link in rc.local | ||
+ | * run ' | ||
+ | * on installer node run '/ | ||
+ | * 'pdsh uptime' | ||
+ | * ' | ||
+ | * ' | ||
+ | |||
+ | Now reboot the entire cluster and confirm that the changes are permanent. Sidebar: for Pace, you can now assign eth1 on the installer node a pace.edu IP, and have the necessary changes made to the ProCurve switch, so your users can log into the installer/ | ||
+ | |||
+ | |||
+ | ===== Step 5 ===== | ||
+ | |||
+ | Fun step. | ||
+ | |||
+ | * make a backup copy of / | ||
+ | * edit file, delete everything but queue ' | ||
+ | * (if you rename queue normal you also need to edit lsb.params and define default queue) | ||
+ | * remove most queue definitions and set the following | ||
+ | * QJOB_LIMIT = 4 (assuming you have 2 nodes in the cluster, 6 if you have 3; in other words #nodes * #cores) | ||
+ | * UJOB_LIMIT = 1000 (users like to write scripts and submit jobs; this protects against runaway scripts) | ||
+ | * INTERACTIVE = no (only batch is allowed) | ||
+ | * EXCLUSIVE = Y (allow the bsub -x flag) | ||
+ | * PRE_EXEC = / | ||
+ | * POST_EXEC = / | ||
+ | * make the directory /home/apps (for compiled software) | ||
+ | * make the directory /home/lava and / | ||
+ | * be sure / | ||
+ | * create the pre/post exec files (post does an rm -rf against the created directories) | ||
+ | * for example: | ||
+ | < | ||
+ | # | ||
+ | if [ " | ||
+ | mkdir -p / | ||
+ | sleep 5; exit 0 | ||
+ | else | ||
+ | echo " | ||
+ | exit 111 | ||
+ | fi | ||
+ | </ | ||
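The matching post exec step could look roughly like the sketch below. It assumes the pre exec created one directory per job, named after `$LSB_JOBID`, under each scratch root. The scratch roots are passed in as arguments here because the actual paths are site specific, and `cleanup_job_dirs` is a hypothetical name.

```shell
# Remove the per-job directory under each scratch root given after the job id.
cleanup_job_dirs() {
    cj_jobid="$1"
    # Refuse to run without a job id so we never rm -rf a scratch root itself.
    if [ -z "$cj_jobid" ]; then
        echo "no job id, refusing to clean up" >&2
        return 111
    fi
    shift
    for cj_root in "$@"; do
        # ${var:?} aborts if a root is empty instead of expanding to "/<jobid>"
        rm -rf -- "${cj_root:?}/$cj_jobid"
    done
}
```

In a real post exec file you would end with something like `cleanup_job_dirs "$LSB_JOBID"` followed by your own scratch roots.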
+ | |||
+ | * ' | ||
+ | * ' | ||
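The QJOB_LIMIT rule of thumb above (#nodes * #cores) is simple enough to compute when you add nodes. A trivial sketch, with `qjob_limit` as a hypothetical helper and 2 cores per node assumed from the numbers above:

```shell
# Job slots for the queue: number of nodes times cores per node.
qjob_limit() {
    echo $(( $1 * $2 ))
}

qjob_limit 2 2   # 2 nodes, 2 cores each: prints 4
qjob_limit 3 2   # 3 nodes, 2 cores each: prints 6
```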
+ | |||
+ | Now we're ready to submit a serial job. As a non-privileged user create two files: | ||
+ | |||
+ | * run | ||
+ | |||
+ | < | ||
+ | # | ||
+ | |||
+ | rm -f out err job3.out | ||
+ | |||
+ | #BSUB -q normal | ||
+ | #BSUB -J test | ||
+ | #BSUB -n 1 | ||
+ | #BSUB -e err | ||
+ | #BSUB -o out | ||
+ | |||
+ | export MYSANSCRATCH=/ | ||
+ | export MYLOCALSCRATCH=/ | ||
+ | |||
+ | cd $MYLOCALSCRATCH | ||
+ | pwd | ||
+ | cp ~/job.sh . | ||
+ | time ./job.sh > job.out | ||
+ | |||
+ | cd $MYSANSCRATCH | ||
+ | pwd | ||
+ | cp $MYLOCALSCRATCH/ | ||
+ | |||
+ | cd | ||
+ | pwd | ||
+ | cp $MYSANSCRATCH/ | ||
+ | </ | ||
+ | |||
+ | * job.sh | ||
+ | * | ||
+ | < | ||
+ | # | ||
+ | |||
+ | sleep 10 | ||
+ | echo Done sleeping. | ||
+ | |||
+ | for i in `seq 1 100` | ||
+ | do | ||
+ | date | ||
+ | done | ||
+ | |||
+ | </ | ||
+ | |||
+ | * 'bsub < run' (submits) | ||
+ | * ' | ||
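The run script follows a stage-in, run, stage-out pattern: copy the job to local scratch, run it there, then copy the output to shared scratch. The sketch below shows just that pattern so it can be tried outside the scheduler. `stage_and_run` is a hypothetical helper, and the directories are arguments rather than the real scratch paths.

```shell
# Copy SCRIPT into LOCAL, run it there, then copy job.out to SHARED.
stage_and_run() {
    sr_script="$1"; sr_local="$2"; sr_shared="$3"
    cp "$sr_script" "$sr_local/job.sh" || return 1
    # Run in a subshell so the caller's working directory is untouched.
    ( cd "$sr_local" && sh ./job.sh > job.out ) || return 1
    cp "$sr_local/job.out" "$sr_shared/" || return 1
}
```

Doing the heavy I/O in local scratch and copying only the results keeps NFS traffic down, which is the point of having both directories.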
+ | |||
+ | |||
+ | ===== Step 6 ===== | ||
+ | |||
+ | More fun. Parallel jobs can be submitted over Ethernet interconnects but will, of course, not achieve the performance of InfiniBand interconnects. | ||
+ | |||
+ | * yum install libibverbs; pdsh yum install libibverbs -q -y | ||
+ | * yum install gcc-c++ | ||
+ | |||
+ | On our Dell cluster we have static pre-compiled flavors of MPI and OFED. A tarball of 200 MB can be found here [[http:// | ||
+ | |||
+ | * download tarball, stage in / | ||
+ | * cd /opt; tar zxvf / | ||
+ | * examples in / | ||
+ | * export PATH=/ | ||
+ | * export LD_LIBRARY_PATH=/ | ||
+ | * cd / | ||
+ | * ./ring.c; ./hello.c (to test, it'll complain about no HCA card) | ||
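When exporting PATH and LD_LIBRARY_PATH for an MPI flavor, it helps to make sure the new directory comes first and is not added twice. A minimal sketch follows; `prepend_dir` is a hypothetical helper and `/opt/mpi/bin` in the usage note is a made-up path, not the location inside the tarball.

```shell
# Print VALUE with DIR placed first, unless DIR is already the first entry.
prepend_dir() {
    case "$1" in
        "$2"|"$2":*) echo "$1" ;;      # already first, leave unchanged
        "")          echo "$2" ;;      # empty value, no trailing colon
        *)           echo "$2:$1" ;;
    esac
}
```

Usage would be along the lines of `PATH=$(prepend_dir "$PATH" /opt/mpi/bin); export PATH`, followed by `which mpirun` to confirm that the right one is picked up.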
+ | |||
+ | OK, so now we need to write a script to submit a parallel job. A parallel job is submitted with the command ' | ||
+ | |||
+ | * irun | ||
+ | |||
+ | < | ||
+ | # | ||
+ | |||
+ | rm -f err out | ||
+ | |||
+ | #BSUB -e err | ||
+ | #BSUB -o out | ||
+ | #BSUB -n 4 | ||
+ | #BSUB -q normal | ||
+ | #BSUB -J ptest | ||
+ | |||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | |||
+ | echo "make sure we have the right mpirun" | ||
+ | which mpirun | ||
+ | |||
+ | / | ||
+ | |||
+ | / | ||
+ | |||
+ | </ | ||
+ | |||
+ | * 'bsub < irun' (submits) | ||
+ | * ' | ||
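To see where the 4 job slots of a parallel job landed, you can summarize the host list the scheduler hands to the job. The sketch below assumes the scheduler sets `LSB_HOSTS` to a space-separated list with one entry per slot, as LSF-style schedulers do. The `host:count` output format and the `hosts_to_slots` name are illustrative only and are not what the MPI wrapper itself expects.

```shell
# Turn "node01 node01 node02 node02" into one "host:slots" line per host.
hosts_to_slots() {
    [ -n "$1" ] || return 0
    # Unquoted $1 on purpose: split the list into one word per slot.
    printf '%s\n' $1 | sort | uniq -c | awk '{ print $2 ":" $1 }'
}

# Example with made-up host names
hosts_to_slots "node01 node01 node02 node02"
```

Inside a job script this would be called as `hosts_to_slots "$LSB_HOSTS"`.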
+ | |||
+ | ===== Step 7 ===== | ||
+ | |||
+ | Tools. As you add nodes, monitoring tools are added to Ganglia and Cacti. | ||
+ | |||
+ | But first we must fix firefox. | ||
+ | |||
+ | * ' | ||
+ | * ' | ||
+ | * http:// | ||
+ | * http:// | ||
+ | * http:// | ||
+ | |||
+ | * http:// | ||
+ | * http:// | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |