This shows you the differences between two versions of the page.
cluster:88 [2010/07/30 17:42] hmeij created |
cluster:88 [2010/08/09 20:18] hmeij |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
\\ | \\ | ||
**[[cluster: | **[[cluster: | ||
+ | ====== Blue Sky Studios ====== | ||
+ | |||
+ | ===== Hardware ===== | ||
+ | |||
+ | We have 4 racks of which 3 are powered up. All on utility power including head/login node. Racks are surprisingly cool compared | ||
+ | |||
+ | If you want to use the ProCurve switches you need to power up the top two shelves within each rack or use an alternate source of power. | ||
+ | |||
+ | We wanted to separate the data traffic (NFS) from the software management and MPI traffic so will be leveraging both ethernet ports on each blade. | ||
+ | |||
+ | It's important to note that the management software (Project Kusu, see below) assumes eth0 is provision (192.168) but eth1 is on your domain (like wesleyan.edu or pace.edu). | ||
+ | |||
+ | We bought 52 three-foot CAT 6 ethernet cables for each rack. The original purple cables, connecting blade to rack in the top two shelves within a rack, connect to the bottom ethernet port on each blade (eth0). For the bottom two racks, the purple cables connect to the top ethernet port on each blade (eth1). | ||
+ | |||
+ | Our storage is provided by one of our NetApp filers (5TB volume) via NFS. The filer is known as filer3a or filer13a and sits on our internal private network with IPs in the 10.10.0.y/ | ||
+ | |||
+ | Our head node has a similar setup (provision and data ports). | ||
+ | |||
+ | ===== Software ===== | ||
+ | |||
+ | For our operating system we chose CentOS 5.3 and burned the ISO images to cdrom. | ||
+ | |||
+ | Once you have all these burned to cdrom, you are ready to step through 12 installation screens, which are fairly straightforward. The screens are described at http:// | ||
+ | |||
+ | After reboot, Kusu will have created a /depot directory with CentOS inside it. It can be manipulated with repoman (for example, take a snapshot before you change anything). | ||
+ | |||
+ | The next step is optional but I did it because I wanted my node IPs to be in a certain range and increment downwards, for example start at 192.168.1.254/ | ||
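To see what a downward increment does to node addresses, here is a minimal Python sketch (it is not part of Kusu). It assumes Python 3 on any workstation. The function name `descending_ips` and the node count of 3 are made up for illustration; only the starting address comes from the example above.

```python
import ipaddress


def descending_ips(start, count, step=-1):
    """Return `count` addresses beginning at `start`, moving by `step` each time."""
    first = ipaddress.IPv4Address(start)
    # IPv4Address supports integer arithmetic, so a negative step counts down.
    return [str(first + step * i) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical example: the first three nodes added to the cluster.
    for address in descending_ips("192.168.1.254", 3):
        print(address)
```

With a step of -1 the first node gets the starting address and each later node gets the next lower one.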
+ | |||
+ | Once you have the templates in place, on the installer node start the command addhost (which will take over the console). | ||
+ | |||
+ | You also have the option of configuring diskless compute nodes within Kusu, or you can mix and match. | ||
+ | |||
+ | Some final configuration steps. | ||
+ | |||
+ | You can change configurations on the compute nodes in two ways. The pdsh command executes the same command in parallel across all the hosts listed in / | ||
+ | |||
+ | Of note is that we compile and install requested software in /home/apps/ so that it is immediately available cluster wide. For parallel programs we compile with OpenMPI so these programs can run on both infiniband and ethernet switches. | ||
+ | |||
+ | There are two scratch areas. | ||
+ | |||
+ | Accounts are created locally on the cluster. | ||
+ | |||
+ | There are very few policies on our clusters. Use disk space as needed and archive data elsewhere. | ||
+ | |||
+ | |||
+ | ===== Step 1 ===== | ||
+ | |||
+ | Download, verify the MD5 checksum of, and burn the following ISOs to disc. | ||
+ | |||
+ | * Kusu Gong Gong Release version 1.1 on the x86_64 architecture. | ||
+ | * http:// | ||
+ | * It states other kits are included, but I did not find them. | ||
+ | |||
+ | * Other kits: Lava, Ganglia, NTop, Cacti | ||
+ | * http:// | ||
+ | |||
+ | * CentOS kit: http:// | ||
+ | * http:// | ||
+ | * we have 5.3, for 5.5 you'll need the 1of8 series isos | ||
+ | |||
+ | I recommend checksumming the files. I had trouble getting these files to download cleanly. | ||
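One way to checksum a downloaded ISO is sketched below in Python. This is only an illustration: it assumes the published checksums are MD5 (the page does not say so explicitly), and the helper names `md5sum` and `matches` are hypothetical. Take the expected value from wherever the ISO was downloaded.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large ISO never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with the published one, ignoring case and whitespace."""
    return md5sum(path) == expected_hex.strip().lower()
```

If `matches` returns False, download the file again before burning it.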
+ | |||
+ | ===== Step 2 ===== | ||
+ | |||
+ | * Select an installer node, insert the Kusu Installer disc into a CD/DVD drive, and connect the drive via a USB port. | ||
+ | * Installer node, and 2-3 compute nodes, must have the purple cable connecting eth0 (bottom port) to rack ProCurve switch (top one). If you wish, you can cable top port (eth1) into bottom switch for testing, but this is not necessary. | ||
+ | * Boot the installer node, hit F2 to enter the BIOS, go to the Boot menu tab and make sure both CDROM and Removable Device are listed before any other options such as hard disk and network cards, then hit F10, save changes and exit/ | ||
+ | * Next you should see the Project Kusu splash page with the orange lego turtle; when prompted type ' | ||
+ | * Navigation around these screens is Tab/Enter and arrow keys. | ||
+ | * Next come the informational screens, in order | ||
+ | * 1 - language: English | ||
+ | * 2 - keyboard: us | ||
+ | * 3 - network: edit and configure each interface as one of two private networks (for Pace we'll reset eth1 on the installer node later on for public access). This keeps the cluster inaccessible from outside and lets us separate provision traffic from private (NFS data/MPI) traffic. Edit: | ||
+ | * eth0: 192.168.101.254/ | ||
+ | * name: kusu101prov, | ||
+ | * eth1: 10.10.101.254/ | ||
+ | * name: kusupriv, type: other | ||
+ | * 4 - gateway & dns: gateway 192.168.101.0 (is not used but required field), dns server 192.168.101.254 (installer node) | ||
+ | * 5 - host: FQDN kusu101, PCD kusu101 (basically we will not provide internet accessible names) | ||
+ | * 6 - time: American/ | ||
+ | * 7 - root password: password (keep simple for now, change later) | ||
+ | * 8 - disk partitions: select 'Use Default' | ||
+ | * edit /home downsize to 1024 (Pace may want to leave this much larger and create a 1 GB / | ||
+ | * add a logical volume | ||
+ | * mount point / | ||
+ | * label LOCALSCRATCH | ||
+ | * size: leave blank, see below | ||
+ | * type ext3, on hda, check "fill remaining space on disk" (!only one partition can have this setting!) | ||
+ | * 9 - confirm: accept (at this point the disk gets reformatted) | ||
+ | * 10 - kits: select Add, insert kit cd, wait, cycle through disks by kit, then No More Kits, then Finish (node reboots). | ||
+ | |||
+ | Upon reboot check some command output: hostname, route, ifconfig, bhosts, bqueues | ||
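Before looking at the output of those commands, it can help to confirm they are all on the PATH (bhosts and bqueues only exist if the Lava kit installed). The Python 3 sketch below only checks for presence and does not run anything. The helper name `missing_commands` is hypothetical, and the stock Python on CentOS 5.3 is too old for `shutil.which`, so this assumes a newer interpreter is available.

```python
import shutil

# Commands named in the post-reboot check above.
CHECK_COMMANDS = ["hostname", "route", "ifconfig", "bhosts", "bqueues"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH."""
    # `which` is a parameter so the lookup can be replaced, for example in tests.
    return [name for name in commands if which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(CHECK_COMMANDS)
    if missing:
        print("not found: " + ", ".join(missing))
    else:
        print("all check commands are available")
```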
- | ==== Blue Sky Studios | + | ===== Step 3 ===== |
+ | * first create network interfaces for nodes, different from installer network interfaces | ||
+ | * type ' | ||
+ | * ' | ||
+ | * network: 192.168.0.0 | ||
+ | * subnet: 255.255.0.0 | ||
+ | * gateway: 192.168.101.0 | ||
+ | * device: eth0 | ||
+ | * starting IP: 192.168.101.250 | ||
+ | * suffix: -eth0 | ||
+ | * increment: -1 (that' | ||
+ | * options: | ||
+ | * description: | ||
+ | * ' | ||
+ | * ' | ||
+ | * next we are going to create our nodegroup template for the compute nodes, type ' | ||
+ | * use ' | ||
+ | * general: change name Copy 1 to _BSS with format node#NN (we don't care about rack and like short names) | ||
+ | * repository: there is only one, select it | ||
+ | * boot time: | ||
+ | * components: (check that non-server/ | ||
+ | * networks: here select only the interfaces you created: nodeprov eth0 and nodepriv eth1 | ||
+ | * optional: do select vim* and emacs* packages (annoying) | ||
+ | * partition: resize /data to 1024 and add partition / | ||
+ | * cfmsync: update, no | ||
+ | * ' | ||
+ | * now we're ready to add compute nodes. type ' | ||
+ | * if you receive an error about MySQLDB not found in 10-cacti.py, it is one of two situations we have encountered | ||
+ | * mysql was not installed; add it and initialize the database | ||
+ | * grep mysql / | ||
+ | * / | ||
+ | * 'mysql -u root' should work | ||
+ | * and/or python is missing a driver | ||
+ | * yum install MySQL-python | ||
+ | * when addhost starts, select the *_BSS nodegroup created, and eth0 interface | ||
+ | * make sure blades have purple cable in bottom interface, turn blade on | ||
+ | * if you know the blade will boot off the network let it go; otherwise hit F2, enter the BIOS, and set the Boot menu to network first | ||
+ | * once blade sends its eth0 IP over and receives kickstart file, move on to next blade | ||
+ | * do 2-3 blades this way | ||
+ | * once the first blade is rebooted, enter BIOS, set boot menu to hard disk first | ||
+ | * there' | ||
+ | * once the last blade has fully booted off the hard disk, quit addhost on the installer node | ||
+ | * addhost will now push new files to all the members of the cluster using cfmsync | ||
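For the MySQLDB error mentioned in the steps above, a quick way to tell whether Python can see the MySQL driver is to try importing it. This is a minimal sketch and not part of Kusu. It assumes the module installed by the MySQL-python package is named `MySQLdb`, and `module_available` is a hypothetical helper name.

```python
def module_available(name):
    """Return True if the named Python module can be imported."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: MySQL-python provides a module called MySQLdb.
    if module_available("MySQLdb"):
        print("MySQLdb driver found")
    else:
        print("MySQLdb driver missing; try: yum install MySQL-python")
```

Run it with the same python that addhost uses, otherwise the result may not reflect what addhost sees.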
\\ | \\ | ||
**[[cluster: | **[[cluster: |