Differences

This shows you the differences between two versions of the page.

--- cluster:145 [2015/12/11 19:57]
127.0.0.1 external edit
+++ cluster:145 [2017/04/05 15:22] (current)
hmeij07
@@ Line 2: / Line 2: @@
 **[[cluster:0|Back]]**
-===== Warewulf Golden Image =====
+===== IPoIB =====
-Also read these pages and this page will make more sense: [[cluster:139|Warewulf Stateless]], [[cluster:143|Warewulf Stateful]].
+Redoing our RHEL5.5 HP Proliant blade servers with CentOS 6.7 using [[cluster:144|Warewulf Golden Image]] provisioning.
-For some time now I have been looking for a provisioning tool. I've tried along the way ...
+Not quite there yet, but I'll document here how Infiniband was installed. These compute nodes are connected to a Voltaire interconnect and are aging quite a bit.
-  * Project Kusu, now defunct, but a great, simple template driven system. No fancy gui.
+First install a vanilla basic server and load packages you need. Then add the packages for Infiniband.
-  * HP's CMU, also a great tool, golden image approach. The nice feature of CMU is that the master node can delegate hundreds of nodes to be imaged by a designated compute node, relieving the master node.
-  * Bright Computing, a very complex tool that takes over every config file imaginable. Simple tasks become very burdensome; I never achieved traction with this tool.
-  * xCAT, the behemoth of open source provisioning tools and more. It does it all, which means a huge learning curve.
-  * [[http://warewulf.lbl.gov/trac]] Warewulf. I settled on this because it is simple and written in a readable language (Perl).
-The requirements of the provisioning tool were twofold:
-  * My HPCC environment is flooded with small jobs that run for weeks to months (no wall time) but have small memory requirements (< 1GB). Thus I want to design stateless compute nodes, or virtual compute nodes, and frequently tailor the config & setup to the scientific needs (mostly non graphical, just CPU compute bound jobs, very little IO).
-  * The HPCC also encounters very large jobs (for us that is 16-32 cores with memory requirements in the 256 GB range) utilizing X11, OpenGL, Nvidia and other large, complex analysis software. In this case one compute node is built up to satisfaction, then we grab a "golden image" and deploy.
-So I settled on Warewulf which does these two approaches and sports an active forum for questions.
-Not finding much on the "golden image" use of Warewulf, I've written up my experiences. Hope it helps somebody. In this write-up we start with setting up stateless nodes, then we adapt them to stateful using the "golden image" approach.
-Install Warewulf and poke around the shell ''wwsh''. Consult the stateless page. Let's set up stateless node ''b6''. The files have been loaded in the MySQL database. The "ww" files are template driven files whose contents will be populated by Warewulf, like changing IPs and HWADDRs. I keep them in my centos-6 stateless chroot but use them in other chroots as well (/var/chgroot/centos-6/root/wwtemplates/).
 <code>
-wwsh node new b6 --netdev=eth0 \
+# Install and reboot
---hwaddr=00:00:00:00:00:00 --ipaddr=192.168.1.12 \
+yum groupinstall "Infiniband Support"
---netmask=255.255.0.0  --network=255.255.0.0 \
+yum install infiniband-diags perftest qperf opensm
---groups=wwnodes
+chkconfig rdma on
-wwsh node set b6 --netdev=eth1 \
+# for openhpc
---hwaddr=00:00:00:00:00:00 --ipaddr=10.10.100.12 \
+yum install infinipath-psm
---netmask=255.255.0.0  --network=255.255.0.0
-wwsh provision set b6 --fileadd passwd,shadow,group
+yum install opensm
-wwsh provision set b6 --fileadd hosts,bashrc,profile
+chkconfig opensm on
-wwsh provision set b6 --fileadd network.ww,ifcfg-eth1.ww
-</code>
+yum install tcl tk
+yum install infiniband-diags
-As opposed to the stateless approach, which grabs its OS content from the master node, in the "golden image" approach we are going to retrieve the content of a selected node (node b0 in this example) to the Warewulf master (node petaltail in this case).
+shutdown -r now
-Set ''rsync'' to work between the two. Unmount any NFS file systems on the node. Adjust the exclusion selections in the template file (i.e., after /home is unmounted on the node I still want the mount point, scratch space, etc.).
+# after reboot
+lsmod | grep ib
-  * /usr/lib/libexec/warewulf/wwmkchroot/golden-system.tmpl
+# and the output (ipoib is the important one)
-<code>
+ib_ipoib               80391  0
+ib_ucm                 12121  0
-# reminder: all NFS file systems unmounted?
+ib_uverbs              39106  6 rdma_ucm,ib_ucm
+ib_umad                11802  8
-mkdir /var/chroots/goldimages; cd /var/chroots/goldimages
+ib_cm                  36996  3 ib_ipoib,ib_ucm,rdma_cm
+mlx4_ib               137138  1
-SOURCEADDR=b0 wwmkchroot golden-system /var/chroots/goldimages/b0.chroot
+ib_sa                  24060  5 ib_ipoib,rdma_ucm,rdma_cm,ib_cm,mlx4_ib
+ib_mad                 39811  4 ib_umad,ib_cm,mlx4_ib,ib_sa
+ib_core                81507  11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,mlx4_ib,ib_sa,ib_mad
+ib_addr                 8304  3 rdma_ucm,rdma_cm,ib_core
+ipv6                  335525  74 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,ib_ipoib,mlx4_ib,ib_addr
+mlx4_core             226123  2 mlx4_ib,mlx4_en
 </code>
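The module check after the reboot can be scripted. Below is a minimal sketch, assuming a helper named ''check_ib_modules'' (a hypothetical name, not part of any package) that reads ''lsmod'' output on standard input; the module list in the usage comment is taken from the output above and will differ for other HCA drivers.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and reports
# which of the modules named as arguments are not loaded.
# Returns non-zero if at least one module is missing.
check_ib_modules() {
    loaded=$(awk '{print $1}')     # first column of lsmod is the module name
    missing=0
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "missing: $mod"
            missing=1
        fi
    done
    return $missing
}

# Usage on a node (module list is an assumption based on the output above):
#   lsmod | check_ib_modules ib_ipoib ib_core mlx4_ib
```

The function only inspects text, so it can also be run against saved ''lsmod'' output from another node.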
-Next, modify the properties of the node to image.
+Connect the cable(s) to your ports and check that the reboot discovered the ports.
-[[http://warewulf.lbl.gov/trac/wiki/Recipes/StatefulProvisioning]]
 <code>
-wwsh object modify -s bootloader=sda b6
+# Check IB ports
-wwsh object modify -s diskformat=sda1,sda3 b6
+for i in /sys/class/infiniband/*/ports/*/state; do echo "$i"; cat "$i"; done
-wwsh object modify -s filesystems= \
-"mountpoint=/boot:dev=sda1:type=ext4:size=500, \
-dev=sda3:type=swap:size=2048, \
-mountpoint=/:dev=sda7:type=ext4:size=70000" \
-b6
-</code>
+# Your output may vary
+/sys/class/infiniband/mlx4_0/ports/1/state
-More on the ''filesystems'' option later. In this example I want to set up /, /boot and swap. Sizes in MB. Only one hard drive on this node (sda).
+4: ACTIVE
+/sys/class/infiniband/mlx4_0/ports/2/state
-Next we need to get the node booted and transfer the VNFS image made from the node b0 contents. At this time look on your master node in /var/lib/mysql and make sure you have enough disk space (these VNFS images will be around 1 GB as observed). Also, back up that database with ''mysqldump''.
+1: DOWN
-<code>
-# make the image, takes 10 minutes or so
-wwvnfs --chroot=/var/chroots/goldimages/b0.chroot
-# switch node to image VNFS
-wwsh provision set b6 --vnfs=b0.chroot
-# just to be prudent
-wwsh pxe update
-wwsh dhcp update
-service dhcpd restart
-# check the configs
-wwsh object print b6 -p :all
-wwsh provision list
-# next for provisioning (just to be sure) on first PXE boot
-wwsh provision set --bootlocal=UNDEF b6
-# turn the node on
+# some test commands
+ibhosts
+iblinkinfo
+ibstatus
 </code>
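The port check above can be wrapped so the result reads one port per line. This is a minimal sketch, assuming a hypothetical helper ''list_ib_ports''; it takes the sysfs root as an optional argument (default ''/sys/class/infiniband'') so it can also be pointed at a copy of that tree.

```shell
#!/bin/sh
# Hypothetical helper: prints "<device> port <n>: <state>" for every
# port state file found under the given sysfs root.
list_ib_ports() {
    root=${1:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/state; do
        [ -r "$f" ] || continue      # glob did not match: no IB devices found
        port=$(basename "$(dirname "$f")")
        dev=$(basename "$(dirname "$(dirname "$(dirname "$f")")")")
        # the state file holds e.g. "4: ACTIVE"; keep the word after the colon
        state=$(sed 's/^[0-9]*: *//' "$f")
        echo "$dev port $port: $state"
    done
}

# Usage on a node:
#   list_ib_ports
```

A port that stays ''DOWN'' after cabling usually points at the cable or the switch side rather than the driver, so the ''ibstatus'' and ''iblinkinfo'' commands above are the next thing to look at.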
-The console of the target node will now show the IP being assigned, the ''getvnfs'' process (be patient) and finally the root login. Check around the node for improvements and tweaks, then adjust and recreate the VNFS.
+Edit ''/etc/sysconfig/network-scripts/ifcfg-ib0'', something like
-After all that is done, disable provisioning so that the master ignores the PXE boot and the target node boots off the local disk.
 <code>
-# ignore PXE boot
+DEVICE=ib0
-wwsh provision set --bootlocal=EXIT b6
+TYPE=InfiniBand
+UUID=eac9f00a-245d-4c88-b56f-1bcb6e6ed933
+ONBOOT=yes
+NM_CONTROLLED=no
+BOOTPROTO=none
+HWADDR=80:00:00:48:FE:80:00:00:00:00:00:00:00:02:C9:03:00:07:3B:DD
+CONNECTED_MODE=no
+IPADDR=10.11.103.31
+PREFIX=16
+DEFROUTE=no
+IPV4_FAILURE_FATAL=yes
+IPV6INIT=no
+NAME="System ib0"
-</code>
+# then start interface
+ifup ib0
-**filesystems**
+# check the route, then mount /home on this interface
-This is currently not working as expected. In my first attempts I'd specify sda1 (size=500), sda2 (size=2048, type=swap) and sda3 (size=fill), but what I end up with looks like a standard layout. Any sizes are also ignored. So for now I just pick the ones I want (sda1, sda4, sda7).
+route
+# Your output may vary
-This also happens after I remove any UUID references in /etc/fstab, clean up /etc/mtab and clean any and all files in /dev/disk/by-uuid and rebuild VNFS.
+Kernel IP routing table
+Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
+10.11.0.0       *               255.255.0.0     U     0      0        0 ib0
+link-local      *               255.255.0.0     U     1002   0        0 eth0
+link-local      *               255.255.0.0     U     1004   0        0 ib0
+192.168.0.0     *               255.255.0.0     U     0      0        0 eth0
+default         greentail       0.0.0.0         UG    0      0        0 eth0
-<code>
+# home from sharptail
+10.11.103.42:/home     /home   nfs  defaults 0 0
-fdisk -l
-Disk /dev/sda: 80.0 GB, 80026361856 bytes
-255 heads, 63 sectors/track, 9729 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Sector size (logical/physical): 512 bytes / 512 bytes
- I/O size (minimum/optimal): 512 bytes / 512 bytes
- Disk identifier: 0x000ce092
-    Device Boot      Start         End      Blocks   Id  System
- /dev/sda1   *           1          13      104391   83  Linux
- /dev/sda2              14        1543    12289725   83  Linux
- /dev/sda3            1544        1798     2048287+  82  Linux swap / Solaris
- /dev/sda4            1799        9729    63705757+   5  Extended
- /dev/sda5            1799        2053     2048256   83  Linux
- /dev/sda6            2054        2184     1052226   83  Linux
- /dev/sda7            2185        9729    60605181   83  Linux
-df -h
- Filesystem      Size  Used Avail Use% Mounted on
- /dev/sda7        57G  1.9G   53G   4% /
- tmpfs            12G     0   12G   0% /dev/shm
- /dev/sda1        95M   49M   42M  54% /boot
 </code>
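Before mounting /home over IPoIB it can help to confirm that the IB network is actually routed via ''ib0''. A minimal sketch, assuming a hypothetical helper ''has_route_on'' that reads ''route -n'' style output on standard input; the network address ''10.11.0.0'' in the usage comment follows from ''IPADDR=10.11.103.31'' with ''PREFIX=16'' in the ifcfg file.

```shell
#!/bin/sh
# Hypothetical helper: reads routing-table output on stdin and succeeds
# if a route for the given destination uses the given interface.
# Destination is matched against the first column, interface against the last.
has_route_on() {
    dest=$1
    iface=$2
    awk -v d="$dest" -v i="$iface" \
        '$1 == d && $NF == i { found = 1 } END { exit !found }'
}

# Usage on a node (assumes the /home entry is already in /etc/fstab):
#   route -n | has_route_on 10.11.0.0 ib0 && mount /home
```

Using ''route -n'' keeps the destination column numeric, so the match does not depend on name resolution.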
-Warewulf 3.6.99 and CentOS 6.5
 \\
 **[[cluster:0|Back]]**
