This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
cluster:144 [2015/12/08 19:18] 127.0.0.1 external edit |
cluster:144 [2018/07/26 18:52] (current) hmeij07 [Deploying] |
||
---|---|---|---|
Line 2: | Line 2: | ||
**[[cluster: | **[[cluster: | ||
- | ===== Warewulf | + | ===== Warewulf |
- | * [[http:// | + | Also read these pages and this page will make more sense: |
- | Get RPMs and install. | + | For some time now I have been looking for a provisioning tool. I've tried along the way ... |
- | < | + | * Project Kusu, now defunct, but a great, simple template driven system. No fancy gui. |
+ | * HP's CMU, also a great tool, golden image approach. The nice feature of CMU is that the master node can delegate hundreds of nodes to be imaged by a designated compute node, relieving the master node. | ||
+ | * Bright Computing, a very complex tool that takes over every config file imaginable. Simple tasks become very burdensome; I never achieved traction with this tool. | ||
+ | * xCAT, the behemoth of open source provisioning tools and more. It does it all, which means a huge learning curve. | ||
+ | * [[http:// | ||
- | [root@petaltail ~]# wget -O / | + | The requirements of the provisioning tool were twofold: |
- | --2015-04-07 08: | + | |
- | Resolving warewulf.lbl.gov... 128.3.7.27 | + | |
- | Connecting to warewulf.lbl.gov|128.3.7.27|: | + | |
- | HTTP request sent, awaiting response... 200 OK | + | |
- | Length: 126 [text/ | + | |
- | Saving to: “/ | + | |
- | 100%[======================================================================================> | + | * My HPCC environment is flooded with small jobs that run for weeks to months (no wall time) but have small memory requirements (< 1GB). Thus I want to design stateless compute nodes, or virtual compute nodes, and frequently tailor the config & setup to the scientific needs (mostly non graphical, just CPU compute bound jobs, very little IO). |
+ | * The HPCC also encounters very large jobs (for us that is 16-32 cores with memory requirements | ||
- | 2015-04-07 08:45:56 (22.9 MB/s) - “/ | + | So I settled on Warewulf, which supports both approaches and has an active forum for questions. |
- | [root@petaltail ~]# yum install warewulf-common warewulf-cluster warewulf-provision | + | |
+ | Not finding much on the " | ||
- | Installed: | + | Install Warewulf and poke around the shell '' |
- | warewulf-cluster.x86_64 0:3.6-1.el6 | + | |
- | warewulf-provision.x86_64 0:3.6-1.el6 | + | |
- | Dependency Installed: | + | < |
- | dhcp.x86_64 12: | + | |
- | warewulf-vnfs.noarch 0:3.6-1.el6 | + | |
- | Complete! | + | wwsh node new b6 --netdev=eth0 \ |
+ | --hwaddr=00: | ||
+ | --netmask=255.255.0.0 | ||
+ | --groups=wwnodes | ||
- | </ | + | wwsh node set b6 --netdev=eth1 \ |
+ | --hwaddr=00: | ||
+ | --netmask=255.255.0.0 | ||
- | Get and install MySQL and set mysql user root's password. | + | wwsh provision |
+ | wwsh provision set b6 --fileadd hosts, | ||
+ | wwsh provision set b6 --fileadd network.ww, | ||
- | < | ||
- | | ||
- | [root@petaltail warewulf]# vi / | ||
- | [root@petaltail warewulf]# service mysqld status | ||
- | |||
- | mysql> set password for ' | ||
- | Query OK, 0 rows affected (0.00 sec) | ||
- | |||
- | [root@petaltail warewulf]# chmod o-r / | ||
- | |||
</ | </ | ||
- | In the provision config file I turned dynamic_hosts, hostfile and localdomain off (I'll manage those manually) and my private network is run over eth0 (192.168.0.0/ | + | As opposed to the stateless approach, which grabs its OS content from the master node, in the " |
- | < | + | Set '' |
- | + | ||
- | [root@petaltail warewulf]# vi /etc/ | + | |
- | [root@petaltail warewulf]# vi /etc/ | + | |
- | + | ||
- | </ | + | |
- | Next comes a piece of mystery. When executing | ||
- | * Lethal error thrown by module: / | ||
- | * answer is here https:// | ||
- | * quoting | ||
< | < | ||
- | What it appears you'll need to do initially is build out a chroot | ||
- | directory. For example, for a Scientific Linux 6.x install, you'd do: | ||
- | | + | # reminder: all NFS file systems unmounted?
- | to store them at -- run wwmkchroot -h for the help) | + | # or add rsync excludes in |
+ | # /usr/libexec/ | ||
- | That will give the base chroot for the VNFS at /var/chroots/sl-6. | + | mkdir /var/chgroots/goldimages; cd / |
- | You'll then set the environment variable CHROOTDIR when you execute | + | |
- | wwinit. | + | |
- | # CHROOTDIR=/ | + | SOURCEADDR=b0 wwmkchroot golden-system |
- | I know when you use the ' | ||
- | chroot based upon the host OS. But none of the other wwinit scripts, | ||
- | by default, create one; So you will need to build the chroot and | ||
- | specify where the directory is. Then when the VNFS part of wwinit is | ||
- | ran, it will build the VNFS and import it into the datastore at that | ||
- | time, setting the one specified as the default VNFS in the | ||
- | configuration. | ||
</ | </ | ||
- | So we start by making the chroot directories, first we'll build a generic centos-6. Then we initialize | + | Next, modify |
+ | [[http://warewulf.lbl.gov/ | ||
< | < | ||
- | [root@petaltail ~]# wwmkchroot centos-6 / | + | wwsh object modify |
- | + | wwsh object modify -s diskformat=sda1, | |
- | [root@petaltail ~]# CHROOTDIR=/ | + | wwsh object modify -s filesystems= \ |
- | database: | + | "mountpoint=/boot:dev=sda1:type=ext4:size=500, \ |
- | database: | + | dev=sda3:type=swap:size=2048, \ |
- | database: | + | mountpoint=/:dev=sda7:type=ext4:size=fill" |
- | database: | + | b6 |
- | database: | + | |
- | wwsh: | + | |
- | wwsh: + wwsh quit OK | + | |
- | wwsh: + wwsh help OK | + | |
- | wwsh: + wwsh node new testnode0000 | + | |
- | wwsh: + wwsh node list OK | + | |
- | wwsh: + wwsh node delete testnode0000 | + | |
- | domain: | + | |
- | authfiles: | + | |
- | authfiles: | + | |
- | authfiles: | + | |
- | authfiles: | + | |
- | nfsd: Setting domain " | + | |
- | nfsd: + chkconfig nfs on OK | + | |
- | nfsd: + service nfs restart | + | |
- | nfsd: + exportfs -a OK | + | |
- | ntpd: | + | |
- | ntpd: + chkconfig ntpd on OK | + | |
- | ntpd: + service ntpd restart | + | |
- | ssh_keys: | + | |
- | ssh_keys: | + | |
- | ssh_keys: | + | |
- | ssh_keys: | + | |
- | ssh_keys: | + | |
- | tftp: + /sbin/ | + | |
- | tftp: + / | + | |
- | tftp: + / | + | |
- | bootstrap: | + | |
- | bootstrap: | + | |
- | vnfs: | + | |
- | vnfs: + wwvnfs -y --hybridpath=/ | + | |
+ | # see note below on filesystems.... | ||
</ | </ | ||
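As displayed above, the `filesystems=` value is continued over several lines with backslashes. In bash, a blank between `filesystems=` and the opening quote splits them into two separate arguments, and a backslash-newline inside double quotes is dropped while the leading blanks of the next line are kept. Whether that plays any part in the layout problem noted further down is not established on this page. A minimal sketch, assuming a hypothetical helper named `join_filesystems`, that builds the same value as one argument with no embedded whitespace:

```shell
#!/bin/bash
# Hypothetical helper (not part of Warewulf): join partition specs
# with commas so the result is a single whitespace-free string.
join_filesystems() {
    local IFS=,
    printf '%s\n' "$*"
}

# Partition specs copied from the listing above.
fs=$(join_filesystems \
    "mountpoint=/boot:dev=sda1:type=ext4:size=500" \
    "dev=sda3:type=swap:size=2048" \
    "mountpoint=/:dev=sda7:type=ext4:size=fill")

# Show the value; it could then be passed as one argument:
#   wwsh object modify -s filesystems="$fs" b6
echo "$fs"
```

Printing the value first makes it easy to compare against what `wwsh object print b6 -p :all` reports afterwards.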
- | Next restart MySQL, httpd and xinetd. Open your firewall | + | More on the '' |
- | * edit /etc/sysconfig/iptables | + | Next we need to get the node booted and transfer the VNFS image made from the contents of node b0. At this time, look on your master node in /var/lib/mysql and make sure you have enough disk space (these VNFS images will be around 1 GB as observed). Also, back up that database with '' |
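The disk space check on the master node can be scripted. This is only a sketch: the function names `avail_mb` and `has_room` and the 2048 MB threshold are assumptions for illustration, not anything Warewulf provides.

```shell
#!/bin/bash
# Print the available space, in MB, of the file system holding path $1.
# Prints nothing if the path does not exist.
avail_mb() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed only if at least $2 MB are free under path $1.
has_room() {
    local free
    free=$(avail_mb "$1")
    [ -n "$free" ] && [ "$free" -ge "$2" ]
}

# Example use on the master node (threshold is an assumption,
# roughly room for two images of about 1 GB each):
#   has_room /var/lib/mysql 2048 || echo "not enough space for a VNFS image"
```

`df -P` is used so that each file system is reported on a single line, which keeps the `awk` field positions stable.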
< | < | ||
- | # local allow | + | # make the image, takes 10 minutes or so |
- | -A INPUT -i eth0 -d 192.168.0.0/16 -p tcp --dport 0:65535 -j ACCEPT | + | wwvnfs |
- | -A INPUT -i eth0 -d 192.168.0.0/16 -p udp --dport 0:65535 -j ACCEPT | + | |
- | </ | + | # switch node to image VNFS |
+ | wwsh provision set b6 --vnfs=b0.chroot | ||
- | Next restart some warewulf services. I had to edit the / | + | # Rajil adds for GPU |
+ | # Keep in mind that nouveau should be disabled, | ||
+ | wwsh provision set c038 --kargs=\ | ||
+ | " | ||
- | < | + | # just to be prudent |
+ | wwsh pxe update | ||
+ | wwsh dhcp update | ||
+ | service dhcpd restart | ||
- | [root@petaltail ~]# wwsh dhcp update | + | # check the configs |
- | Rebuilding | + | wwsh object print b6 -p :all |
- | Done. | + | wwsh provision list |
- | [root@petaltail ~]# wwsh pxe update | + | # next for provisioning (just to be sure) on first PXE boot |
- | No nodes found | + | wwsh provision set --bootlocal=UNDEF b6 |
- | [root@petaltail ~]# wwsh | + | # turn the node on |
- | Warewulf> | + | |
- | Warewulf> | + | |
- | Are you sure you want to make the following changes to 1 node(s): | + | |
- | SET: BOOTSTRAP | + | </ |
- | SET: VNFS = centos-6 | + | |
- | Yes/No> y | + | The console of the target node will now show the IP being assigned, the '' |
- | Warewulf> | + | |
- | </ | + | After all that is done, disable provisioning so that the master ignores |
- | + | ||
- | Booting | + | |
< | < | ||
- | [root@petaltail ~]# wwsh file import /etc/passwd | + | # ignore PXE boot |
+ | wwsh provision set --bootlocal=EXIT b6 | ||
- | [root@petaltail ~]# wwsh file list | + | </code> |
- | dynamic_hosts | + | |
- | group : -rw-r--r-- 1 root root 6247 / | + | |
- | passwd | + | |
- | shadow | + | |
- | [root@petaltail ~]# wwsh provision set b[0-51] --fileadd passwd | + | **filesystems** |
- | [root@petaltail ~]# wwsh provision print | + | This is currently not working as expected. In my first attempts I specified sda1 (size=500), sda2 (size=2048, type=swap) and sda3 (size=fill), but what I end up with looks like a standard layout. Any sizes are also ignored. So for now I just pick the ones I want (sda1, sda3, sda7). |
- | #### b51.cluster ############################################################## | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | b51.cluster: | + | |
- | ... | + | |
- | </code> | + | Note: this problem turns out to be hardware related; it does not appear on newer hardware |
+ | --- // | ||
- | Next build a hybrid VNFS, this is the way to add packages to the nodes. | + | |
+ | This also happens after I remove any UUID references in /etc/fstab, clean up /etc/mtab and clean any and all files in / | ||
< | < | ||
- | cd / | + | fdisk -l |
- | mkdir vnfs | + | |
- | vi etc/fstab (inside of chroot area, edit) | + | Disk /dev/sda: 80.0 GB, 80026361856 bytes |
- | 192.168.1.217:/var/chroots/centos-6 | + | 255 heads, 63 sectors/ |
+ | Units = cylinders of 16065 * 512 = 8225280 bytes | ||
+ | | ||
+ | I/O size (minimum/optimal): 512 bytes / 512 bytes | ||
+ | Disk identifier: 0x000ce092 | ||
- | wwvnfs --chroot | + | Device Boot Start |
- | Overwrite original: y | + | /dev/sda1 |
+ | /dev/sda2 14 1543 12289725 | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | / | ||
- | # reboot node, observe mount, then add NTP to chgroot | + | df -h |
- | yum --tolerant --installroot | + | |
+ | /dev/sda7 57G 1.9G | ||
+ | | ||
+ | / | ||
- | vi etc/init.d/ntpd (inside of chroot, edit) | + | </code> |
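To compare the layout a node actually ended up with against the partitions that were requested, the partition names can be pulled from `/proc/partitions` on the node. A rough sketch; the function name `list_partitions` is made up for this example.

```shell
#!/bin/bash
# Read /proc/partitions-style input on stdin and print the names of
# the partitions that belong to disk $1 (the disk itself is skipped).
# The first two input lines are the header and a blank line.
list_partitions() {
    awk -v disk="$1" \
        'NR > 2 && $4 != disk && index($4, disk) == 1 { print $4 }'
}

# Example use on the provisioned node:
#   list_partitions sda < /proc/partitions
```

The output can then be checked by eye against the `diskformat` and `filesystems` settings for that node.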
- | # chkconfig: 35 58 74 | + | |
- | # then make the following links in etc/rc[3|5].d/S58ntpd pointing to ../ | + | Warewulf |
- | vi etc/ | + | ==== Part 2 ==== |
- | restrict default ignore | + | |
- | restrict 127.0.0.0 | + | |
- | server 192.168.1.217 | + | |
- | restrict 192.168.1.217 nomodify | + | |
- | wwvnfs --chroot | + | To avoid the problems detailed above I started over with a fresh node. First I installed CentOS 6.7 vanilla on the hard disk with partitions sda1 (/boot, 500 MB), sda2 (2048 MB, swap) and sda3 (/, size=fill). Since I now had a new kernel, we need to make a new bootstrap image and provision it. |
- | VNFS ' | + | |
+ | < | ||
- | # reboot node | + | wwbootstrap --chroot=/ |
+ | wwsh provision set n22 --vnfs=n0.chroot --bootstrap=2.6.32-573.12.1.el6.x86_64 | ||
</ | </ | ||
- | Then build the node up: | + | Before we create |
- | * use yum to install the openlava 2.2.1 RPM (pulls in tcl) | + | < |
- | * copy the openlava config files into the centos-6 area | + | |
- | * use yum to install postfix | + | |
- | * add links in rc3.d and rc5.d | + | |
- | * remove sendmail links | + | |
- | * yum install perl | + | |
- | * yum install munge | + | |
- | * build RPMs from tar ball | + | |
- | * rebuild the VNFS and reboot | + | |
- | Sometimes you can edit the files in the chroot directly, sometimes you must modify the installroot directly. | + | # / |
- | < | + | # comment out for golden image |
+ | #hybridize += / | ||
+ | #hybridize += / | ||
+ | #hybridize += / | ||
+ | #hybridize += / | ||
- | [root@petaltail ~]# cd / | + | </code> |
- | [root@petaltail chroots]# chroot centos-6 | + | |
- | [root@petaltail /]# pwd | + | |
- | / | + | |
- | [root@petaltail /]# mkdir / | + | |
- | [root@petaltail /]# chown munge:munge / | + | |
- | [root@petaltail /]# mkdir / | + | |
- | [root@petaltail /]# chown slurm:munge /var/log/slurm | + | |
- | chown: invalid user: `slurm: | + | |
- | # since the passwd|shadow|group | + | I also cleaned |
- | [root@petaltail /]# chown slurm:munge / | + | < |
- | [root@petaltail /]# exit | + | |
- | exit | + | |
- | [root@petaltail chroots]# ls | + | |
- | centos-6 | + | |
- | # outside edit commenting out rc.local directives making these dirs etc | + | # /etc/fstab, edited (clean also mtab and remove files / |
- | [root@petaltail chroots]# vi centos-6/etc/rc.local | + | tmpfs /dev/ |
+ | devpts | ||
+ | sysfs / | ||
+ | proc /proc | ||
+ | 10.11.103.42:/ | ||
+ | 10.11.103.42:/ | ||
- | # and don't forget | + | # / |
- | [root@petaltail chroots]# wwvnfs | + | # This file was written by Warewulf bootstrap (capability setup-filesystems) |
+ | (hd0) /dev/sda | ||
- | # reboot node | + | # /etc/ |
- | </code> | + | # This file was written by Warewulf bootstrap (capability setup-filesystems) |
+ | serial --speed= --unit= --word= --parity= | ||
+ | terminal_input console serial; terminal_output console serial | ||
+ | default 0 | ||
+ | timeout 10 | ||
+ | root (hd0,0) | ||
+ | title CentOS release 6.7 - 2.6.32-573.12.1.el6.x86_64 | ||
+ | kernel / | ||
+ | initrd / | ||
- | To build short hostnames you can create a template inside of the chroot environment. | + | </ |
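Removing the UUID references from `/etc/fstab`, as mentioned on this page, can be done with a small filter instead of by hand. A minimal sketch, assuming the UUID entries start their line with `UUID=`; the function name `strip_uuid_entries` is hypothetical.

```shell
#!/bin/bash
# Copy fstab content from stdin to stdout, dropping every entry whose
# first field is a UUID= reference. All other lines pass unchanged.
strip_uuid_entries() {
    sed -E '/^[[:space:]]*UUID=/d'
}

# Example use inside the golden image chroot (review the result
# before replacing the real file):
#   strip_uuid_entries < etc/fstab > etc/fstab.new
```

Writing to a separate file first leaves the original in place until the result has been checked.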
- | < | + | Now when the first provisioning happens, the right partitions are created and the node is imaged. With bootlocal=EXIT, or by simply shutting down the dhcpd process on the Warewulf master, the node now boots from local disk. |
- | #--- build file CHROOT/ | + | Yea. |
- | | + | |
- | | + | |
- | #--- end | + | |
- | # add that file (using wwsh provision) to the nodes. | + | ==== Deploying ==== |
- | [root@]# wwsh file import | + | As part of my deployment I edited out any device information in file '' |
- | | + | |
- | [root@]# wwsh provision set n[00-15] --fileadd=network.ww | + | Then we build a template file with node specs in it like so: |
+ | < | ||
+ | # HP blades reverse HWADDR (don't ask) | ||
+ | # use nic port bottom (no changes) | ||
+ | # set both ipaddr/ | ||
+ | # post edit | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | # reboot (with bootlocal=EXIT) | ||
+ | | ||
+ | | ||
+ | ... | ||
</ | </ | ||
- | Second interface: create a template inside of the chroot environment. | + | My deploy script (give an entire line from above as arguments) |
< | < | ||
- | wwsh node set b49 --netdev=eth1 \ | + | #!/bin/bash |
- | --hwaddr=00: | + | |
- | --netmask=255.255.0.0 | + | |
- | #--- build file CHROOT/ | + | # deploy a n0.chroot node via PXE golden image transfer |
- | DEVICE=eth1 | + | # dynamic files are always in stateless |
- | BOOTPROTO=static | + | node=$1 |
- | ONBOOT=yes | + | hwaddr0=$2 |
- | HWADDR=%{NETDEVS:: | + | ipaddr0=$3 |
- | IPADDR=%{NETDEVS:: | + | hwaddr1=$4 |
- | NETMASK=%{NETDEVS:: | + | ipaddr1=$5 |
- | NETWORK=%{NETDEVS:: | + | ipaddri=$6 |
- | #--- end | + | |
- | # add that file (using wwsh provision) to the nodes. | + | if [ "$#" -ne 6 ]; then |
+ | echo " | ||
+ | | ||
+ | fi | ||
- | [root@]# | + | wwsh object delete $node -y |
- | --path=/ | + | sleep 3 |
- | [root@]# | + | wwsh node new $node --netdev=eth0 \ |
+ | --hwaddr=$hwaddr0 | ||
+ | | ||
- | </ | + | wwsh node set $node --netdev=eth1 \ |
+ | | ||
+ | | ||
- | Now, let's put it all together, which can form the basis for a script. | + | wwsh node set $node --netdev=ib0 \ |
+ | | ||
+ | | ||
- | < | + | wwsh provision set $node --fileadd passwd, |
+ | wwsh provision set $node --fileadd hosts, | ||
+ | wwsh provision set $node --fileadd network.ww, | ||
- | # make sure it boots across network, alter BIOS settings | + | # note: no diskpartition, already exists and fails on this hardware, |
+ | # otherwise add diskpartition=sda so that " | ||
- | wwsh node new b6 --netdev=eth0 \ | + | wwsh object modify |
- | --hwaddr=00: | + | wwsh object modify |
- | --netmask=255.255.0.0 | + | |
- | --groups=wwnodes | + | |
- | wwsh node set b6 --netdev=eth1 \ | + | if [ "$node" |
- | --hwaddr=00:00:00:00:00:00 --ipaddr=10.10.100.12 \ | + | # golden images with 3 partitions |
- | --netmask=255.255.0.0 | + | wwsh object modify |
+ | | ||
+ | # hp blade 4 partitions | ||
+ | wwsh object modify | ||
+ | fi | ||
- | wwsh provision set b6 --fileadd passwd, | + | wwsh provision set $node --vnfs=n0.chroot |
- | wwsh provision set b6 --fileadd hosts, | + | wwsh provision set $node --bootstrap=2.6.32-573.12.1.el6.x86_64 -y |
- | wwsh provision set b6 --fileadd network.ww,ifcfg-eth1.ww | + | |
+ | wwsh provision set --bootlocal=UNDEF $node -y | ||
- | </ | + | wwsh pxe update |
+ | wwsh dhcp update | ||
+ | | ||
+ | | ||
+ | echo "now reboot: $node" | ||
- | Useful links | + | echo "wwsh provision set --bootlocal=EXIT $node -y" |
+ | |||
+ | |||
+ | </ | ||
- | * [[http:// | ||
- | * [[http:// | ||
- | * http:// | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |