\\ **[[cluster:0|Back]]** ==== Warewulf, ohpc 2.4 ==== There are other pages to view but this is my latest ... * [[cluster:171|Warewulf Golden Image]] ==== stateless ==== First we create templates ''network.ww'' and ''ifcfg-eth0.ww'' This node ''n59'' is bare metal with just a 16G usb stick attached to system board (DOM) to hold operating system. Legacy boot. # network.ww # short node names NETWORK=yes HOSTNAME=%{NODENAME} # ifcfg-eth1.ww DEVICE=eth1 BOOTPROTO=static ONBOOT=yes HWADDR=%{NETDEVS::ETH1::HWADDR} IPADDR=%{NETDEVS::ETH1::IPADDR} NETMASK=%{NETDEVS::ETH1::NETMASK} NETWORK=%{NETDEVS::ETH1::NETWORK} # import those templates wwsh file import \ /opt/ohpc/admin/images/rocky8.5/root/wwtemplates/network.ww \ --path=/etc/sysconfig/network --name=network.ww wwsh file import \ /opt/ohpc/admin/images/rocky8.5/root/wwtemplates/ifcfg-eth1.ww \ --path=/etc/sysconfig/network-scripts/ifcfg-eth1 --name=ifcfg-eth1.ww Next we build a deploy script, first the input file # deploy.txt # add nodes to image: nodename hwaddrof eth0 hwaddrof eth1 n59 0C:C4:7A:4F:0B:7C 192.168.102.69 0C:C4:7A:4F:0B:7D 10.10.102.69 And the script to deploy #!/bin/bash # FIX vnfs & bootstrap for appropriate node # CHECK disk to format sda? # deploy a chroot server via PXE golden image transfer # templates are always in stateless CHROOT/rocky8.5/root/wwtemplates # look at header deploy.txt node=$1 hwaddr0=$2 ipaddr0=$3 hwaddr1=$4 ipaddr1=$5 if [ $# != 5 ]; then echo "missing args: node hwaddr0 ipaddr0 hwaddr1 ipaddr1 " exit fi wwsh object delete $node -y sleep 3 wwsh node new $node --netdev=eth0 \ --hwaddr=$hwaddr0 --ipaddr=$ipaddr0 \ --netmask=255.255.0.0 --network=255.255.0.0 -y wwsh node set $node --netdev=eth1 \ --hwaddr=$hwaddr1 --ipaddr=$ipaddr1 \ --netmask=255.255.0.0 --network=255.255.0.0 -y wwsh provision set $node --fileadd hosts,munge.key -y wwsh provision set $node --fileadd passwd,shadow,group -y wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww -y # stateless, comment out for golden image # wwsh provision set $node --bootstrap=4.18.0-348.12.2.el8_5.x86_64 -y # wwsh provision set $node --vnfs=rocky8.5 -y # stateful, comment out for golden image and stateless # install grub2 in $CHROOT first, rebuild vnfs # wwsh provision set --filesystem=gpt-n59 $node -y # wwsh provision set --bootloader=sda $node -y # uncomment for golden image, comment out stateless and stateful wwsh provision set $node --bootstrap=4.18.0-348.12.2.el8_5.x86_64 -y wwsh provision set $node --vnfs=n59.chroot -y wwsh provision set --filesystem=gpt-n59 $node -y wwsh provision set --bootloader=sda $node -y wwsh provision set --bootlocal=UNDEF $node -y echo "for stateful or golden image, after first boot issue" echo "wwsh provision set --bootlocal=normal $node -y" wwsh pxe update wwsh dhcp update systemctl restart dhcpd systemctl restart httpd systemctl restart tftp.socket # crontab will shutdown these services at 5pm # execute the script ./deploy.sh n59 ... Next PXE boot the node and we'll observe a stateless launch (ie no hard disk) [root@n59 ~]# cat /etc/redhat-release Rocky Linux release 8.5 (Green Obsidian) [root@n59 ~]# df -h Filesystem Size Used Avail Use% Mounted on tmpfs 16G 1.3G 15G 9% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm tmpfs 16G 9.8M 16G 1% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup 192.168.102.250:/opt/intel 708G 209G 499G 30% /opt/intel 192.168.102.250:/opt/ohpc/pub 708G 209G 499G 30% /opt/ohpc/pub tmpfs 3.2G 0 3.2G 0% /run/user/0 ==== stateful ==== To go stateful we need grub installed yum --installroot=/opt/ohpc/admin/images/rocky8.5 install grub2 touch /opt/ohpc/admin/images/rocky8.5/root/VNFS-TEST-WITH-GRUB2 # build out stateful if desired dnf --installroot $CHROOT install yum dnf --installroot $CHROOT groupinstall "Server with GUI" dnf --installroot $CHROOT install iptables-services dnf --installroot $CHROOT clean all # rebuild vnfs wwvnfs --chroot /opt/ohpc/admin/images/rocky8.5 # partition cp /etc/warewulf/filesystem/examples/gpt_example.cmds \ /etc/warewulf/filesystem/gpt.cmds # customize cp gpt.cmds gpt-n59.cmds # edit the file and change swap to mkpart primary linux-swap 513MiB 1025MiB mkpart primary ext4 1025MiB 100% # edit deploy script and change --filesystem=gpt-n59.cmds Re-execute the script so services start and dhcp/pxe files are updated.\\ PXE boot the node again.\\ Upon boot we view stateful partitions, then set bootlocal to normal.\\ [root@n59 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda4 7.3G 1.3G 5.7G 19% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm tmpfs 16G 9.8M 16G 1% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda2 486M 55M 402M 13% /boot 192.168.102.250:/opt/intel 708G 209G 499G 30% /opt/intel 192.168.102.250:/opt/ohpc/pub 708G 209G 499G 30% /opt/ohpc/pub tmpfs 3.2G 0 3.2G 0% /run/user/0 [root@n59 ~]# fdisk -l Device Start End Sectors Size Type /dev/sda1 2048 6143 4096 2M BIOS boot /dev/sda2 6144 1050623 1044480 510M EFI System /dev/sda3 1050624 2099199 1048576 512M Linux swap /dev/sda4 2099200 31277055 29177856 13.9G Linux filesystem [root@n59 ~]# cat /etc/redhat-release Rocky Linux release 8.5 (Green Obsidian) [root@n59 ~]# wwsh provision set --bootlocal=normal n59 -y [root@n59 ~]# touch BOOTLOCAL=NORMAL [root@n59 ~]# reboot # observe that new file ==== golden image ==== After stateful imaging we touch another file on imaged server then build a golden image. The touching of this new file represents customizing and testing the node prior to creating golden image. So for complex designs we might put the node temporarily on the internet and install nvidia drivers and toolkit for example. And perhaps install software that will optimize itself based on resources found (like gromacs/lammps probing gpu models for proper architecture). Then we build a golden image when everything works as expected. Hard to do in a CHROOT environment. [root@n59 ~]# touch VNFS-TEST-WITH-GRUB2-GOLDEN-IMAGE [root@n59 ~]# ll total 4 -rw-r--r-- 1 root root 0 Apr 20 14:22 VNFS-TEST -rw-r--r-- 1 root root 0 Apr 28 08:09 VNFS-TEST-WITH-GRUB2 -rw-r--r-- 1 root root 0 Apr 28 15:55 VNFS-TEST-WITH-GRUB2-GOLDEN-IMAGE drwxr-xr-x 2 root root 4096 Apr 28 08:06 wwtemplates # install software in /usr/local or /opt/ohpc/pub # customize and test functionality on node, then # on master node cd /var/chroots # or where you keep images # be sure to edit /usr/libexec/warewulf/wwmkchroot/golden-system.tmpl # add any excludes necessary (like any NFS mounts present, or umount) # view /etc/warewulf/vnfs.conf # the HYBRIDIZE section is commented out # /var/[log|spool|run] need to be removed from /usr/libexec/warewulf/wwmkchroot/golden-tmpl # try on compute nodes systemctl enable slurmd SOURCEADDR=n59 wwmkchroot golden-system \ /var/chroots/n59.chroot | tee /var/chroots/n59.log # for large images you might want to exclude say /usr/share (leave locale) # create a tar archive with vnfs image to overlay after imaging # rebuild [root@master]# touch /var/chroots/n59.chroot/root/VNFS-TEST-WITH-GRUB2-GOLDEN-IMAGE-CHROOT [root@master]# wwvnfs --chroot /var/chroots/n59.chroot [root@master]# wwsh vnfs list VNFS NAME SIZE (M) ARCH CHROOT LOCATION n59.chroot 571.1 x86_64 /var/chroots/n59.chroot rocky8.5 553.7 x86_64 /opt/ohpc/admin/images/rocky8.5 # bootstrap remains the same so edit deploy script # uncomment for golden image, comment out stateless and stateful wwsh provision set $node --bootstrap=4.18.0-348.12.2.el8_5.x86_64 -y wwsh provision set $node --vnfs=n59.chroot -y wwsh provision set --filesystem=gpt-n59 $node -y wwsh provision set --bootloader=sda $node -y # execute deploy script, pxe boot node # and there is the golden image deployed [root@n59 ~]# ll total 4 -rw-r--r-- 1 root root 0 Apr 20 14:22 VNFS-TEST -rw-r--r-- 1 root root 0 Apr 28 08:09 VNFS-TEST-WITH-GRUB2 -rw-r--r-- 1 root root 0 Apr 28 15:55 VNFS-TEST-WITH-GRUB2-GOLDEN-IMAGE -rw-r--r-- 1 root root 0 Apr 29 10:03 VNFS-TEST-WITH-GRUB2-GOLDEN-IMAGE-CHROOT drwxr-xr-x 2 root root 4096 Apr 28 08:06 wwtemplates # future boots from local disk wwsh provision set --bootlocal=normal n59 -y Awesome. You also have a backup now. Image away. And no need for a dhcp server to always be at the ready. Linux will fix journal file system errors 99% of the time if rebooted from say a utility power loss.\\ Thank you Warewulf team. I also see there are EFI and EFI + NVME filesystem examples in ''/etc/warewulf/filesystem/examples'' ==== logger ==== For some reason, after vnfs has compiled and deployed ''/dev/log'' is a socket file generating permission denied errors. Manual fix to apply, maybe put in ''/etc/rc.local'' in future cd /dev mv log log-orig ln -s /run/systemd/journal/dev-log log logger test journalctl --since=-1m -- Logs begin at Thu 2022-05-12 10:46:49 EDT, end at Thu 2022-05-12 10:52:17 EDT. -- May 12 10:52:17 n59 root[3748]: test ==== queues left ==== Not imaged will be nodes in these queues * hp12 n[1-n32] Too old and failing fast, centos 6 * mwgpu n[33-n37] K20 gpus EOL, no cuda driver updates anymore, centos7 * mw256fd n[38-n45] When warewulf starts imaging we disappear in a loop of "disks not ready", centos 6 \\ **[[cluster:0|Back]]**