cluster:216

Differences

This shows you the differences between two versions of the page.

cluster:216 [2022/04/29 10:47]
hmeij07 created
cluster:216 [2022/06/03 16:01]
hmeij07 [logger]
Line 87: Line 87:
 --netmask=255.255.0.0  --network=255.255.0.0 -y

-wwsh provision set $node --fileadd passwd,shadow,group -y
-wwsh provision set $node --fileadd hosts,munge.key -y
-wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww -y
+wwsh provision set $node --fileadd hosts,munge.key -y
+wwsh provision set $node --fileadd passwd,shadow,group -y
+wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww -y

-# write to disk? switch stateless/stateful
-# install grub2 in $CHROOT for stateful
-# wwsh object modify -s bootloader=sda $node -y
-# wwsh object modify -s diskpartition=sda $node -y
-# wwsh object modify -s diskformat=sda1,sda2 $node -y
-# wwsh object modify -s filesystems="mountpoint=/boot:dev=sda1:type=ext4:size=150,mountpoint=/:dev=sda2:type=ext4:size=+" $node -y

-# rocky8+stateless, comment out for golden image
+# wwsh provision set $node --bootstrap=4.18.0-348.12.2.el8_5.x86_64 -y
+# wwsh provision set $node --vnfs=rocky8.5 -y
+
+# stateful, comment out for golden image and stateless
+# install grub2 in $CHROOT first, rebuild vnfs
+# wwsh provision set --filesystem=gpt-n59  $node -y
+# wwsh provision set --bootloader=sda  $node -y
+
+# uncomment for golden image, comment out stateless and stateful
  wwsh provision set $node --bootstrap=4.18.0-348.12.2.el8_5.x86_64 -y
- wwsh provision set $node --vnfs=rocky8.5 -y
+ wwsh provision set $node --vnfs=n59.chroot -y
+ wwsh provision set --filesystem=gpt-n59  $node -y
+ wwsh provision set --bootloader=sda  $node -y

 wwsh provision set --bootlocal=UNDEF $node -y
-echo "only for stateful or golden image, after first boot issue"
-echo "wwsh provision set --bootlocal=EXIT $node"
+echo "for stateful or golden image, after first boot issue"
+echo "wwsh provision set --bootlocal=normal $node -y"

 wwsh pxe update
Line 112: Line 118:
 systemctl restart tftp.socket
 # crontab will shutdown these services at 5pm
+

 # execute the script
Line 146: Line 153:
 yum --installroot=/opt/ohpc/admin/images/rocky8.5 install grub2
 touch /opt/ohpc/admin/images/rocky8.5/root/VNFS-TEST-WITH-GRUB2
+
+# build out stateful if desired
+dnf --installroot $CHROOT install yum
+dnf --installroot $CHROOT groupinstall "Server with GUI"
+dnf --installroot $CHROOT install iptables-services
+dnf --installroot $CHROOT clean all

 # rebuild vnfs
Line 184: Line 197:
 tmpfs                          3.2G      3.2G   0% /run/user/0

-[root@n59 ~]# free -g
-              total        used        free      shared  buff/cache   available
-Mem:             31                    29                              30
-Swap:                                 6
+[root@n59 ~]# fdisk -l
+Device       Start      End  Sectors  Size Type
+/dev/sda1     2048     6143     4096    2M BIOS boot
+/dev/sda2     6144  1050623  1044480  510M EFI System
+/dev/sda3  1050624  2099199  1048576  512M Linux swap
+/dev/sda4  2099200 31277055 29177856 13.9G Linux filesystem

 [root@n59 ~]# cat /etc/redhat-release
Line 202: Line 218:
 ==== golden image ====

-After stateful imaging we touch another file on imaged server then build a golden image. The touching of this new file represents customixing and testing the node. So for complex designs we might put the node temporarily on the internet and install nvidia drivers and toolkit. And perhaps install software that will optimize itself based on resources found (like gromacs/lammps probing gpu models for proper architecture). Then we build a golden image.
+After stateful imaging we touch another file on the imaged server, then build a golden image. Touching this new file stands in for customizing and testing the node prior to creating the golden image. So for complex designs we might put the node temporarily on the internet and install, for example, nvidia drivers and toolkit. And perhaps install software that will optimize itself based on resources found (like gromacs/lammps probing gpu models for proper architecture). Then we build a golden image when everything works as expected. That is hard to do in a CHROOT environment.

 <code>
Line 225: Line 241:
 # view /etc/warewulf/vnfs.conf
 # the HYBRIDIZE section is commented out
+
+# /var/[log|spool|run] need to be removed from
+# /usr/libexec/warewulf/wwmkchroot/golden-tmpl
+
+# try on compute nodes
+systemctl enable slurmd

 SOURCEADDR=n59 wwmkchroot golden-system \
Line 266: Line 288:
 </code>
  
 +Awesome. You also have a backup now. Image away. There is also no need for a dhcp server to always be at the ready. Linux will fix journaled file system errors 99% of the time when rebooted after, say, a utility power loss.\\ Thank you Warewulf team.
  
 +I also see there are EFI and EFI + NVMe filesystem examples in ''/etc/warewulf/filesystem/examples''.
  
 +==== logger ====
 +
 +For some reason, after the vnfs has been built and deployed, ''/dev/log'' is a socket file that generates permission denied errors. The manual fix below can be applied by hand; it may go into ''/etc/rc.local'' in the future.
 +
 +<code>
 +
 +cd /dev
 +mv log log-orig
 +ln -s /run/systemd/journal/dev-log log
 +
 +logger test
 +journalctl --since=-1m
 +-- Logs begin at Thu 2022-05-12 10:46:49 EDT, end at Thu 2022-05-12 10:52:17 EDT. --
 +May 12 10:52:17 n59 root[3748]: test
 +
 +</code>
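
If the manual fix does end up in ''/etc/rc.local'', it would run on every boot, so it should be safe to repeat. The following is only a sketch under that assumption: ''fix_dev_log'' is a hypothetical helper name, not part of Warewulf or systemd, and its two optional arguments (the device directory and the link target) exist only so it can be tried in a scratch directory first. It performs the same ''mv'' and ''ln -s'' steps as above, but skips them when the link is already in place.

```shell
#!/bin/bash
# Hypothetical helper (not part of Warewulf): repoint <devdir>/log at the
# journald socket, but only when that has not been done already.
fix_dev_log() {
    local devdir="${1:-/dev}"
    local target="${2:-/run/systemd/journal/dev-log}"
    local link="$devdir/log"

    # Already a symlink to the expected target: nothing to do.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        echo "unchanged"
        return 0
    fi

    # Keep whatever is there now (socket, file, or stale link) as log-orig,
    # mirroring the manual "mv log log-orig" step.
    if [ -e "$link" ] || [ -L "$link" ]; then
        mv -f "$link" "$devdir/log-orig"
    fi

    ln -s "$target" "$link"
    echo "fixed"
}
```

Calling ''fix_dev_log'' with no arguments acts on ''/dev/log''; a quick ''logger test'' followed by ''journalctl --since=-1m'' would then confirm the result, as in the session above.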
 +
 +==== queues left ====
 +
 +Nodes in these queues will not be imaged:
 +
 +  * hp12 n[1-n32] Too old and failing fast
 +  * mwgpu n[33-n37] K20 gpus EOL, no cuda driver updates anymore beyond centos7
 +  * mw256fd n[38-n45] When warewulf starts imaging we disappear in a loop of "disks not ready"
 +
 +</code>
 \\
 **[[cluster:0|Back]]**
  
cluster/216.txt · Last modified: 2022/06/07 16:07 by hmeij07