cluster:171

Differences

This shows you the differences between two versions of the page.

cluster:171 [2018/08/20 09:11]
hmeij07 [Step 4]
cluster:171 [2018/08/20 09:55] (current)
hmeij07 [Step 5]
Line 143: Line 143:
 # final touches in CHROOT # final touches in CHROOT
 # edit deploy.sh (check filesystems, vnfs, bootstrap, UEFI stuff) # edit deploy.sh (check filesystems, vnfs, bootstrap, UEFI stuff)
 +
 +# copy your user and group base in passwd/shadow/group files in CHROOT/etc
 +# copy or edit bashrc and fstab in CHROOT/etc, add your NFS mounts
  
 # version # version
Line 157: Line 160:
 vanilla.chroot      533.9    /data/goldimages/vanilla.chroot vanilla.chroot      533.9    /data/goldimages/vanilla.chroot
  
-# configure node, done on WW 3.6.99
 +# configure node, deploy
 # assumes node pxe boots first # assumes node pxe boots first
 +
 cd /data/goldimages/vanilla.chroot/root cd /data/goldimages/vanilla.chroot/root
 ./deploy.sh `grep ^n37 deploy.txt` ./deploy.sh `grep ^n37 deploy.txt`
- 
 ssh n37 reboot ssh n37 reboot
  
 +# once the deploy is on its way, imaging might take 5 mins or so
 +# for next node boot to be from local disk, on SMS_server issue
  
-# once the deploy is on it's way, imaging might take 5 mins or so 
-# on SMS_server issue 
 wwsh provision set --bootlocal=EXIT n37 -y wwsh provision set --bootlocal=EXIT n37 -y
- 
-# test a reboot of node n37 from local disk 
  
 </code> </code>
Line 177: Line 177:
 ==== Step 5 ===== ==== Step 5 =====
  
-So after imaging and reboot, what do we have? Definitely an imaged node, the partitions have shuffled. And our VERSION file came from the vnfs made from CHROOT. We also have eth0, eth1 and ib0.
 +So after imaging and reboot, what do we have? Definitely an imaged node, the partitions have shuffled around. And our VERSION file came from the vnfs made inside CHROOT. We also have eth0, eth1 and ib0.
  
 <code> <code>
Line 193: Line 193:
 /dev/sda3        65529135   234436544    84453705   83  Linux /dev/sda3        65529135   234436544    84453705   83  Linux
  
-[root@n37 etc]# ifconfig | grep 'inet '
 +[root@n37 ~]# ifconfig | egrep 'inet |UP'
 Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8). Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
 +eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.102.47  netmask 255.255.0.0  broadcast 192.168.255.255         inet 192.168.102.47  netmask 255.255.0.0  broadcast 192.168.255.255
 +eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.10.102.47  netmask 255.255.0.0  broadcast 10.10.255.255         inet 10.10.102.47  netmask 255.255.0.0  broadcast 10.10.255.255
 +ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
         inet 10.11.103.47  netmask 255.255.0.0  broadcast 10.11.255.255         inet 10.11.103.47  netmask 255.255.0.0  broadcast 10.11.255.255
 +lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.255.255.0         inet 127.0.0.1  netmask 255.255.255.0
 +
  
 </code> </code>
  
-Before the reboot we grabbed passwd/shadow/group/fstab and pulled to our global archive. Spliced in our user and group bases and added our NFS mounts to fstab (comment out UUID lines). A post boot script will copy those in place as well as the tarball for 'usr/local' which will contain our K20 compiled software (hence the rsync exclude, keep vnfs small). Then all that is left to do is:
 +Then I have a post-boot script that upgrades the kernel. It also does some other things, like updating all users' $HOME/.ssh/known_hosts files so that ssh does not choke on old signatures.
  
 <code> <code>
  
 +# make sure my mounts are all ok
 [root@n37 ~]# mount -a [root@n37 ~]# mount -a
  
-[root@n37 rpms]# cd /sanscratch/tmp/rpms
-[root@n37 rpms]# yum install --tolerant *3.10.0*
 +# staged kernel rpms
 +[root@n37 ~]# cd /sanscratch/tmp/rpms
 +[root@n37 rpms]# yum install --tolerant kernel*3.10.0*
  
 ... ...
Line 224: Line 231:
 ... ...
  
-# reboot from local disk, do not make a golden image of this, stay on 7.2 till 7.5
-# is fixed, only add non-kernel packages to vanilla.chroot if needed and re-image
 +# reboot from local disk, do not make a golden image of this
  
 </code> </code>
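The known_hosts cleanup mentioned in the text is not shown on this page. A minimal sketch of one way to do it follows, assuming home directories sit directly under a single base directory; the function names and the `/home` path are hypothetical, not taken from the actual post-boot script. It relies on `ssh-keygen -R`, which removes all keys for a host from a given known_hosts file.

```shell
#!/bin/bash
# Sketch only: drop stale host keys for a re-imaged node from every
# user's known_hosts file. Function names and paths are hypothetical.

# list_known_hosts BASE: print each BASE/<user>/.ssh/known_hosts that exists
list_known_hosts() {
    local base="$1" f
    for f in "$base"/*/.ssh/known_hosts; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# purge_host_key NODE BASE: remove all keys for NODE from each file found
purge_host_key() {
    local node="$1" base="$2" f
    list_known_hosts "$base" | while IFS= read -r f; do
        ssh-keygen -R "$node" -f "$f" >/dev/null 2>&1
    done
}

# example (not run here): purge_host_key n37 /home
```

When run as root, it is worth checking the ownership of the rewritten known_hosts files afterwards, since `ssh-keygen -R` rewrites the file and leaves a `.old` backup next to it.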
Line 231: Line 237:
  
  
-As far as I understand it, from CentOS 7.3 and on OHPC/Warewulf switch to GPT disk UEFI boot loader. This involves a boot manager (efibootmgr) that looks not at a master boot record (MBR) on a bootable partition but insteads looks at a file (grubx64.efi). To boot UEFI environment variables need to be supported and you do that in BIOS. Read the OpenHPC thread "Stateful provisioning with warewulf does not work (ohpc 1.3.5)". You also need to add these packages
 +As far as I understand it, from CentOS 7.3 onward OHPC/Warewulf switched to a UEFI boot loader on a GPT disk. This involves the UEFI boot manager (managed with efibootmgr), which looks not at a master boot record (MBR) but instead at a file (grubx64.efi). To boot using UEFI, UEFI variables need to be supported, and you enable that in the BIOS. You also need to add these packages to CHROOT:
  
   * efibootmgr efivar-libs grub2-efi-x64 dosfstools   * efibootmgr efivar-libs grub2-efi-x64 dosfstools
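One way to confirm that the four packages above actually landed in CHROOT is sketched below. The variable `REQUIRED_UEFI_PKGS` and the function `missing_pkgs` are hypothetical names; the function only compares a list of installed package names on stdin against the required list.

```shell
#!/bin/bash
# Sketch only: report which of the required UEFI packages are absent.
# REQUIRED_UEFI_PKGS and missing_pkgs are hypothetical names.
REQUIRED_UEFI_PKGS="efibootmgr efivar-libs grub2-efi-x64 dosfstools"

# missing_pkgs: read installed package names on stdin (one per line)
# and print each required package that is not among them
missing_pkgs() {
    local installed pkg
    installed=$(cat)
    for pkg in $REQUIRED_UEFI_PKGS; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
    return 0
}
```

Assuming the chroot path from this page, it could be fed with `rpm --root /data/goldimages/vanilla.chroot -qa --qf '%{NAME}\n' | missing_pkgs`; empty output would mean nothing is missing.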
  
-My hardware is an ASUS esc4000fdr G2 bought in 2013. In the BIOS boot CSM menus I can set boot filter options to "UEFi and Legacy" and for each type (PXE, Storage,...) I set "UEFI first". These setting allow the CentOS7.2 image to still Legacy PXE boot. Nice.
 +My hardware is an ASUS esc4000fdr G2 bought in 2013. In the BIOS boot CSM menus I can set boot filter options to "UEFi and Legacy" and for each type (PXE, Storage,...) I set "UEFI first". These settings still allow the CentOS 7.2 image to Legacy PXE boot. Nice.
  
-But the CentOS 7.5 image...not. No matter what settings I use to boot UEFI I continue to receive EFI variables not supported. If I boot the 7.5 image Legacy wise I receive a "can find path to tmpfs error" in mkbootable. The owner of the list thread posted this solution. Look in the deploy script on how to handle that.
 +But if I build a CentOS 7.5 image...not. No matter what settings I use to boot UEFI, I continue to receive "EFI variables not supported" in the adhoc-postscript step. If I boot the 7.5 image via Legacy boot, I receive a "can not find path to tmpfs" error in the mkbootable step.
 + 
 +The owner of the list thread posted this solution for the UEFI problem. Look in the deploy script for how to handle that. My workaround was to deploy a 7.2 image using a standalone Warewulf 3.6.99 installation, then update the node kernel (for building a 3.6.99 version, see [[cluster:139|Warewulf Stateless]]).
  
  
Line 247: Line 255:
 [[ -n "$ent" ]] && chroot "$NEWROOT" /usr/sbin/efibootmgr -b "${ent:4:4}" -B  [[ -n "$ent" ]] && chroot "$NEWROOT" /usr/sbin/efibootmgr -b "${ent:4:4}" -B 
 chroot "$NEWROOT" /usr/sbin/efibootmgr -c -d /dev/sda -l "\EFI\centos\grubx64.efi" -L CentOS-Warewulf chroot "$NEWROOT" /usr/sbin/efibootmgr -c -d /dev/sda -l "\EFI\centos\grubx64.efi" -L CentOS-Warewulf
 +
 +# Read the OpenHPC thread "Stateful provisioning with warewulf does not work (ohpc 1.3.5)".
  
 </code> </code>
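The efibootmgr calls above only work when the node was actually booted in UEFI mode. An "EFI variables not supported" message typically means the kernel was started via Legacy/CSM and `/sys/firmware/efi` is absent. A small check along these lines could be run first; `boot_mode` is a hypothetical helper, not part of the deploy script.

```shell
#!/bin/bash
# Sketch only: report whether the running system was booted via UEFI.
# boot_mode is a hypothetical helper; the optional argument is an
# alternate root prefix, which is only useful for testing.
boot_mode() {
    local root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo uefi
    else
        echo legacy
    fi
}

# example (not run here):
#   [ "$(boot_mode)" = uefi ] || echo "booted Legacy, efibootmgr will fail" >&2
```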
Line 316: Line 326:
  
 ==== deploy.txt ==== ==== deploy.txt ====
- 
-File deploy.txt 
  
 <code> <code>
cluster/171.1534770697.txt.gz · Last modified: 2018/08/20 09:11 by hmeij07