\\
**[[cluster:0|Back]]**

 --- //[[hmeij@wesleyan.edu|Henk]] 2019/03/18 13:58//

Notes to self:

  * Hosts ohpc0-test + n29 + n31 formed a tiny openhpc/slurm/ww test cluster, since redone without slurm
  * Host sharptail2 acts as a CentOS7 Warewulf server (host petaltail is the CentOS6 Warewulf server)

===== OpenHPC 1.3.1 =====

Consult these pages for my earlier testing of OpenHPC. I simply copy&paste my way through these pages while consulting the recipe PDF for CentOS7.5 plus Warewulf. Any changes are logged on this page.

  - [[cluster:154|OpenHPC page 1]]
  - [[cluster:155|OpenHPC page 2]]
  - [[cluster:156|OpenHPC page 3]]
  - [[cluster:160|OpenHPC page 4]]

First install the SMS server, then deploy a stateless client.

For the OS install select "Server with GUI". "eth0" will be the private provisioning NIC (192.168.1.220) and "eth1" will be public on wesleyan.edu. Enable ntp immediately.

First do iptables, NetworkManager and firewalld, following Page 1 listed above. Then follow the recipe. There was a problem in the epel repo files: what worked for me was commenting out the "metalink" line and un-commenting the "baseurl" line in /etc/yum.repos.d/epel.repo, and removing the epel-testing.repo file.

  yum install http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm
  yum update ohpc-release epel-release
  yum install ohpc-base
  yum install ohpc-warewulf
  # skipping on ohpc-slurm-server on master
  # skipping on infiniband on master
  # skipping on omnipath on master

/etc/warewulf/provision \\
device = enp4s0 \\
dynamic_hosts, hostfile, localdomain all "no"

disable = no in /etc/xinetd.d/tftp

In the warewulf config section httpd.conf is all set. \\
Restart/Enable all services listed in the recipe. \\
/etc/warewulf/database-root \\
Set a mysql root password. \\
wwinit DATABASE \\
mysql -u root \\
set password for 'root'@'localhost' = PASSWORD('some_string');

Next build the compute image for provisioning (first time, vanilla). Before I do that I will bring a second disk online (/dev/sdb) for /data to store CHROOTs in.
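The page does not record how the second disk was brought online. A minimal sketch of one way to do it follows. It assumes the whole of /dev/sdb is formatted without a partition table and that xfs is the filesystem (both are assumptions, not from my notes). The ''run'' helper and the ''RUN'' switch are hypothetical names added so the script only prints the commands by default.

```shell
#!/bin/bash
# Sketch only: prints each command unless RUN=1 is set in the environment.
# DISK and MNT come from the text above; the xfs filesystem type is an assumption.
DISK=${DISK:-/dev/sdb}
MNT=${MNT:-/data}
FSTYPE=${FSTYPE:-xfs}

# Hypothetical helper: execute the command only when RUN=1, otherwise echo it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

run mkfs -t "$FSTYPE" "$DISK"            # destroys any data on $DISK
run mkdir -p "$MNT"
run mount -t "$FSTYPE" "$DISK" "$MNT"

# Suggested /etc/fstab entry, to be added by hand so the mount survives a reboot
echo "# fstab: $DISK $MNT $FSTYPE defaults 0 0"
```

Run it once as is to review the commands, then again with ''RUN=1'' as root. With /data mounted, the chroot build proceeds: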
  wwmkchroot centos-7 /data/centos-7.chroot
  yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
  yum -y --installroot=/data/centos-7.chroot install tcl-devel
  yum -y --installroot=/data/centos-7.chroot install ntp
  # kernel plus devel, firmware and headers packages
  yum -y --installroot=/data/centos-7.chroot install kernel kernel-devel kernel-headers linux-firmware
  yum -y --installroot=/data/centos-7.chroot install lmod-ohpc
  # skip ohpc-slurm-client
  yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
  yum -y --installroot=/data/centos-7.chroot install infinipath-psm
  chroot /data/centos-7.chroot systemctl enable rdma

==== Warewulf ====

  - [[cluster:139|Warewulf Stateless]]
  - [[cluster:143|Warewulf Stateful]]
  - [[cluster:144|Warewulf Golden Image]]

The only files I will import and associate with nodes are listed below. The others (passwd, shadow, group, hosts, fstab, bashrc and profile) will be copied into /etc by a post-imaging process from my archive.

**network.ww**

  # short node names
  NETWORK=yes
  HOSTNAME=%{NODENAME}

**ifcfg-eth1.ww**

  DEVICE=enp8s0
  BOOTPROTO=static
  ONBOOT=yes
  HWADDR=%{NETDEVS::ETH1::HWADDR}
  IPADDR=%{NETDEVS::ETH1::IPADDR}
  NETMASK=%{NETDEVS::ETH1::NETMASK}
  NETWORK=%{NETDEVS::ETH1::NETWORK}

**ifcfg-ib0.ww**

  DEVICE=ib0
  TYPE=InfiniBand
  ONBOOT=yes
  BOOTPROTO=none
  CONNECTED_MODE=no
  DEFROUTE=no
  IPADDR=%{NETDEVS::IB0::IPADDR}
  NETMASK=%{NETDEVS::IB0::NETMASK}
  NETWORK=%{NETDEVS::IB0::NETWORK}

Import the templates into the Warewulf database:

  wwsh file import /data/templates/network.ww \
   --path=/etc/sysconfig/network --name=network.ww -y
  wwsh file import /data/templates/ifcfg-eth1.ww \
   --path=/etc/sysconfig/network-scripts/ifcfg-eth1 --name=ifcfg-eth1.ww -y
  wwsh file import /data/templates/ifcfg-ib0.ww \
   --path=/etc/sysconfig/network-scripts/ifcfg-ib0 --name=ifcfg-ib0.ww -y

Next prepare the bootstrap and VNFS image. Configuration is still vanilla stateless 7.5.
  echo "drivers += updates/kernel" >> /etc/warewulf/bootstrap.conf
  wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-862.9.1.el7.x86_64
  wwsh bootstrap list
  BOOTSTRAP NAME              SIZE (M)   ARCH
  3.10.0-862.9.1.el7.x86_64   28.4       x86_64

  # vanilla
  wwvnfs --chroot=/data/centos-7.chroot
  wwsh vnfs list
  VNFS NAME         SIZE (M)   ARCH     CHROOT LOCATION
  centos-7.chroot   306.3      x86_64   /data/centos-7.chroot

==== Register ====

Now we will register the node and deploy the stateless image as a test. Via script. Too many typos otherwise. Keep these files in CHROOT/root/ so we don't lose them.

  # file deploy.sh
  #!/bin/bash
  # deploy a n33.chroot type server
  # templates are in /data/templates

  node=$1
  hwaddr0=$2
  ipaddr0=$3
  hwaddr1=$4
  ipaddr1=$5
  ipaddri=$6

  # all six arguments are required
  if [ $# -ne 6 ]; then
          echo "missing args: node hwaddr0 ipaddr0 hwaddr1 ipaddr1 ipaddri"
          exit 1
  fi

  # start from a clean slate, then define the three interfaces
  wwsh object delete $node -y ; sleep 3
  wwsh node new $node --netdev=eth0 \
  --hwaddr=$hwaddr0 --ipaddr=$ipaddr0 \
  --netmask=255.255.0.0 --network=255.255.0.0 -y
  wwsh node set $node --netdev=eth1 \
  --hwaddr=$hwaddr1 --ipaddr=$ipaddr1 \
  --netmask=255.255.0.0 --network=255.255.0.0 -y
  wwsh node set $node --netdev=ib0 \
  --ipaddr=$ipaddri \
  --netmask=255.255.0.0 --network=255.255.0.0 -y

  # database file imports must already have been performed
  wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww,ifcfg-ib0.ww -y

  # disk layout
  wwsh object modify -s bootloader=sda $node -y
  wwsh object modify -s diskformat=sda1,sda3 $node -y
  wwsh object modify -s diskpartition=sda $node -y
  wwsh object modify -s filesystems="mountpoint=/boot:dev=sda1:type=ext4:size=512,dev=sda2:type=swap:size=32768,mountpoint=/:dev=sda3:type=ext4:size=+" $node -y

  # vanilla image
  wwsh provision set $node --vnfs=centos-7.chroot -y
  wwsh provision set $node --bootstrap=3.10.0-862.9.1.el7.x86_64 -y
  wwsh provision set --bootlocal=UNDEF $node -y

  wwsh pxe update
  wwsh dhcp update
  # cron turns them off at 4pm
  systemctl restart dhcpd
  systemctl restart httpd

  # reminder of the command to run once the node is imaged
  echo "wwsh provision set --bootlocal=EXIT $node -y"

  # file deploy.txt
  # n33.chroot type nodes, ASUS type servers with 4x K20 gpus
  n37 50:46:5D:E8:1F:A8 192.168.102.47 50:46:5D:E8:1F:A9 10.10.102.47 10.11.103.47
  # more servers ...

After that is set up we deploy with //./deploy.sh `grep ^n37 deploy.txt`// and hook up a KVM link to observe what happens on the console.

On the first PXE boot we get an error in the mkbootable provision step: "grub2-install not found". So I installed all the sms_server grub2 packages in the CHROOT and rebuilt the vnfs. On the second PXE boot the error in the same step is "grub-mkconfig failed". While I was researching this, 10 minutes after ''init'' counted down, the screen flickered and the n37 node login prompt appeared. So not a fatal error? And what is grub vs grub2 doing in CentOS7?

Grrh, forgot to put SSH authorized keys in the CHROOT, as PermitRootLogin is by default set to No. Rebuild vnfs, redeploy. The ten minute console delay seems to be related to the KVM/ASMB6-iKVM of this ASUS server. The stateless node is pingable soon after init exits and the network looks good.

Since I'm not going to run stateless clients, onward to the golden image approach... [[cluster:171| Link to page]]

==== Older kernel ====

I remade my CHROOT based on a warewulf list thread I ran into.
Like so:

  rm -rf centos-7.chroot
  wwmkchroot centos-7 /data/centos-7.chroot
  yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
  yum -y --installroot=/data/centos-7.chroot install tcl-devel
  yum -y --installroot=/data/centos-7.chroot erase kernel-headers

  # then, CentOS 7.3
  yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-3.10.0-514.el7.x86_64.rpm
  yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
  yum -y --installroot=/data/centos-7.chroot install lmod-ohpc
  yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
  yum -y --installroot=/data/centos-7.chroot install infinipath-psm

  wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-514.el7.x86_64
  wwvnfs --chroot=/data/centos-7.chroot
  # then change the bootstrap in the deploy script

This did not change the grub2/grub errors encountered with CentOS7.5, but the stateless node login prompt still appears.

\\
**[[cluster:0|Back]]**