\\
**[[cluster:0|Back]]**
--- //[[hmeij@wesleyan.edu|Henk]] 2019/03/18 13:58//
Note to self:
* Hosts ohpc0-test, n29 and n31 formed a tiny OpenHPC/Slurm/Warewulf test cluster; redone here without Slurm
* Host sharptail2 acts as a CentOS 7 Warewulf server (host petaltail is the CentOS 6 Warewulf server)
===== OpenHPC 1.3.1 =====
Consult these pages for my earlier testing of OpenHPC. I simply copy and paste my way through these pages while consulting the recipe PDF for CentOS 7.5 plus Warewulf. Any changes are logged on this page.
- [[cluster:154|OpenHPC page 1]]
- [[cluster:155|OpenHPC page 2]]
- [[cluster:156|OpenHPC page 3]]
- [[cluster:160|OpenHPC page 4]]
First install the SMS server, then deploy a stateless client. During the CentOS install select "Server with GUI". "eth0" will be the private provisioning NIC (192.168.1.220) and "eth1" will be public on wesleyan.edu. Enable NTP immediately.
Next, configure iptables, NetworkManager and firewalld as described on Page 1 listed above.
Then follow the recipe. There is a problem in the EPEL repo files; what worked for me was commenting out the "metalink" line, un-commenting the "baseurl" line in /etc/yum.repos.d/epel.repo, and removing the epel-testing.repo file.
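The EPEL workaround above can be scripted. A minimal sketch, assuming the stock epel.repo layout in which each section has an active `metalink=` line and a commented `#baseurl=` line; it rewrites every section in the file. `fix_epel_repo` is a hypothetical helper name, not part of OpenHPC.

```shell
#!/bin/bash
# Sketch: comment out metalink= and un-comment baseurl= in an EPEL repo file.
# Assumes the stock layout (active "metalink=" and commented "#baseurl=" lines);
# every section in the file is changed. Hypothetical helper.
fix_epel_repo() {
    local repo=${1:-/etc/yum.repos.d/epel.repo}
    sed -i -e 's/^metalink=/#metalink=/' \
           -e 's/^#baseurl=/baseurl=/' "$repo"
}

# usage (as root):
#   fix_epel_repo /etc/yum.repos.d/epel.repo
#   rm -f /etc/yum.repos.d/epel-testing.repo
```

Anchoring the patterns at the start of the line leaves comment text elsewhere in the file untouched.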
yum install http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm
yum update ohpc-release epel-release
yum install ohpc-base
yum install ohpc-warewulf
# skipping ohpc-slurm-server on master
# skipping infiniband on master
# skipping omnipath on master
In /etc/warewulf/provision: \\
device = enp4s0 \\
dynamic_hosts, hostfile, localdomain all set to "no" \\
In /etc/xinetd.d/tftp: disable = no \\
In the Warewulf config section, httpd.conf is already set up. \\
Restart/enable all services listed in the recipe.\\
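The provision and tftp edits above can be done with sed, in the spirit of the recipe's one-liners. This is a sketch under assumptions: that the Warewulf provision file has an uncommented line ending in `device = eth1` plus `... = yes` lines for dynamic_hosts, hostfile and localdomain, and that /etc/xinetd.d/tftp has a `disable = yes` line. Check both files first; `set_provision`, `set_tftp` and `PROV_DEV` are hypothetical names.

```shell
#!/bin/bash
# Sketch of the provision/tftp edits. Assumes the stock files: an uncommented
# "... device = eth1" line and "... = yes" lines for dynamic_hosts, hostfile and
# localdomain in the warewulf provision file, and "disable = yes" in the xinetd
# tftp file. Helper names are hypothetical.
PROV_DEV=${PROV_DEV:-enp4s0}   # internal provisioning NIC on this SMS

set_provision() {
    local conf=$1
    # point provisioning at the internal NIC on any non-comment "device =" line
    sed -i -E "s/^([^#]*device[[:space:]]*=[[:space:]]*).*/\1${PROV_DEV}/" "$conf"
    # switch dynamic_hosts, hostfile and localdomain from yes to no
    sed -i -E 's/^([^#]*(dynamic_hosts|hostfile|localdomain)[^=]*=[[:space:]]*)yes/\1no/' "$conf"
}

set_tftp() {
    # enable tftp under xinetd: "disable = yes" becomes "disable = no"
    sed -i -E 's/^([[:space:]]*disable[[:space:]]*=[[:space:]]*)yes/\1no/' "$1"
}

# usage (as root):
#   set_provision <path to the warewulf provision file>
#   set_tftp /etc/xinetd.d/tftp
```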
In /etc/warewulf/database-root set a MySQL root password, then initialize the database and set the same password in MySQL: \\
wwinit DATABASE \\
mysql -u root \\
set password for 'root'@'localhost' = PASSWORD('some_string');
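Instead of picking 'some_string' by hand, a random password can be generated and the SQL printed from it. A minimal sketch: `gen_password` and `root_pw_sql` are hypothetical helpers, the password is restricted to letters and digits so it needs no escaping in SQL, and the same string still has to be written into /etc/warewulf/database-root by hand (check that file for the exact key name).

```shell
#!/bin/bash
# Sketch: random MySQL root password plus the SET PASSWORD statement above.
# Hypothetical helpers; put the same password in /etc/warewulf/database-root.
gen_password() {
    # $1 = length (default 16); letters and digits only, safe inside SQL quotes
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-16}"
}

root_pw_sql() {
    printf "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('%s');\n" "$1"
}

# usage:
#   pw=$(gen_password 20)
#   mysql -u root -e "$(root_pw_sql "$pw")"
```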
Next, build the compute image for provisioning (first time, vanilla). Before that, I bring a second disk (/dev/sdb) online as /data to store the CHROOTs in.
wwmkchroot centos-7 /data/centos-7.chroot
yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
yum -y --installroot=/data/centos-7.chroot install tcl-devel
yum -y --installroot=/data/centos-7.chroot install ntp
yum -y --installroot=/data/centos-7.chroot install kernel kernel-devel kernel-headers linux-firmware
yum -y --installroot=/data/centos-7.chroot install lmod-ohpc
# skip ohpc-slurm-client
yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
yum -y --installroot=/data/centos-7.chroot install infinipath-psm
chroot /data/centos-7.chroot systemctl enable rdma
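The chroot steps above can be collected into one function so a rebuild is a single command. This is a sketch: the package list follows this page, with `linux-firmware` as my reading of "firmware" (an assumption), and all packages go into one yum transaction instead of one call per line. With `DRY_RUN=1` it only prints the commands; `run` and `build_chroot` are hypothetical names.

```shell
#!/bin/bash
# Sketch: build the vanilla compute chroot in one go.
# DRY_RUN=1 prints the commands instead of running them.
# linux-firmware is an assumed package name for "firmware".
CHROOT=${CHROOT:-/data/centos-7.chroot}

run() {
    # print the command in dry-run mode, otherwise execute it
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

build_chroot() {
    local pkgs=(ohpc-base-compute tcl-devel ntp kernel kernel-devel
                kernel-headers linux-firmware lmod-ohpc infinipath-psm)
    run wwmkchroot centos-7 "$CHROOT"
    run yum -y --installroot="$CHROOT" install "${pkgs[@]}"
    run yum -y --installroot="$CHROOT" groupinstall "Infiniband Support"
    run chroot "$CHROOT" systemctl enable rdma
}

# usage: DRY_RUN=1 build_chroot   (preview), then build_chroot as root
```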
==== Warewulf ====
- [[cluster:139|Warewulf Stateless]]
- [[cluster:143|Warewulf Statefull]]
- [[cluster:144|Warewulf Golden Image]]
The only files I will import and associate with nodes are listed below. The others (passwd, shadow, group, hosts, fstab, bashrc and profile) will be copied into /etc from my archive by a post-imaging process.
**network.ww**
# short node names
NETWORK=yes
HOSTNAME=%{NODENAME}
**ifcfg-eth1.ww**
DEVICE=enp8s0
BOOTPROTO=static
ONBOOT=yes
HWADDR=%{NETDEVS::ETH1::HWADDR}
IPADDR=%{NETDEVS::ETH1::IPADDR}
NETMASK=%{NETDEVS::ETH1::NETMASK}
NETWORK=%{NETDEVS::ETH1::NETWORK}
**ifcfg-ib0.ww**
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
CONNECTED_MODE=no
DEFROUTE=no
IPADDR=%{NETDEVS::IB0::IPADDR}
NETMASK=%{NETDEVS::IB0::NETMASK}
NETWORK=%{NETDEVS::IB0::NETWORK}
wwsh file import /data/templates/network.ww \
--path=/etc/sysconfig/network --name=network.ww -y
wwsh file import /data/templates/ifcfg-eth1.ww \
--path=/etc/sysconfig/network-scripts/ifcfg-eth1 --name=ifcfg-eth1.ww -y
wwsh file import /data/templates/ifcfg-ib0.ww \
--path=/etc/sysconfig/network-scripts/ifcfg-ib0 --name=ifcfg-ib0.ww -y
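The three imports above follow one pattern: `ifcfg-*` templates land in /etc/sysconfig/network-scripts, the `network` template in /etc/sysconfig. A minimal sketch as a loop, assuming the templates sit in /data/templates as on this page; `import_templates` and `DRY_RUN` are hypothetical.

```shell
#!/bin/bash
# Sketch: import the .ww templates with the target path derived from the name.
# DRY_RUN=1 prints the wwsh commands instead of running them.
TEMPLATES=${TEMPLATES:-/data/templates}

import_templates() {
    local name target
    for name in network.ww ifcfg-eth1.ww ifcfg-ib0.ww; do
        case $name in
            ifcfg-*) target=/etc/sysconfig/network-scripts/${name%.ww} ;;
            *)       target=/etc/sysconfig/${name%.ww} ;;
        esac
        local cmd=(wwsh file import "$TEMPLATES/$name" --path="$target" --name="$name" -y)
        if [ "${DRY_RUN:-0}" = 1 ]; then echo "${cmd[*]}"; else "${cmd[@]}"; fi
    done
}
```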
Next, prepare the bootstrap and VNFS images. The configuration is still vanilla stateless CentOS 7.5.
echo "drivers += updates/kernel" >> /etc/warewulf/bootstrap.conf
wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-862.9.1.el7.x86_64
wwsh bootstrap list
BOOTSTRAP NAME              SIZE (M)   ARCH
3.10.0-862.9.1.el7.x86_64   28.4       x86_64
# vanilla
wwvnfs --chroot=/data/centos-7.chroot
wwsh vnfs list
VNFS NAME         SIZE (M)   ARCH     CHROOT LOCATION
centos-7.chroot   306.3      x86_64   /data/centos-7.chroot
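The kernel version passed to wwbootstrap has to match a kernel installed in the chroot. Rather than typing it, it can be read from the chroot's lib/modules directory. A sketch with a hypothetical helper: if several kernels are installed it returns the highest version, which may not be the one you want, so check the result.

```shell
#!/bin/bash
# Sketch: newest kernel version installed in a chroot (from lib/modules).
# Hypothetical helper; picks the highest version if several are present.
chroot_kernel() {
    local chroot=${1:-/data/centos-7.chroot}
    ls -1 "$chroot/lib/modules" | sort -V | tail -n 1
}

# usage:
#   wwbootstrap --chroot=/data/centos-7.chroot "$(chroot_kernel /data/centos-7.chroot)"
```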
==== Register ====
Now register the node and deploy the stateless image as a test. This is done via a script, since typing the commands by hand produces too many typos. Keep these files in CHROOT/root/ so we don't lose them.
# file deploy.sh
#!/bin/bash
# deploy a n33.chroot type server
# templates are in /data/templates
# usage: ./deploy.sh node hwaddr0 ipaddr0 hwaddr1 ipaddr1 ipaddri
if [ $# -ne 6 ]; then
  echo "missing args: node hwaddr0 ipaddr0 hwaddr1 ipaddr1 ipaddri" >&2
  exit 1
fi
node=$1
hwaddr0=$2
ipaddr0=$3
hwaddr1=$4
ipaddr1=$5
ipaddri=$6
wwsh object delete $node -y ; sleep 3
wwsh node new $node --netdev=eth0 \
--hwaddr=$hwaddr0 --ipaddr=$ipaddr0 \
--netmask=255.255.0.0 --network=255.255.0.0 -y
wwsh node set $node --netdev=eth1 \
--hwaddr=$hwaddr1 --ipaddr=$ipaddr1 \
--netmask=255.255.0.0 --network=255.255.0.0 -y
wwsh node set $node --netdev=ib0 \
--ipaddr=$ipaddri \
--netmask=255.255.0.0 --network=255.255.0.0 -y
# database file imports must already have been performed
wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww,ifcfg-ib0.ww -y
wwsh object modify -s bootloader=sda $node -y
wwsh object modify -s diskformat=sda1,sda3 $node -y
wwsh object modify -s diskpartition=sda $node -y
wwsh object modify -s filesystems="mountpoint=/boot:dev=sda1:type=ext4:size=512,dev=sda2:type=swap:size=32768,mountpoint=/:dev=sda3:type=ext4:size=+" $node -y
# vanilla image
wwsh provision set $node --vnfs=centos-7.chroot -y
wwsh provision set $node --bootstrap=3.10.0-862.9.1.el7.x86_64 -y
wwsh provision set --bootlocal=UNDEF $node -y
wwsh pxe update
wwsh dhcp update
# a cron job stops dhcpd and httpd at 4pm, so restart them here
systemctl restart dhcpd
systemctl restart httpd
# reminder: once the node is imaged, run this so it boots from local disk
echo "wwsh provision set --bootlocal=EXIT $node -y"
# file deploy.txt
# n33.chroot type nodes, ASUS type servers with 4x K20 gpus
n37 50:46:5D:E8:1F:A8 192.168.102.47 50:46:5D:E8:1F:A9 10.10.102.47 10.11.103.47
# more servers ...
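deploy.sh passes 255.255.0.0 for both --netmask and --network. If the intent is the network address (what NETWORK= in the ifcfg templates normally holds), it can be computed from the IP address and netmask with a bitwise AND per octet. A sketch with a hypothetical helper; whether Warewulf or the node actually uses this value is not verified here.

```shell
#!/bin/bash
# Sketch: network address = IP AND netmask, octet by octet. Hypothetical helper.
network_of() {
    local ip=$1 mask=$2 i out=()
    local -a a b
    IFS=. read -r -a a <<< "$ip"
    IFS=. read -r -a b <<< "$mask"
    for i in 0 1 2 3; do
        out+=( $(( a[i] & b[i] )) )
    done
    local IFS=.
    echo "${out[*]}"
}

# usage inside deploy.sh (assumption):
#   --network=$(network_of "$ipaddr0" 255.255.0.0)
```

For n37 this gives 192.168.0.0, 10.10.0.0 and 10.11.0.0 for the three interfaces.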
Once that is set up, we deploy with //./deploy.sh `grep ^n37 deploy.txt`// and hook up a KVM link to watch what happens on the console. On the first PXE boot we get an error in the mkbootable provision step: "grub2-install not found". So I installed all of the SMS server's grub2 packages in the CHROOT and rebuilt the VNFS. The second PXE boot errors in the same step with "grub-mkconfig failed". While I was researching this, about 10 minutes after the ''init'' countdown the screen flickered and the n37 login prompt appeared. So not a fatal error? And what are grub vs grub2 doing in CentOS 7? Grrh, I also forgot to put SSH authorized keys in the CHROOT, as PermitRootLogin is set to No by default. Rebuild VNFS, redeploy.
The ten-minute console delay seems to be related to the KVM (ASMB6-iKVM) of this ASUS server. The stateless node is pingable soon after init exits and the network looks good. Since I'm not going to run stateless clients, I'm moving on to the golden image approach: [[cluster:171| Link to page]]
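Note that `grep ^n37 deploy.txt` also matches n370, n371 and so on, and nothing checks the fields before deploy.sh runs. A sketch of a stricter lookup, using a hypothetical helper: it prints only the line whose first field is exactly the node name, exits 2 if that line does not have six fields with valid MAC addresses in fields 2 and 4, and exits 1 if the node is not found.

```shell
#!/bin/bash
# Sketch: exact-match lookup of a node line in deploy.txt with basic checks.
# Exit codes: 0 ok (line printed), 1 node not found, 2 malformed line.
node_args() {
    local node=$1 file=${2:-deploy.txt}
    awk -v n="$node" '
        function is_mac(s,   p, k) {
            if (split(s, p, ":") != 6) return 0
            for (k = 1; k <= 6; k++) if (p[k] !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) return 0
            return 1
        }
        $1 == n {
            found = 1
            if (NF != 6 || !is_mac($2) || !is_mac($4)) bad = 1
            else print
            exit
        }
        END { if (bad) exit 2; if (!found) exit 1 }
    ' "$file"
}

# usage: args=$(node_args n37 deploy.txt) && ./deploy.sh $args
```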
==== Older kernel ====
I remade my CHROOT with an older (CentOS 7.3) kernel, based on a Warewulf mailing list thread I ran into, like so:
rm -rf /data/centos-7.chroot
wwmkchroot centos-7 /data/centos-7.chroot
yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
yum -y --installroot=/data/centos-7.chroot install tcl-devel
yum -y --installroot=/data/centos-7.chroot erase kernel-headers
# then, CentOS 7.3
yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-3.10.0-514.el7.x86_64.rpm
yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
yum -y --installroot=/data/centos-7.chroot install lmod-ohpc
yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
yum -y --installroot=/data/centos-7.chroot install infinipath-psm
wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-514.el7.x86_64
wwvnfs --chroot=/data/centos-7.chroot
# then change the bootstrap in the deploy script
This did not fix the grub2/grub errors seen with CentOS 7.5, but the stateless node login prompt still appears.
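The bootstrap swap in deploy.sh and the vault URLs above are both mechanical. A sketch with two hypothetical helpers: `set_bootstrap` rewrites the `--bootstrap=` value in place (keeping a .bak copy), and `vault_rpm_url` builds a package URL following the vault.centos.org pattern used above (release, then os/x86_64/Packages). Other releases or repos may use a different layout.

```shell
#!/bin/bash
# Sketch: switch the bootstrap kernel in deploy.sh and build vault package URLs.
# Hypothetical helpers; the URL layout is copied from the commands on this page.
set_bootstrap() {
    local script=$1 kver=$2
    sed -i.bak -E "s/--bootstrap=[^ ]+/--bootstrap=${kver}/" "$script"
}

vault_rpm_url() {
    local release=$1 pkg=$2 kver=$3
    echo "http://vault.centos.org/${release}/os/x86_64/Packages/${pkg}-${kver}.rpm"
}

# usage:
#   set_bootstrap /data/centos-7.chroot/root/deploy.sh 3.10.0-514.el7.x86_64
#   yum -y --installroot=/data/centos-7.chroot install \
#       "$(vault_rpm_url 7.3.1611 kernel 3.10.0-514.el7.x86_64)"
```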