cluster:170



Henk 2019/03/18 13:58 Note self

  • Hosts ohpc0-test + n29 + n31 formed a tiny openhpc/slurm/ww test cluster (redone, minus slurm)
  • Host sharptail2 acts as a CentOS7 Warewulf server (host petaltail is the CentOS6 Warewulf server)

OpenHPC 1.3.1

Consult these pages for my earlier testing of OpenHPC. I simply copy&paste my way through these pages while consulting the recipe PDF for CentOS7.5 plus Warewulf. Any changes are logged on this page.

First install the SMS server, then deploy a stateless client. In the CentOS installer select “Server with GUI”. “eth0” will be the private provisioning NIC at 192.168.1.220 and “eth1” will be public on wesleyan.edu. Enable ntp immediately.

First do iptables, NetworkManager, firewalld, follow Page 1 listed above.

Then follow the recipe… I hit a problem with the epel repo files: what worked for me was commenting out the “metalink” line and un-commenting the “baseurl” line in /etc/yum.repos.d/epel.repo, and removing the epel-testing.repo file.
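The epel.repo change can be scripted rather than done by hand. A minimal sketch, assuming the repo file uses lines starting with "metalink=" and "#baseurl=" (as the stock EPEL 7 file did for me); the function name use_baseurl is my own, not part of the recipe.

```shell
#!/bin/bash
# Sketch: switch a yum repo file from metalink to baseurl.
# Assumption: entries look like "metalink=..." and "#baseurl=...".
use_baseurl() {
    local repo_file=$1
    # keep a .bak copy, comment out metalink, un-comment baseurl
    sed -i.bak \
        -e 's/^metalink=/#metalink=/' \
        -e 's/^#baseurl=/baseurl=/' \
        "$repo_file"
}

# Intended use (as root):
#   use_baseurl /etc/yum.repos.d/epel.repo
#   rm -f /etc/yum.repos.d/epel-testing.repo
```

The .bak copy makes it easy to diff or roll back if yum still complains.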

yum install http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm
yum update ohpc-release epel-release

yum install ohpc-base 
yum install ohpc-warewulf

# skipping on ohpc-slurm-server on master

# skipping on infiniband on master

# skipping on omnipath on master

/etc/warewulf/provision.conf
device = enp4s0
dynamic_hosts, hostfile, localdomain all “no”

disable = no   (in /etc/xinetd.d/tftp)
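The tftp switch can be flipped with sed as well. A minimal sketch, assuming the xinetd file contains a "disable = yes" line with arbitrary whitespace around the equals sign; enable_tftp is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: turn "disable = yes" into "disable = no" in an xinetd service file.
# Assumption: the setting sits on its own line, possibly indented with tabs.
enable_tftp() {
    local xinetd_file=$1
    sed -i.bak \
        's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes/\1no/' \
        "$xinetd_file"
}

# Intended use (as root):
#   enable_tftp /etc/xinetd.d/tftp
```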

In warewulf config section httpd.conf is all set.
Restart/Enable all services listed in recipe.

/etc/warewulf/database-root.conf
Set a mysql root password.
wwinit DATABASE
mysql -u root
set password for 'root'@'localhost' = PASSWORD('some_string');

Next build compute image for provisioning (first time, vanilla). Before I do that I will bring a second disk online (/dev/sdb) for /data to store CHROOTs in.

wwmkchroot centos-7 /data/centos-7.chroot
yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
yum -y --installroot=/data/centos-7.chroot install tcl-devel
yum -y --installroot=/data/centos-7.chroot install ntp
yum -y --installroot=/data/centos-7.chroot install kernel kernel-devel kernel-headers linux-firmware
yum -y --installroot=/data/centos-7.chroot install lmod-ohpc

# skip ohpc-slurm-client

yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
yum -y --installroot=/data/centos-7.chroot install infinipath-psm
chroot /data/centos-7.chroot systemctl enable rdma

Warewulf

The only files I will import and associate with nodes are listed below. The others (passwd, shadow, group, hosts, fstab, bashrc and profile) will be copied into /etc by a post-imaging process from my archive.

network.ww

# short node names
NETWORK=yes
HOSTNAME=%{NODENAME}

ifcfg-eth1.ww

DEVICE=enp8s0
BOOTPROTO=static
ONBOOT=yes
HWADDR=%{NETDEVS::ETH1::HWADDR}
IPADDR=%{NETDEVS::ETH1::IPADDR}
NETMASK=%{NETDEVS::ETH1::NETMASK}
NETWORK=%{NETDEVS::ETH1::NETWORK}

ifcfg-ib0.ww

DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
CONNECTED_MODE=no
DEFROUTE=no
IPADDR=%{NETDEVS::IB0::IPADDR}
NETMASK=%{NETDEVS::IB0::NETMASK}
NETWORK=%{NETDEVS::IB0::NETWORK}

wwsh file import /data/templates/network.ww \
--path=/etc/sysconfig/network --name=network.ww -y
wwsh file import /data/templates/ifcfg-eth1.ww \
--path=/etc/sysconfig/network-scripts/ifcfg-eth1 --name=ifcfg-eth1.ww -y
wwsh file import /data/templates/ifcfg-ib0.ww \
--path=/etc/sysconfig/network-scripts/ifcfg-ib0 --name=ifcfg-ib0.ww -y
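
To eyeball a template before importing it, the %{...} tokens can be filled in with sample values. A rough sketch only: it imitates the substitution with plain bash string replacement and is not how Warewulf itself renders files at provision time. The function name render_preview and the sample values are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: preview a .ww template by replacing %{TOKEN} with sample values.
# Illustration only; Warewulf does the real rendering when provisioning.
render_preview() {
    # $1 = template file; remaining args = TOKEN=value pairs (sample data)
    local template=$1; shift
    local text pair token value
    text=$(cat "$template")
    for pair in "$@"; do
        token="%{${pair%%=*}}"   # text before the first "=" becomes %{TOKEN}
        value=${pair#*=}         # text after the first "=" is the replacement
        text=${text//"$token"/$value}
    done
    printf '%s\n' "$text"
}

# Example:
#   render_preview /data/templates/ifcfg-eth1.ww \
#       NETDEVS::ETH1::HWADDR=50:46:5D:E8:1F:A9 \
#       NETDEVS::ETH1::IPADDR=10.10.102.47
```

Tokens that are not given a sample value are left as-is in the output, which makes forgotten variables easy to spot.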

Next prepare bootstrap and VNFS image. Configuration is still vanilla stateless 7.5.

echo "drivers += updates/kernel" >> /etc/warewulf/bootstrap.conf

wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-862.9.1.el7.x86_64

wwsh bootstrap list
BOOTSTRAP NAME            SIZE (M)      ARCH
3.10.0-862.9.1.el7.x86_64 28.4          x86_64
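
The kernel version handed to wwbootstrap should correspond to a kernel actually installed in the CHROOT. A small sketch to list the candidates, assuming each installed kernel has a directory under lib/modules inside the CHROOT; list_chroot_kernels is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list kernel versions present in a chroot, one per line.
# Assumption: every installed kernel has a directory under lib/modules.
list_chroot_kernels() {
    local chroot=$1 dir
    for dir in "$chroot"/lib/modules/*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded glob when empty
        basename "$dir"
    done
}

# Example:
#   list_chroot_kernels /data/centos-7.chroot
```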

# vanilla
wwvnfs --chroot=/data/centos-7.chroot

wwsh vnfs list
VNFS NAME            SIZE (M)   ARCH       CHROOT LOCATION
centos-7.chroot      306.3      x86_64     /data/centos-7.chroot

Register

Now we will register the node and deploy the stateless image as a test. This is done via a script; too many typos otherwise. Keep these files in CHROOT/root/ so we don't lose them.

# file deploy.sh

#!/bin/bash
# deploy a n33.chroot type server 
# templates are in /data/templates
node=$1
hwaddr0=$2
ipaddr0=$3
hwaddr1=$4
ipaddr1=$5
ipaddri=$6

# all six arguments are required; report usage on stderr and fail otherwise
if [ $# -ne 6 ]; then
        echo "usage: $0 node hwaddr0 ipaddr0 hwaddr1 ipaddr1 ipaddri" >&2
        exit 1
fi

wwsh object delete $node -y ; sleep 3

wwsh node new $node --netdev=eth0 \
--hwaddr=$hwaddr0 --ipaddr=$ipaddr0 \
--netmask=255.255.0.0  --network=255.255.0.0 -y

wwsh node set $node --netdev=eth1 \
--hwaddr=$hwaddr1 --ipaddr=$ipaddr1 \
--netmask=255.255.0.0  --network=255.255.0.0 -y

wwsh node set $node --netdev=ib0 \
--ipaddr=$ipaddri \
--netmask=255.255.0.0  --network=255.255.0.0 -y

# database file imports must already have been performed
wwsh provision set $node --fileadd network.ww,ifcfg-eth1.ww,ifcfg-ib0.ww -y

wwsh object modify -s bootloader=sda $node -y
wwsh object modify -s diskformat=sda1,sda3 $node -y
wwsh object modify -s diskpartition=sda $node -y
wwsh object modify -s filesystems="mountpoint=/boot:dev=sda1:type=ext4:size=512,dev=sda2:type=swap:size=32768,mountpoint=/:dev=sda3:type=ext4:size=+" $node -y

# vanilla image
wwsh provision set $node --vnfs=centos-7.chroot -y
wwsh provision set $node --bootstrap=3.10.0-862.9.1.el7.x86_64  -y

wwsh provision set --bootlocal=UNDEF $node -y

wwsh pxe update
wwsh dhcp update
# cron turns them off at 4pm
systemctl restart dhcpd
systemctl restart httpd

echo "wwsh provision set --bootlocal=EXIT $node -y"


# file deploy.txt

# n33.chroot type nodes, ASUS type servers with 4x K20 gpus
n37 50:46:5D:E8:1F:A8 192.168.102.47 50:46:5D:E8:1F:A9 10.10.102.47 10.11.103.47
# more servers ...
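
A plain grep on the node name also matches longer names with the same prefix (n37 would match n370) and does not check the field count. A minimal sketch of a stricter lookup, assuming the six-column layout of deploy.txt shown above; lookup_node is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the deploy.txt entry whose first field is exactly the node
# name. Fails (non-zero exit) if the node is missing or the line does not
# have exactly six fields. Comment lines never match a node name.
lookup_node() {
    local node=$1 file=$2
    awk -v n="$node" '
        $1 == n {
            found = 1
            if (NF == 6) { ok = 1; print }
            exit
        }
        END { if (!(found && ok)) exit 1 }
    ' "$file"
}

# Example:
#   args=$(lookup_node n37 deploy.txt) && ./deploy.sh $args
```

The unquoted $args in the example is deliberate, so the line splits into the six arguments deploy.sh expects.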

Once that is set up we deploy with: ./deploy.sh `grep ^n37 deploy.txt` and hook up a KVM link to observe what happens on the console. On the first PXE boot we get an error in the mkbootable provision step: “grub2-install not found”. So I installed all the sms_server grub2 packages in the CHROOT and rebuilt the vnfs. On the second PXE boot the error in the same step is “grub-mkconfig failed”. While I was researching this, about 10 minutes after the init countdown finished, the screen flickered and the n37 node login prompt appeared. So not a fatal error? And what is grub vs grub2 doing in CentOS7? Grrh, forgot to put SSH authorized keys in the CHROOT, as PermitRootLogin is by default set to No. Rebuild vnfs, redeploy.
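
The forgotten SSH key step is easy to script so it happens before every vnfs rebuild. A minimal sketch, assuming root's key belongs in root/.ssh/authorized_keys inside the CHROOT; install_root_key and the public key path in the example are hypothetical.

```shell
#!/bin/bash
# Sketch: append a public key to root's authorized_keys inside a chroot.
# Assumption: sshd in the image reads /root/.ssh/authorized_keys.
install_root_key() {
    local chroot=$1 pubkey=$2
    local ssh_dir=$chroot/root/.ssh
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example (then rebuild the image):
#   install_root_key /data/centos-7.chroot /root/.ssh/id_rsa.pub
#   wwvnfs --chroot=/data/centos-7.chroot
```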

The ten minute console delay seems to be related to the KVM/ASMB6-iKVM of this ASUS server. The stateless node is pingable soon after init exits and the network looks good. Since I'm not going to run stateless clients, onward to the golden image approach… Link to page

Older kernel

I remade my CHROOT based on a warewulf list thread I ran into. Like so:

rm -rf /data/centos-7.chroot
wwmkchroot centos-7 /data/centos-7.chroot
yum -y --installroot=/data/centos-7.chroot install ohpc-base-compute
yum -y --installroot=/data/centos-7.chroot install tcl-devel
yum -y --installroot=/data/centos-7.chroot erase kernel-headers

# then, CentOS 7.3

yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-3.10.0-514.el7.x86_64.rpm
yum -y --installroot=/data/centos-7.chroot install http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm

yum -y --installroot=/data/centos-7.chroot install lmod-ohpc
yum -y --installroot=/data/centos-7.chroot groupinstall "Infiniband Support"
yum -y --installroot=/data/centos-7.chroot install infinipath-psm

wwbootstrap --chroot=/data/centos-7.chroot 3.10.0-514.el7.x86_64
wwvnfs --chroot=/data/centos-7.chroot

# then change the deploy script bootstrap
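
Changing the bootstrap in the deploy script can be done with a one-line sed. A minimal sketch, assuming the script contains a single --bootstrap=VERSION argument as in deploy.sh; set_bootstrap is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: point the --bootstrap= argument in a deploy script at another kernel.
# Assumption: the version string contains no whitespace or "/" characters.
set_bootstrap() {
    local script=$1 kver=$2
    sed -i.bak "s/--bootstrap=[^[:space:]]*/--bootstrap=$kver/" "$script"
}

# Example:
#   set_bootstrap deploy.sh 3.10.0-514.el7.x86_64
```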

This did not change the grub2/grub errors encountered with CentOS7.5, but the stateless node login prompt still appears.



cluster/170.txt · Last modified: 2019/03/18 18:11 by hmeij07