
IPoIB

We are redoing our RHEL 5.5 HP ProLiant blade servers with CentOS 6.7, using Warewulf Golden Image provisioning.

Not quite there yet, but I'll document here how Infiniband was installed. These compute nodes are connected to a Voltaire interconnect and are aging quite a bit.

First install a vanilla basic server and load the packages you need. Then add the packages for Infiniband.

# Install and reboot
yum groupinstall "Infiniband Support"
yum install infiniband-diags perftest qperf opensm
chkconfig rdma on

# for openhpc 
yum install infinipath-psm

yum install opensm
chkconfig opensm on

yum install tcl tk
yum install infiniband-diags

shutdown -r now

# after reboot
lsmod | grep ib

# and the output (ipoib is the important one)

ib_ipoib               80391  0
ib_ucm                 12121  0
ib_uverbs              39106  6 rdma_ucm,ib_ucm
ib_umad                11802  8
ib_cm                  36996  3 ib_ipoib,ib_ucm,rdma_cm
mlx4_ib               137138  1
ib_sa                  24060  5 ib_ipoib,rdma_ucm,rdma_cm,ib_cm,mlx4_ib
ib_mad                 39811  4 ib_umad,ib_cm,mlx4_ib,ib_sa
ib_core                81507  11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,mlx4_ib,ib_sa,ib_mad
ib_addr                 8304  3 rdma_ucm,rdma_cm,ib_core
ipv6                  335525  74 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,ib_ipoib,mlx4_ib,ib_addr
mlx4_core             226123  2 mlx4_ib,mlx4_en
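
To check for the ipoib module from a script rather than by eye, something like the following may help. This is a minimal sketch: the function name ipoib_loaded is hypothetical, and it reads lsmod-style text on standard input so it can also be tried on saved output.

```shell
# Hypothetical helper (not part of any package): succeeds if ib_ipoib
# appears as a module name in lsmod-style text read from standard input.
ipoib_loaded() {
    awk '$1 == "ib_ipoib" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on a node:
#   lsmod | ipoib_loaded && echo "ipoib loaded"
```

Matching on the first column avoids false hits from the "used by" list, where ib_ipoib also shows up.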

Connect the cable(s) to your ports and check that the reboot discovered the ports.

# Check IB ports
for i in /sys/class/infiniband/*/ports/*/state; do echo "$i"; cat "$i"; done

# Your output may vary
/sys/class/infiniband/mlx4_0/ports/1/state
4: ACTIVE
/sys/class/infiniband/mlx4_0/ports/2/state
1: DOWN
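
If you only want the ports that came up, the same state files can be filtered. A minimal sketch, assuming the state file format shown above; the function name ib_active_ports is hypothetical, and the sysfs root is a parameter only so the function can be tried against a copy of the tree.

```shell
# Hypothetical helper: print the sysfs directory of every IB port whose
# state file contains ACTIVE. $1 is the root to scan and defaults to
# /sys/class/infiniband.
ib_active_ports() {
    root=${1:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/state; do
        # Skip the unexpanded glob when no HCA is present.
        [ -r "$f" ] || continue
        case "$(cat "$f")" in
            *ACTIVE*) echo "${f%/state}" ;;
        esac
    done
}

# Typical use on a node:
#   ib_active_ports
```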

# some test commands
ibhosts
iblinkinfo
ibstatus

Edit /etc/sysconfig/network-scripts/ifcfg-ib0 so it looks something like this:

DEVICE=ib0
TYPE=InfiniBand
UUID=eac9f00a-245d-4c88-b56f-1bcb6e6ed933
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
HWADDR=80:00:00:48:FE:80:00:00:00:00:00:00:00:02:C9:03:00:07:3B:DD
CONNECTED_MODE=no
IPADDR=10.11.103.31
PREFIX=16
DEFROUTE=no
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System ib0"
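
Before bringing the interface up, it can be worth a quick look at the yes/no settings in the file. The sketch below is an assumption on my part, not part of the network scripts: the function name check_yesno_keys is hypothetical, and it treats any value other than yes or no as suspect, which may be stricter than the scripts themselves.

```shell
# Hypothetical helper: flag yes/no style keys in an ifcfg file whose value
# is neither "yes" nor "no". $1 is the path to the ifcfg file.
# Exit status is 0 when nothing looks wrong, 1 otherwise.
check_yesno_keys() {
    awk -F= '
        $1 ~ /^(ONBOOT|NM_CONTROLLED|CONNECTED_MODE|DEFROUTE|IPV4_FAILURE_FATAL|IPV6INIT)$/ {
            v = $2
            gsub(/"/, "", v)          # tolerate quoted values
            if (v != "yes" && v != "no") {
                print "suspect value: " $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Typical use on a node:
#   check_yesno_keys /etc/sysconfig/network-scripts/ifcfg-ib0
```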

# then start interface 
ifup ib0

# check the route, then mount /home on this interface

route

# Your output may vary

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.11.0.0       *               255.255.0.0     U     0      0        0 ib0
link-local      *               255.255.0.0     U     1002   0        0 eth0
link-local      *               255.255.0.0     U     1004   0        0 ib0
192.168.0.0     *               255.255.0.0     U     0      0        0 eth0
default         greentail       0.0.0.0         UG    0      0        0 eth0
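
The 255.255.0.0 genmask on the 10.11.0.0 route is what PREFIX=16 in ifcfg-ib0 works out to. If you use a different prefix, a small conversion like the one below shows which genmask to expect. It is only a sketch: the function name prefix_to_netmask is hypothetical, and it assumes an integer prefix between 0 and 32.

```shell
# Hypothetical helper: convert a prefix length (0..32) to a dotted netmask.
prefix_to_netmask() {
    prefix=$1
    mask=""
    for octet in 1 2 3 4; do
        if [ "$prefix" -ge 8 ]; then
            # A full octet of network bits.
            part=255
            prefix=$((prefix - 8))
        else
            # Remaining bits (possibly none) fill the octet from the left.
            part=$((256 - (1 << (8 - prefix))))
            prefix=0
        fi
        mask="${mask}${mask:+.}${part}"
    done
    echo "$mask"
}

# Typical use:
#   prefix_to_netmask 16
```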

# /etc/fstab entry: /home from sharptail, over the ib0 network
10.11.103.42:/home     /home   nfs  defaults 0 0 
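
Once /home is mounted, you can confirm it really comes from the IPoIB address and not the Ethernet one. A minimal sketch, assuming the usual /proc/mounts layout (source, mount point, type, ...); the function name nfs_source_for is hypothetical, and the mounts file is a parameter only so the function can be tried on a saved copy.

```shell
# Hypothetical helper: print the NFS source (server:/export) currently
# mounted on a mount point. $1 is the mount point, $2 is the mounts file
# to read and defaults to /proc/mounts. Prints nothing if the mount point
# is not an NFS mount.
nfs_source_for() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ { src = $1 }    # last matching entry wins
        END { if (src != "") print src }
    ' "${2:-/proc/mounts}"
}

# Typical use on a node:
#   nfs_source_for /home
```

On a node set up as above you would expect the printed source to start with the 10.11 address used in the fstab line.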



cluster/145.txt · Last modified: 2017/04/05 15:22 by hmeij07