NFSoRDMA
Previously we used IPoIB; consult that page for details.
With newer hardware (storage and compute nodes) and an EDR InfiniBand switch (expensive!), we will try NFSoRDMA.
https://enterprise-support.nvidia.com/s/article/howto-configure-nfs-over-rdma--roce-x
Remote Direct Memory Access (RDMA) supposedly gets better performance than IPoIB: clients (compute nodes) fetch data directly from the storage server's memory, so the remote storage “appears as if it is local”.
Our EDR InfiniBand switch (100 Gb/s) is more expensive than a comparable 100 GbE Ethernet switch but obtains better performance.
Packages involved
- opensm: provides an implementation of an InfiniBand subnet manager and administration
- infiniband-diags
- libibverbs
- librdmacm
- rdma-core (chkconfig --level 2345 rdma on)
- qperf (used for the quick fabric check below)
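With qperf on both ends we can sanity-check the fabric before touching NFS. A minimal sketch, assuming the storage server's ib0 address configured further down this page; rc_bw/rc_lat and tcp_bw/tcp_lat are standard qperf tests:
# on the storage server: start the qperf listener
qperf
# on a compute node: compare RDMA (reliable connected) vs TCP over IPoIB
qperf 10.11.103.243 rc_bw rc_lat tcp_bw tcp_lat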
[root@astrostore ~]# ibstat
CA 'mlx5_0'
...
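A subnet manager must be running somewhere on the fabric for ports to reach the Active state; since opensm is in the package list above, a minimal sketch assuming a host-based opensm (a managed switch may run its own SM instead):
systemctl enable --now opensm
# the port should report State: Active and, for EDR, Rate: 100
ibstat | grep -E 'State|Rate'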
# eth0 & eth1 present; Warewulf provisions over eth0
[root@astrostore ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=UUID=05ade8ca-39db-42a8-b280-c5c835d7633f net.ifnames=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
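If GRUB_CMDLINE_LINUX is edited (e.g. to add net.ifnames=0), the change only takes effect after regenerating the grub config and rebooting; a sketch for a BIOS-booted Rocky 8 host (the path differs under UEFI):
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot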
[root@astrostore ~]# lsmod | grep rdma
rpcrdma 307200 2
rdma_ucm 32768 0
rdma_cm 118784 5 rpcrdma,ib_srpt,ib_iser,ib_isert,rdma_ucm
iw_cm 53248 1 rdma_cm
ib_cm 114688 3 rdma_cm,ib_ipoib,ib_srpt
ib_uverbs 163840 2 rdma_ucm,mlx5_ib
ib_core 397312 12 rdma_cm,ib_ipoib,rpcrdma,ib_srpt,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
sunrpc 577536 20 nfsd,rpcrdma,auth_rpcgss,lockd,nfs_acl
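If rpcrdma did not show up here, it can be loaded by hand; a sketch for persistence, assuming rdma-core's module-list location on EL8:
modprobe rpcrdma
# persist across reboots: ensure the NFSoRDMA modules (xprtrdma, svcrdma)
# are uncommented in rdma-core's module list
vi /etc/rdma/modules/rdma.conf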
[root@astrostore ~]# modinfo rpcrdma
filename: /lib/modules/4.18.0-425.3.1.el8.x86_64/kernel/net/sunrpc/xprtrdma/rpcrdma.ko.xz
alias: rpcrdma6
alias: xprtrdma
alias: svcrdma
license: Dual BSD/GPL
description: RPC/RDMA Transport
author: Open Grid Computing and Network Appliance, Inc.
rhelversion: 8.7
srcversion: 2BB7046D96C4B57D1B36DEE
depends: ib_core,sunrpc,rdma_cm
intree: Y
name: rpcrdma
vermagic: 4.18.0-425.3.1.el8.x86_64 SMP mod_unload modversions
sig_id: PKCS#7
signer: Rocky kernel signing key
sig_key: 54:B0:44:38:9B:6E:F7:43:B2:FD:8F:3B:8B:D3:69:22:26:2A:BE:87
sig_hashalgo: sha256
signature: A8:51:36:....
[root@astrostore ~]# cat /etc/exports
/lvm_data *(fsid=0,rw,async,insecure,no_root_squash)
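After editing /etc/exports, re-export without restarting the service; standard exportfs usage:
exportfs -ra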
[root@astrostore ~]# exportfs
/lvm_data <world>
# weird! yet nfsd is running, I observe 8 nfsd processes
[root@astrostore ~]# systemctl status nfs.service
Unit nfs.service could not be found.
# ohhh
[root@astrostore ~]# systemctl status nfs-server.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/generator/nfs-server.service.d
└─order-with-mounts.conf
Active: active (exited) since Fri 2023-02-03 10:42:04 EST; 3 days ago
...
# enable the RDMA listener on port 20049; /proc/fs/nfsd is not under /proc/sys,
# so sysctl cannot set this: write to the portlist file directly
echo "rdma 20049" > /proc/fs/nfsd/portlist
[root@astrostore ~]# cat /proc/fs/nfsd/portlist
rdma 20049
rdma 20049
tcp 2049
tcp 2049
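The echo into /proc/fs/nfsd/portlist does not survive an nfs-server restart. A sketch of a persistent equivalent via /etc/nfs.conf, assuming the rdma/rdma-port options of the nfs-utils shipped with EL8 (see also the Red Hat guide linked below):
# /etc/nfs.conf
[nfsd]
rdma=y
rdma-port=20049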
# bring up the interface and assign an address
ip link set ib0 up
ip addr add 10.11.103.243/16 dev ib0
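The ip commands above do not persist across reboots. A sketch of a persistent equivalent with NetworkManager (the connection name is arbitrary):
nmcli connection add type infiniband con-name ib0 ifname ib0 \
    ipv4.method manual ipv4.addresses 10.11.103.243/16
nmcli connection up ib0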
### then on the compute node, add to /etc/fstab
10.11.103.243:/lvm_data /astrostore nfs defaults,proto=rdma,port=20049 0 0
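After adding the fstab entry, mount and verify that the transport really is RDMA; nfsstat -m prints the effective mount options:
mkdir -p /astrostore
mount /astrostore
nfsstat -m    # look for proto=rdma,port=20049 among the flags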
# fini
With firewalld, open this port (per the Red Hat guide linked below):
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/configuring_infiniband_and_rdma_networks/configuring-the-core-rdma-subsystem_configuring-infiniband-and-rdma-networks#enabling-nfs-over-rdma-on-an-nfs-server_configuring-the-core-rdma-subsystem
firewall-cmd --permanent --zone=public --add-port={20049/tcp,20049/udp}
firewall-cmd --permanent --zone=private --add-port={20049/tcp,20049/udp}
firewall-cmd --reload
systemctl restart nfs-server
then also add nfs, mountd, and rpc-bind rules for ib0; see the sketch below
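A sketch of those rules, assuming ib0 is assigned to the public zone (adjust the zone to wherever ib0 actually sits):
firewall-cmd --permanent --zone=public --add-service={nfs,mountd,rpc-bind}
firewall-cmd --reload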
