\\
**[[cluster:0|Back]]**

This is for experimental purposes only. \\
Proof of concept type of a thing. \\
 --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/09/28 11:38//

====== The Story Of NAT ======

The cluster is served file systems from our **[[cluster:34#netapp_fas_3050c|NetApp Fabric Attached Storage Device]]**.  These file systems are NFS mounted on each compute node via the IO node.  The NFS traffic is isolated to one of our private networks on the cluster, the 10.3.1.xxx subnet, running across a Cisco 7000 gigabit ethernet switch.

So what happens when you have another file system that you would like to make available on the back end compute nodes, from another cluster for example?  One approach is to rely on network address translation (NAT).  That approach is described here so I don't forget what we did.

Note that:

  * I'm not endorsing this approach at the current time, until we test it further
  * any "opening up" of the private environment of the cluster introduces security risks
  * any "non-cluster" activities the compute nodes are involved in potentially compromise their performance
  * I had no idea how this worked until Scott Knauert put it together

We start by grabbing a surplus computer and installing Linux on it. \\
We add two NIC cards (in our case capable of 100 Mbit ethernet (100e), not gigE). \\
We run a CAT6 cable from a router port to the cluster (this is gigE). \\
And we named this new host **''NAT''**.

<code>
[root@NAT:~]uname -a
Linux NAT 2.6.18-5-686 #1 SMP Fri Jun 1 00:47:00 UTC 2007 i686 GNU/Linux
</code>

===== Interfaces =====

The NAT box will have two interfaces.  One is on our internal VLAN 1, just like our head node ''swallowtail.wesleyan.edu''.  The other is on the NFS private network of the cluster.  So basically we have:

  * eth1: 129.133.1.225
  * eth2: 10.3.1.10

This is defined in (Debian) ''/etc/network/interfaces'' (below).  We chose VLAN 1 since we need to reach a file system hosted by ''vishnu.phys.wesleyan.edu'' in VLAN 90.

<code>
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# Wesleyan
auto eth1
iface eth1 inet static
address 129.133.1.225
netmask 255.255.255.0
gateway 129.133.1.1

# Cluster
auto eth2
iface eth2 inet static
address 10.3.1.10
netmask 255.255.255.0
</code>
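
With the file in place the interfaces can be brought up and verified.  A minimal sketch, assuming the standard Debian ''ifupdown'' tools (which read this file):

<code>
# bring both interfaces up
ifup eth1
ifup eth2
# confirm the addresses were assigned
/sbin/ifconfig eth1 | grep "inet addr"
/sbin/ifconfig eth2 | grep "inet addr"
</code>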


===== ipTables =====

Since we are opening up the backend of the cluster's private network, we need to clamp down on the access as much as possible on the NAT box.  The iptables chains limit the traffic from and to the private NFS network 10.3.1.xxx and the target host 129.133.90.207.

But the whole intent of the NAT host is to provide a bridge between separate networks.  So any packets that need to traverse this bridge are postrouted or forwarded across.

  * file ''/etc/init.d/nat''

<code>
#!/bin/bash

# EXTERNAL is the interface to the outside network.
EXTERNAL="eth1"
# INTERNAL is the interface to the local network.
INTERNAL="eth2"

/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe iptable_nat
iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain

# added source and destination -hmeij
iptables --table nat --source 10.3.1.0/24 --destination 129.133.90.207 \
         --append POSTROUTING --out-interface $EXTERNAL -j MASQUERADE
iptables --source 129.133.90.207 --destination 10.3.1.0/24 \
         --append FORWARD --in-interface $INTERNAL -j ACCEPT

echo "1" > /proc/sys/net/ipv4/ip_forward
</code>
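
To apply these rules automatically at boot on Debian, one option (an assumption on my part; we may just run the script by hand) is to register the script with ''update-rc.d'':

<code>
chmod +x /etc/init.d/nat
# symlink the script into the default runlevels
update-rc.d nat defaults
</code>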

We can now test the setup by contacting the remote host and attempting to mount the remote file system:

<code>
[root@NAT:~]# ping vishnu.phys.wesleyan.edu
PING vishnu.phys.wesleyan.edu (129.133.90.207) 56(84) bytes of data.
64 bytes from vishnu.phys.wesleyan.edu (129.133.90.207): icmp_seq=0 ttl=63 time=0.235 ms
64 bytes from vishnu.phys.wesleyan.edu (129.133.90.207): icmp_seq=1 ttl=63 time=0.193 ms
64 bytes from vishnu.phys.wesleyan.edu (129.133.90.207): icmp_seq=2 ttl=63 time=0.115 ms

--- vishnu.phys.wesleyan.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.115/0.181/0.235/0.049 ms, pipe 2

[root@NAT:~]# mount vishnu.phys.wesleyan.edu:/raid/home /mnt
[root@NAT:~]# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
vishnu.phys.wesleyan.edu:/raid/home
                      4.6T  1.5T  3.2T  31% /mnt
[root@NAT:~]# umount /mnt
</code>
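
One way to confirm that this traffic actually traverses the MASQUERADE rule (rather than some other path) is to watch the packet counters on the nat table while the mount is active:

<code>
# list the nat table with packet/byte counters;
# the POSTROUTING counters should increase during NFS traffic
iptables --table nat --list --verbose --numeric
</code>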


===== Routes =====

On the compute nodes we now need to change the routing of the packets.  Platform/OCS had already defined a default gateway that pointed back to swallowtail_nfs (10.3.1.254).  We now substitute the NAT box private IP (10.3.1.10) for the default gateway.  In addition, Platform Support wants to make sure a gateway is defined for the other private network (192.168) so that any ssh callbacks can get resolved.  The commands are (added to ''/etc/rc.local''):

<code>
# add for nat box on administrative network
route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.254 dev eth0
# change default route set by platform/ocs
route add -net default netmask 0.0.0.0 gw 10.3.1.10  dev eth1
route del -net default netmask 0.0.0.0 gw 10.3.1.254 dev eth1
</code>

And now the routing table on the compute node looks like this:

<code>
[root@compute-1-1 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
255.255.255.255 *               255.255.255.255 UH    0      0        0 eth0
192.168.1.0     swallowtail.loc 255.255.255.0   UG    0      0        0 eth0
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
10.3.1.0        *               255.255.255.0   U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
224.0.0.0       *               240.0.0.0       U     0      0        0 eth0
default         10.3.1.10       0.0.0.0         UG    0      0        0 eth1
</code>

We should now be able to ''ping'' and ''mount'' the remote host and file system like we did on the NAT box.  Once we have established connectivity we can redefine the home directory for certain users.
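
For example, a sketch of the same quick test, this time run from a compute node:

<code>
[root@compute-1-1 ~]# ping -c 3 vishnu.phys.wesleyan.edu
[root@compute-1-1 ~]# mount vishnu.phys.wesleyan.edu:/raid/home /mnt
[root@compute-1-1 ~]# df -h /mnt
[root@compute-1-1 ~]# umount /mnt
</code>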


===== AutoFS =====

The whole point of the NAT box is to make the remote home directories available to certain users.  For this to work, the remote host must use the same UID/GID settings as the cluster does.  The cluster uses the UID/GID settings from Active Directory (AD).  Once the UID/GID settings are synced, we change the home directory location for the user in question in ''/etc/auto.home''.  After those changes, that file needs to be pushed out to the compute nodes and autofs restarted.

<code>
[root@swallowtail ~]# egrep 'hmeij|sknauert' /etc/auto.home

hmeij localhost:/export/home/users/hmeij
sknauert vishnu.phys.wesleyan.edu:/raid/home/templarapheonix

[root@swallowtail ~]# make -C /var/411
[root@swallowtail ~]# /etc/init.d/autofs restart
</code>
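
Because the mount only behaves correctly when the UID/GID settings match, a quick sanity check (assuming shell access on both hosts) is to compare the ''id'' output on the cluster with that on the remote host:

<code>
# run on both swallowtail and vishnu; uid and gid must be identical
id sknauert
</code>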

Once autofs is restarted on both the head node and the compute node compute-1-1, we can force the automounter to mount the remote home directory:

<code>
[root@compute-1-1 ~]# cd ~sknauert
[root@compute-1-1 sknauert]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
vishnu.phys.wesleyan.edu:/raid/home/templarapheonix
                      4.6T  1.5T  3.2T  31% /home/sknauert
</code>


===== Tests =====

So let's write some files on compute node compute-1-1 in the remotely mounted home directory.  Meaning, the packets have to flow over NFS to the NAT box, which forwards the packets to the remote host.  For comparison, let's do the same by writing to some other file systems.

<code>
# for i in 1024 10240 102400 1024000; do echo $i; time dd if=/dev/zero of=./out.$i bs=1k count=$i; done
# ls -lh
-rw-r--r--  1 sknauert  s07          1M Sep 28 10:28 out.1024
-rw-r--r--  1 sknauert  s07         10M Sep 28 10:28 out.10240
-rw-r--r--  1 sknauert  s07        100M Sep 28 10:29 out.102400
-rw-r--r--  1 sknauert  s07       1000M Sep 28 10:41 out.1024000
</code>

^ Where ^ 1024 (1M) ^ 10240 (10M) ^ 102400 (100M) ^ 1024000 (1000M) ^
|vishnu.phys:/home/sknauert/TEMP|0m0.868s|0m6.404s|1m12.993s|11m59.465s|
|/export/home/rusers/sknauert/TEMP|0m0.027s|0m0.160s|0m01.815s|00m20.484s|
|/sanscratch/TEMP|0m0.108s|0m0.452s|0m01.626s|00m27.664s|
|/localscratch/TEMP|0m0.005s|0m0.038s|0m00.370s|00m07.687s|

These timings will vary wildly depending on competing load, of course.  The bottom three file systems are NFS mounted on compute node ''compute-1-1'' from the ''io-node'' over gigabit ethernet.  From the ''io-node'' the connection is 4 gigabit/sec Fibre Channel to the NetApp filer.

The connection in our test setup is limited by the 100e NIC cards in the NAT box.  Also, the remote host has a 100e link to VLAN 90.  We should move these to gigE.
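
As a rough sanity check on those numbers, the largest write to the remote host works out to only about 1.4 MB/s, well below what even a healthy 100 Mbit link should sustain, and consistent with the duplex mismatch reported below:

<code>
# 1000 MB in 11m59s (719 s) over the NAT path:
echo "scale=2; 1024000 / 1024 / 719" | bc      # ~1.39 MB/s
# versus 1000 MB in 20.484 s to /export/home over gigE:
echo "scale=2; 1024000 / 1024 / 20.484" | bc   # ~48.81 MB/s
</code>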


===== Errors =====

''dmesg'' on the NAT box shows errors like these ... which puzzle me ...

<code>
eth1: Transmit error, Tx status register 82.
Probably a duplex mismatch. See Documentation/networking/vortex.txt
Flags; bus-master1, dirty 887391(15) current 887391(15)
Transmit list 00000000 vs. c7c30b60.
0: @c7c30200 length 800005ea status 000105ea
...
</code>
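
If this really is a duplex mismatch, one possible remedy (untested here; an assumption on my part) is to force the speed and duplex on eth1, making sure the switch port is configured the same way:

<code>
# inspect the current negotiation state
mii-tool eth1
# force 100 Mbit full duplex
mii-tool -F 100baseTx-FD eth1
</code>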

\\
**[[cluster:0|Back]]**