This is for experimental purposes only.
Proof of concept type of a thing.
Henk Meij 2007/09/28 11:38

The Story Of NAT

The cluster is served file systems from our NetApp Fabric Attached Storage device. These file systems are NFS mounted on each compute node via the IO node. The NFS traffic is isolated to one of our private networks on the cluster, a dedicated subnet running across a Cisco 7000 gigabit ethernet switch.

So what happens when you have another file system that you would like to make available on the back-end compute nodes, from another cluster for example? One approach is to rely on network address translation (NAT). That approach is described here so I don't forget what we did.

Note that:

  • I'm not endorsing this approach at the current time, until we test it further
  • any “opening up” of the private environment of the cluster introduces security risks
  • any “non-cluster” activities the compute nodes are involved in potentially compromise their performance
  • I had no idea how this worked until Scott Knauert put it together

We start by grabbing a surplus computer and installing Linux on it.
We add two NIC cards (in our case capable of 100e, not gigE).
We run a CAT6 cable from a router port to the cluster (this is gigE).
And we name this new host NAT.

[root@NAT:~]# uname -a
Linux NAT 2.6.18-5-686 #1 SMP Fri Jun 1 00:47:00 UTC 2007 i686 GNU/Linux


The NAT box will have two interfaces. One is on our internal VLAN 1, just like our head node. The other one will be on the NFS private network of the cluster. So basically we have:

  • eth1:
  • eth2:

This is defined (on Debian) in /etc/network/interfaces (below). We chose VLAN 1 since we need to reach a file system hosted in VLAN 90.

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# Wesleyan
auto eth1
iface eth1 inet static

# Cluster
auto eth2
iface eth2 inet static


Since we are opening up the back end of the cluster's private network, we need to clamp down on access as much as possible on the NAT box. iptables chains limit the traffic from and to the private NFS network and the target host.

But the whole intent of the NAT host is to provide a bridge between separate networks. So any packets that need to traverse this bridge are postrouted or forwarded across.

  • file /etc/init.d/nat

#EXTERNAL is the interface to the outside network.
#INTERNAL is the interface to the local network.

/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe iptable_nat
iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain

# added source and destination -hmeij
iptables --table nat --source --destination \
         --append POSTROUTING --out-interface $EXTERNAL -j MASQUERADE
iptables --source --destination \
         --append FORWARD --in-interface $INTERNAL -j ACCEPT

echo "1" > /proc/sys/net/ipv4/ip_forward
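Conceptually, the MASQUERADE target rewrites the source address of packets leaving through the external interface, so that replies come back to the NAT box, which then forwards them inward. A minimal sketch of that rewrite as a shell function; all addresses below are made up for illustration and are not our real cluster addresses:

```shell
# Sketch only: what MASQUERADE does to an outgoing packet's source address.
# The kernel rewrites src to the external address and tracks the mapping
# so the reply can be translated back. All addresses are invented examples.
masquerade() {
    # usage: masquerade ORIG_SRC DST EXTERNAL_ADDR
    echo "src=$3 dst=$2 (was src=$1)"
}

masquerade 10.3.1.25 129.133.90.10 129.133.1.226
```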

We can now test the setup by contacting the remote host and attempting to mount the remote file system:

[root@NAT:~]# ping
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=0 ttl=63 time=0.235 ms
64 bytes from ( icmp_seq=1 ttl=63 time=0.193 ms
64 bytes from ( icmp_seq=2 ttl=63 time=0.115 ms

--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.115/0.181/0.235/0.049 ms, pipe 2

[root@NAT:~]# mount /mnt
[root@NAT:~]# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
                      4.6T  1.5T  3.2T  31% /mnt
[root@NAT:~]# umount /mnt


On the compute nodes we now need to change the routing of the packets. Platform/OCS had already defined a default gateway that pointed back to swallowtail_nfs. We now substitute the NAT box's private IP for the default gateway. In addition, Platform Support wants to make sure a gateway is defined for the other private network (192.168) so that any ssh callbacks can get resolved. The commands are (added to /etc/rc.local):

# add for nat box on administrative network
route add -net netmask gw dev eth0
# change default route set by platform/ocs
route add -net default netmask gw  dev eth1
route del -net default netmask gw dev eth1

And now the routing table on the compute node looks like this:

[root@compute-1-1 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                *                               UH    0      0        0 eth0
                swallowtail.loc                 UG    0      0        0 eth0
                *                               U     0      0        0 eth0
                *                               U     0      0        0 eth1
                *                               U     0      0        0 eth1
                *                               U     0      0        0 eth0
default                                         UG    0      0        0 eth1
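The kernel always prefers the most specific matching route and only falls back to the default entry last, which is why the NAT gateway catches everything the /24 entries don't. A small sketch of that decision for a /24 route, using example addresses (our real cluster networks are not shown in the table above):

```shell
# Sketch: does an address fall under a /24 route entry, or fall through
# to the default gateway? Addresses are examples, not our real networks.
route_for() {
    # usage: route_for ADDR NETWORK  (dotted quads, 255.255.255.0 mask assumed)
    if [ "${1%.*}" = "${2%.*}" ]; then
        echo "matched $2/24 entry"
    else
        echo "default route"
    fi
}

route_for 192.168.1.15 192.168.1.0   # first three octets match the network
route_for 10.3.1.25    192.168.1.0   # no match, packet goes to default gateway
```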

We should now be able to ping the remote host and mount the remote file system like we did on the NAT box. Once we have established connectivity, we can redefine the home directory for certain users.


The whole point of the NAT box is to make the remote home directories available to certain users. For this to work, the remote host must use the same UID/GID settings as the cluster does. The cluster takes its UID/GID settings from Active Directory (AD). Once the UID/GID settings are synced, we change the home directory location for the user in question in /etc/auto.home. After those changes, that file needs to be pushed out to the compute nodes and autofs restarted.
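Before switching a user's home directory, it is worth checking that the UID/GID pair really is identical on both sides; a mismatch silently breaks file ownership over NFS. A minimal sketch comparing the relevant passwd fields; both entries below are invented for illustration, not real accounts:

```shell
# Sketch: compare the UID/GID fields (3 and 4) of two passwd-style entries,
# one from the cluster and one from the remote host. Invented example data.
local_entry="hmeij:x:10754:600::/home/hmeij:/bin/bash"
remote_entry="hmeij:x:10754:600::/export/home/users/hmeij:/bin/bash"

local_ids=$(echo "$local_entry" | cut -d: -f3,4)
remote_ids=$(echo "$remote_entry" | cut -d: -f3,4)

if [ "$local_ids" = "$remote_ids" ]; then
    echo "UID/GID match: $local_ids"
else
    echo "UID/GID mismatch: local=$local_ids remote=$remote_ids"
fi
```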

[root@swallowtail ~]# egrep 'hmeij|sknauert' /etc/auto.home

hmeij localhost:/export/home/users/hmeij

[root@swallowtail ~]# make -C /var/411
[root@swallowtail ~]# /etc/init.d/autofs restart

Once autofs is restarted on both the head node and the compute node compute-1-1, we can force the automounter to mount the remote home directory:

[root@compute-1-1 ~]# cd ~sknauert
[root@compute-1-1 sknauert]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
                      4.6T  1.5T  3.2T  31% /home/sknauert


So let's write some files on compute node compute-1-1 in the remotely mounted home directory. That means the packets have to flow over NFS to the NAT box, which forwards them to the remote host. For comparison, let's do the same by writing to some other file systems.

#for i in 1024 10240 102400 1024000; do echo $i; time dd if=/dev/zero of=./out.$i  bs=1k count=$i; done
# ls -lh
-rw-r--r--  1 sknauert  s07          1M Sep 28 10:28 out.1024
-rw-r--r--  1 sknauert  s07         10M Sep 28 10:28 out.10240
-rw-r--r--  1 sknauert  s07        100M Sep 28 10:29 out.102400
-rw-r--r--  1 sknauert  s07       1000M Sep 28 10:41 out.1024000
Where the counts 1024, 10240, 102400 and 1024000 correspond to file sizes of roughly 1 MB, 10 MB, 100 MB and 1,000 MB.

These timings will vary wildly depending on competing resources, of course. The bottom three file systems are NFS mounted on compute node compute-1-1 from the io-node over gigabit ethernet. From the io-node the connection is a 4 gigabit/sec fibre channel link to the NetApp filer.
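To turn a dd run into a throughput figure, divide the file size by the elapsed time reported by `time`. A sketch with a hypothetical timing; the 120-second figure is made up, and actual numbers will differ per run:

```shell
# Sketch: convert a dd file size and elapsed time into MB/s.
# 1024000 blocks of 1k is roughly 1000 MB; 120 s elapsed is an invented example.
size_mb=1000
elapsed_s=120
awk -v s="$size_mb" -v t="$elapsed_s" 'BEGIN { printf "%.1f MB/s\n", s / t }'
```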

The connection in our test setup is limited by the 100e NIC cards in the NAT box. The remote host also has only a 100e link to VLAN 90. We should move both to gigE.


dmesg shows the following output, which puzzles me:

eth1: Transmit error, Tx status register 82.
Probably a duplex mismatch. See Documentation/networking/vortex.txt
Flags; bus-master1, dirty 887391(15) current 887391(15)
Transmit list 00000000 vs. c7c30b60. 
0: @c7c30200 length 800005ea status 000105ea 


cluster/51.txt · Last modified: 2007/09/28 14:57 (external edit)