So after testing the memory performance of our clusters using Linpack (see View Results), what about file system access performance? There are many variables at play in this area, so a higher-level comparative view is more appropriate than an overly detailed one.
In order to have comparative numbers, I chose the package IOZone, which seems to be widely used for this type of activity. IOZone performs many different tests, including read, re-read, write, re-write, read-and-write, random mix, backwards reads and a few others. The whole mix should make for an appropriate comparative standard. As details spin out, we could focus on the tests that best reflect our environment; probably random mix.
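If we do narrow things down later, IOZone can be limited to specific tests with its -i flag. A minimal sketch, assuming the test numbers from IOZone's help output (0 = write/rewrite, which creates the files the other tests need, 2 = random read/write, 8 = random mix):

  # narrowed run: write, random read/write and random mix only, 12 GB max file size
  time ./iozone -a -g 12G -i 0 -i 2 -i 8 > random-mix.out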
IOZone was compiled for 64-bit x86 Linux and staged in a tarball. That tarball was copied to the disk housing the file system in question, unpacked, and invoked with the vanilla out-of-the-box “rule set”: 'time ./iozone -a -g 12G > output.out'. The results were then saved and graphed. The 12 GB upper bound on the test file size was chosen because that is the memory footprint of cluster greentail's nodes across the board; I did not raise the file size limit above the memory footprint to avoid introducing another variable. You can read all about IOZone at the External Link.
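Sketched out, the per-file-system procedure looked roughly like this; the tarball name, staging location and unpack layout are assumptions, only the final invocation is the one quoted above:

  # copy the staged tarball onto the file system under test, unpack and run
  cd /localscratch
  tar xzf ~/iozone-x86_64.tar.gz        # tarball name/location are illustrative
  cd iozone                             # unpack layout is an assumption
  time ./iozone -a -g 12G > output.out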
As some of the tests IOZone performs put quite a load on the host (I observed a single invocation generate a load of 6), I ran IOZone with the LSF/Lava scheduler flag '-x', meaning exclusive use, so no other programs would interfere.
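For reference, a submission along these lines would do it; the queue name and working directory are illustrative, but -x is the actual exclusive-use flag:

  #!/bin/bash
  #BSUB -q ehw                  # target queue (illustrative)
  #BSUB -x                      # exclusive use: no other jobs on the node
  #BSUB -o iozone.%J.out        # scheduler output file
  cd /localscratch/iozone
  time ./iozone -a -g 12G > output.out

submitted with 'bsub < iozone.sub'.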
So let's start with cluster petaltail/swallowtail.
The compute nodes have a single 80 GB 7.2K RPM disk containing a /localscratch Linux file system. IOZone took 6+ hours to finish all the tests. So: local disk, one spindle, 4-year-old hardware, no RAID. I used one of the ehw queue nodes. So how do the fast disks on queue ehwfd compare?
The compute nodes in the ehwfd queue each have a disk array directly attached via iSCSI. Each host has dedicated access to 230 GB provided by seven 36 GB 15K RPM disks presented as /localscratch. So: local disks, 7 spindles, 4-year-old hardware, RAID 0. All seven disks working together at high speed; this is probably the best IOZone performance we'll attain.
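Since each environment exposes its scratch space as a mounted path, the same run can be pointed at a particular file system by placing the working file there with IOZone's -f flag; the target file name below is just an example:

  # run the full auto suite against a specific scratch area
  time ./iozone -a -g 12G -f /localscratch/iozone.tmp > ehwfd-localscratch.out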
Our NetApp filer (filer3) provides 5 TB of home directory space, on the same volume as /sanscratch, served up via an NFS mount. So now we have added a network component; IOZone will perform its tests against a network-mounted file system. The volume containing /sanscratch is composed of 24 1 TB disks spinning at 7.2K RPM. The aggregate holding this volume also holds other volumes. So: network NFS volume, 24 spindles, RAID 50 (I believe). No surprise, it is slow; but about 1/3rd slower than a single local disk, that is another surprise.
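For illustration, the mount behind this test looks something like the fstab entry below; the export path and mount options are assumptions, only the filer name and mount point come from the setup described above:

  # illustrative NFS mount of the filer-backed scratch space
  filer3:/vol/sanscratch  /sanscratch  nfs  rw,hard,intr  0 0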
Next, let's look at cluster greentail.
Like those in the petaltail cluster, cluster greentail's compute nodes sport a single 160 GB disk spinning at 7.2K RPM. As above, /localscratch is a Linux file system. So: local disk, one spindle, new hardware, no RAID. Performance is double that of the petaltail nodes.
The head node on cluster greentail has a direct-attached smart disk array connected via iSCSI. A logical volume of 24 1 TB disks, spinning at 7.2K RPM, holds a 5 TB volume presented to the compute nodes as the NFS mount /sanscratch. To add another variable, this NFS mount runs over an InfiniBand switch, whereas all previous examples used gigabit ethernet switches. This is referred to as IPoIB and operates at roughly 3x gigE, depending on a lot of things. So: network NFS volume over InfiniBand, 24 spindles, RAID 6. Surprisingly, it betters the single-spindle local disk example above by roughly 20%.
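To double check that the /sanscratch traffic really travels over IPoIB rather than gigE on a compute node, something like the following can be eyeballed; the interface name ib0 is an assumption:

  # show the IPoIB interface and the NFS mount it should carry
  ip addr show ib0
  mount -t nfs | grep sanscratch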