\\
**[[cluster:0|Home]]**

==== Summary ====

The purpose of this testing is to find out how fast the storage systems respond, either directly attached to compute nodes or attached via ethernet (gigabit ethernet) or infiniband (SDR via queue imw or QDR via queue hp12).  When using infiniband interconnects we use IPoIB (IP traffic over infiniband, which theoretically can be 3-4 times faster than gigabit ethernet).

Nothing beats directly attached storage, of course (scenario: fastlocal.dell.out below): the disk arrays attached to the compute nodes in the ''ehwfd'' queue.  Each node is presented with 230 GB of dedicated disk space provided by seven 10K disks using RAID 0 (all drives read and write simultaneously).  The IOZone suite finished in about an hour.

However, that queue may be a bottleneck (only 4 compute nodes in the ''ehwfd'' queue), or perhaps 230 GB is not enough (for 8 job slots).  One alternative is to use MYSANSCRATCH in your submit scripts.  MYSANSCRATCH refers to a directory the scheduler creates for you at /sanscratch/JOBID (the job's ID, $LSB_JOBID), which lives on a 5 TB RAID 5 file system provided by 5 disks spinning at 7.2K RPM.  The IOZone suite finished there in 2 hours and 45 minutes (scenario: san.hp.out).

For an example of using MYSANSCRATCH, look at the bottom of this page.  You will have to stage your data into the directory provided and copy the results back to your home directory when finished.  The scheduler will remove the directory.
  
==== IOZone ====
IOZone was compiled for 64-bit x86 Linux and staged in a tarball.  That tarball would be copied to the disk housing the file system in question, unpacked, and the vanilla out-of-the-box "rule set" invoked with **'time ./iozone -a -g 12G > output.out'**.  Then the results were saved and graphed.  The 12 GB upper bound on the file sizes tested was chosen because that is the memory footprint of cluster greentail's nodes across the board.  I did not raise the file size limit above the memory footprint to avoid introducing another variable.  You can read all about it here: [[http://www.iozone.org/docs/IOzone_msword_98.pdf|External Link]]
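
For reference, the whole procedure per file system amounts to just a few commands (a minimal sketch; the tarball name iozone.tar and the /localscratch target are assumptions for illustration):

<code>
# copy the IOZone tarball to the file system under test and unpack it
# (iozone.tar and /localscratch are hypothetical names for illustration)
cp ~/iozone.tar /localscratch
cd /localscratch
tar xf iozone.tar

# run the default "rule set" with a 12 GB upper bound on file size,
# timing the whole suite; results go to output.out
time ./iozone -a -g 12G > output.out
</code>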
  
As some of the tests IOZone performs put quite a load on the host (a single invocation was observed to generate a load of 6), I ran IOZone with the LSF/Lava scheduler flag ''-x'', meaning exclusive use, so no other programs would interfere.
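
In practice that means wrapping the invocation in a small submit script; a minimal sketch, with the queue name and scratch location picked purely for illustration, the ''-x'' flag being the point here:

<code>
#!/bin/bash
#BSUB -q ehwfd           # queue chosen for illustration only
#BSUB -J iozone          # job name
#BSUB -o iozone.%J.out   # scheduler output file
#BSUB -e iozone.%J.err   # scheduler error file
#BSUB -x                 # exclusive use of the host so nothing else skews the results

cd /localscratch         # file system under test (assumption)
time ./iozone -a -g 12G > output.out
</code>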
  
==== Results ====
  * fastlocal.dell.out: **real    64m36.519s** (or slightly over an hour)
  
The compute nodes in the ''ehwfd'' queue have a disk array directly attached to them via iSCSI.  Each host has dedicated access to 230 GB provided by seven 36 GB 15K RPM disks presented as /localscratch.  So: local disks, 7 spindles, 4-year-old hardware, RAID 0.  All seven disks working together at high speed.  This is probably the best IOZone performance we'll attain.

  * san.dell.out: **real    518m11.531s** (or slightly over 8 hours and 30 mins)

Our Netapp filer (filer3) provides 5 TB of home directory space, which is the same volume as /sanscratch, served up via an NFS mount.  Now we have added a network component; IOZone will perform its tests against a network-mounted file system.  The volume containing /sanscratch is composed of 24 1 TB disks spinning at 7.2K RPM.  The aggregate holding this volume also holds other volumes.  So: network NFS volume, 24 spindles, RAID 50 (I believe).  No surprise, it is slow.  That it is about a third slower than even the single local disk was a surprise, though.

Then let's look at cluster greentail.

  * local.hp.out:  **real    208m7.579s** (or almost 3 hours and 30 mins)

Like the petaltail cluster's nodes, cluster greentail's compute nodes sport a single 160 GB disk spinning at 7.2K RPM.  As above, /localscratch is a Linux file system.  So: local disk, one spindle, new hardware, no RAID.  Performance is double that of the petaltail nodes, which must be related to disk caching.

  * san.hp.out: **real    163m25.761s** (or almost 2 hours and 45 mins)

The head node on cluster greentail has a smart disk array directly attached via iSCSI.  A logical volume built from 24 1 TB disks spinning at 7.2K RPM holds a 5 TB volume presented to the compute nodes as the NFS mount /sanscratch.  To add another variable, this NFS mount is done over an infiniband switch; all previous examples used gigabit ethernet switches.  This is referred to as IPoIB and operates at roughly 3x gigE, depending on a lot of factors.  So: network NFS volume over infiniband, 24 spindles, RAID 6.  Surprisingly, it betters the single-spindle local disk example above by roughly 20%.
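
A quick way to convince yourself that an NFS mount really travels over IPoIB is to compare the NFS server address with the address on the infiniband interface (a sketch; the interface name ib0 is an assumption, not pulled from greentail's actual configuration):

<code>
# show which server address /sanscratch is mounted from
mount | grep sanscratch

# show the address configured on the IPoIB interface
# (ib0 is assumed here for illustration)
/sbin/ifconfig ib0

# if the NFS server address falls inside ib0's subnet,
# the mount traffic is going over infiniband (IPoIB)
</code>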

  * home.hp.out:  **real    179m46.708s** (or 3 hours)

On cluster greentail, a separate logical volume presents /home.  This volume is composed of 12 1 TB disks spinning at 7.2K RPM.  Same as above in terms of the NFS mount across infiniband.  Note that the disks involved for /home are different from those for /sanscratch.  As expected it falls slightly short of the /sanscratch volume's performance, but not by much.  However, as users exercise the /home volume this gap may grow.

  * home.dell.out: **real** (4 hours and 8 mins)

This was run against greentail's /home mounted via a gigabit ethernet Force10 switch on the petaltail/swallowtail cluster (running on node c28).  So just about a one-hour penalty versus running locally on greentail.  Not bad at all; that will seriously speed up jobs on the Dell cluster.

==== Graphs ====

IOZone generates lots of interesting graphs, whose interpretation still somewhat eludes me.  But it is obvious in some graphs where anomalies exist: at certain thresholds the performance suddenly starts to nose dive.

  * [[http://greentail.wesleyan.edu:81/iozone/petaltail/report_local|report_local]]
  * [[http://greentail.wesleyan.edu:81/iozone/petaltail/report_fastlocal|report_fastlocal]]
  * [[http://greentail.wesleyan.edu:81/iozone/petaltail/report_san|report_san]]

  * [[http://greentail.wesleyan.edu:81/iozone/greentail/report_local|report_local]]
  * [[http://greentail.wesleyan.edu:81/iozone/greentail/report_san|report_san]]
  * [[http://greentail.wesleyan.edu:81/iozone/greentail/report_home|report_home]]

==== Sample ====

Using MYSANSCRATCH with Gaussian jobs (you can use any queue, but hp12 will be the fastest):

<code>

#!/bin/bash
#BSUB -q hp12
#BSUB -o out
#BSUB -e err
#BSUB -J test
# job slots: change both lines, also inside gaussian.com
#BSUB -n 8
#BSUB -x

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

# cd to remote working dir
cd $MYSANSCRATCH
pwd

# environment
export GAUSS_SCRDIR="$MYSANSCRATCH"
export g09root="/share/apps/gaussian/g09root"
. $g09root/g09/bsd/g09.profile

# stage input data
rm -rf ~/gaussian/err ~/gaussian/out*
cp ~/gaussian/gaussian.com .

# run
time g09 < gaussian.com > gaussian.log

# save results
cp gaussian.log ~/gaussian/output.$LSB_JOBID

</code>
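
To run it, submit the script to the scheduler with ''bsub''; a minimal usage sketch, assuming the script above was saved as a file named run (a hypothetical name) in your ~/gaussian directory:

<code>
# submit the job script to the scheduler and check on it
# ("run" is a hypothetical file name for the script above)
cd ~/gaussian
bsub < run
bjobs
</code>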
  
\\
**[[cluster:0|Home]]**