**[[cluster:0|Back]]**
  
Done!
 --- //[[hmeij@wesleyan.edu|Meij, Henk]] 2014/02/21 09:54//

==== Dell Racks Power Off ====
  
Soon (Feb/2014), we'll have to power down the Dell Racks and grab one L6-30 circuit supplying power to those racks and use it to power up the new Microway servers.
  
That leaves some spare L6-30 circuits (the Dell racks use 4 each), so we could contemplate grabbing two and powering up two more shelves of the Blue Sky Studio hardware.  That would double the Hadoop cluster and the ''bss24'' queue when needed (total of 100 job slots), and offer access to 1.2 TB of memory.  This hardware is generally powered off when not in use.
  
The new Microway hardware is identical to the GPU-HPC hardware we bought previously, minus the GPUs. A total of 8 1U servers will offer:
  
  * 256 GB of memory per node (2,048 GB total ... that's amazing because if you add the GPU nodes' memory footprint, the total for the rack becomes 3,328 GB in 18U of rack space).
  * dual 8-core Intel Xeon chips with hyperthreading turned on, so each node presents 32 cores for a total of 256 cores (job slots). These will be presented as queue ''mw256fd''.
  * Each core is capable of doing 8 instructions per clock cycle and each core will have access to an average of 8 GB of memory
  * Each node also has a 300 GB 15K RPM hard disk holding the operating system and swap, and provides a /localscratch of 175 GB, hence the ''fd'' in the ''mw256fd'' queue name.  It is to be used just like ''ehwfd''.
  * Each node is Infiniband enabled (meaning all our nodes are, except the Blue Sky Studio nodes in queue ''bss24''), and ''/home'' and ''/sanscratch'' are served over IPoIB.
  
==== What Changes? ====
  
Queues:
  
  * elw, emw, ehw, ehwfd and imw disappear (224 job slots)
  * mw256fd appears (256 job slots)
  * on both mw256 (n33-n37) and mw256fd (n38-n45) exclusive use is disabled (#BSUB -x will not work)
  * the max number of job slots per node is 32 on ''mw256fd'' but 28 on ''mw256'' because the GPUs also need access to cores (4 per node for now). The max may be lowered to 8 if too many jobs grab too many job slots, so you should benchmark your job to understand what is optimal (a sample submission sketch follows this list).
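
As an illustration, here is a minimal submission script for the new queue. This is only a sketch: the job name, output files, slot count and program name are placeholders, and the memory reservation line is explained in the Memory section below.

<code>
#!/bin/bash
# hypothetical example -- adjust slot count, memory and program to your job
#BSUB -q mw256fd                # the new queue
#BSUB -n 8                      # job slots; stay at or below the per-node max
#BSUB -R "rusage[mem=8192]"     # reserve memory via the scheduler (see Memory below)
#BSUB -J test_mw256fd
#BSUB -o %J.out
#BSUB -e %J.err

./my_program
</code>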

Memory:

  * Since the cluster is consolidating onto fewer nodes with large memory footprints, it becomes important to estimate how much memory you need (add 10-20%) and reserve it via the scheduler so your jobs do not crash.

<code>
#BSUB -R "rusage[mem=X]"
</code>

  * How do I find out how much memory I'm using?  ''ssh node_name top -u your_name -b -n 1'' (see also the sketch below)
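
A small helper along those lines is sketched below. It is not an installed tool: the script name and node name are made up, and it uses ''ps'' rather than ''top'' so the resident set sizes come back in plain kilobytes that are easy to add up.

<code>
#!/bin/bash
# mymem.sh (hypothetical name): total resident memory of your processes on a node
# usage: ./mymem.sh n38 [username]
NODE=$1
WHO=${2:-$USER}
ssh "$NODE" ps -u "$WHO" -o pid,rss,comm --sort=-rss | \
  awk 'NR==1 {print; next} {print; sum+=$2} END {printf "total RSS: %.1f MB\n", sum/1024}'
</code>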
  
Gaussian:
  
  * In order to force your Gaussian threads onto the same node (since it is a forked program, not a parallel program), when using any of the mw256 queues you must use the following stanzas (a fuller sketch follows the block):
  
<code>
  
#BSUB -n X (where X is equal to or less than the max jobs per node)
 #BSUB -R "span[hosts=1]" #BSUB -R "span[hosts=1]"
  
</code>
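
For context, a complete Gaussian submission might look like the sketch below. It is hypothetical: the input/output file names, the thread count and the way Gaussian itself is invoked (''g09'' here) are placeholders and may differ on our cluster.

<code>
#!/bin/bash
# hypothetical Gaussian example
#BSUB -q mw256
#BSUB -n 8                   # at or below the per-node job slot max
#BSUB -R "span[hosts=1]"     # keep all forked threads on one node
#BSUB -J g09_test
#BSUB -o %J.out

# %NProcShared in the Gaussian input should match the -n request above
g09 < input.com > output.log
</code>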
  
  * You can use the new queue ''mw256fd'' just like ''hp12'' or ''imw''
  * For parallel programs you may use OpenMPI or MVApich; use the appropriate wrapper scripts to set up the environment for mpirun (a sample script follows this list)
    * On ''mw256'' you may run either flavor of MPI with the appropriate binaries.
  * On ''mwgpu'' you must use MVApich2 when running the GPU enabled software (Amber, Gromacs, Lammps, Namd).
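
The sample below shows the general shape of such a submission. It is only a sketch: the wrapper script path and the MPI binary are placeholders, since the actual wrapper locations on our cluster are documented elsewhere.

<code>
#!/bin/bash
# hypothetical MPI example -- wrapper path and binary are placeholders
#BSUB -q mw256
#BSUB -n 16                  # MPI ranks (job slots); these may span nodes
#BSUB -J mpi_test
#BSUB -o %J.out

# the wrapper sets up the OpenMPI or MVApich environment and calls mpirun
/path/to/mpi_wrapper ./my_mpi_program
</code>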
  
Scratch:
  
  * On all nodes /sanscratch is always the same and job progress can be viewed from all the "tail" login nodes. It is a 5-disk, 5 TB storage area for large jobs needing much disk space. When using /sanscratch you need to stage your data and code on those disks and copy the results back to your directory before the job finishes (see the sketch after this list).
  * On all nodes /localscratch is a local directory like /tmp.  It is tiny (50 GB) and should be used for file locking purposes if you need to do so.
  * Only nodes on ''mw256fd'' sport a 15K hard disk and /localscratch is 175 GB (replacing the ''ehwfd'' functionality).
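
A staging pattern for /sanscratch is sketched below. The directory layout, file names and program are placeholders; whether the scheduler pre-creates a per-job directory is not assumed here, so the script makes its own using the job id.

<code>
#!/bin/bash
# hypothetical staging example for /sanscratch
#BSUB -q mw256fd
#BSUB -n 1
#BSUB -J scratch_test
#BSUB -o %J.out

# work in a per-job directory on the shared scratch area
MYSCRATCH=/sanscratch/$LSB_JOBID
mkdir -p $MYSCRATCH

# stage input and code, run, then copy the results back before the job ends
cp ~/project/input.dat $MYSCRATCH/
cd $MYSCRATCH
~/project/my_program input.dat > results.out
cp results.out ~/project/
cd; rm -rf $MYSCRATCH
</code>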

Savings:

  * 77% less energy is consumed, including what's needed for the new hardware. Amazing.

Workshop:

  * We'll schedule one as soon as ''mw256fd'' has been deployed: Feb 26th, ST 509a, 4-5 PM.

==== What May Also Change? ====

There is a significant need to run many, many programs that require very little memory (on the order of 1-5 MB).  When such a program runs it consumes a job slot.  When many such programs consume many job slots, as on the large servers in the ''mw256'' or ''mw256fd'' queues, lots of memory remains idle and inaccessible to other programs.

So we could enable hyperthreading on the nodes of the ''hp12'' queue and double the job slots (from 256 to 512).  Testing reveals that when hyperthreading is on:

  * if there is no "sharing" required, the hyperthreaded node performs the same (that is, the operating system presents 16 cores but only up to 8 jobs are allowed to run, let's say by limiting the JL/H parameter of the queue)
  * if there is "sharing", jobs take a 44% speed penalty; however, twice as many of them can run

So it appears that we could turn hyperthreading on and, despite the nodes presenting 16 cores, limit the number of jobs to 8 until the need arises to run many small jobs, and then reset the limit to 16.
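
To see what a node actually presents to the operating system, the counts can be read from /proc/cpuinfo, for example (the node name is a placeholder):

<code>
# logical processors the OS presents (16 on an 8-core node with hyperthreading on)
ssh n1 'grep -c ^processor /proc/cpuinfo'
# with hyperthreading on, "siblings" is twice "cpu cores" for each socket
ssh n1 'grep -m2 -e siblings -e "cpu cores" /proc/cpuinfo'
</code>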
  
  
\\
**[[cluster:0|Back]]**