**[[cluster:0|Back]]**
  
Done!
 --- //[[hmeij@wesleyan.edu|Meij, Henk]] 2014/02/21 09:54//

==== Dell Racks Power Off ====
  
Soon (Feb/2014), we'll have to power down the Dell racks, take one of the L6-30 circuits supplying power to those racks, and use it to power up the new Microway servers.
  * Each node is Infiniband enabled (meaning all our nodes are, except the Blue Sky Studio nodes in queue ''bss24''). ''/home'' and ''/sanscratch'' are served over IPoIB (a quick check is sketched below).
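If you want to confirm this from a node, a minimal check might look like the following; the interface name ''ib0'' is an assumption and may differ on our nodes:

<code>
# on any compute node: list the NFS mounts and the IPoIB interface
# (interface name ib0 is an assumption; adjust if it differs)
mount | grep -E ' /home | /sanscratch '
ip addr show ib0
</code>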
  
==== What Changes? ====
  
Queues:
  
  * elw, emw, ehw, ehwfd and imw disappear (224 job slots)
  * mw256fd appears (256 job slots)
  * on both mw256 (n33-n37) and mw256fd (n38-n45) exclusive use is disabled (#BSUB -x will not work)
  * the max number of job slots per node is 32 on ''mw256fd'' but 28 on ''mw256'' because the GPUs also need access to cores (4 per node for now); the max may be lowered to 8 if too many jobs grab too many job slots. You should benchmark your job to understand what slot count is optimal; a minimal submission sketch follows below.
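To illustrate the points above, here is a minimal sketch of a submission script header for the new queue; the job name, slot count, output file names, and program name are placeholder examples, not site policy:

<code>
#!/bin/bash
# minimal example job header for the new queue (values are illustrative)
#BSUB -q mw256fd       # replaces elw, emw, ehw, ehwfd and imw
#BSUB -n 8             # job slots; stay within the per-node max (32 on mw256fd, 28 on mw256)
#BSUB -J myjob         # placeholder job name
#BSUB -o out.%J
#BSUB -e err.%J
# note: #BSUB -x (exclusive use) is disabled on mw256 and mw256fd and will not work

./myprogram            # placeholder binary
</code>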
  
<code>
#BSUB -R "rusage[mem=X]"
</code>

  * How do I find out how much memory I'm using? Run ''ssh node_name top -u your_name -b -n 1''.
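For example, with a concrete reservation the check might look like this; the 25600 figure is only an illustration (LSF typically interprets the ''mem'' value in MB), and ''n38'' is just one of the ''mw256fd'' nodes:

<code>
#BSUB -R "rusage[mem=25600]"    # reserve roughly 25 GB for this job (value typically in MB)

# while the job runs, check actual memory use on its node:
ssh n38 top -u your_name -b -n 1 | head -20
</code>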
  
Gaussian:
  * You can use the new queue ''mw256fd'' just like ''hp12'' or ''imw''
  * For parallel programs you may use OpenMPI or MVAPICH; use the appropriate wrapper scripts to set up the environment for mpirun (a sketch follows this list)
    * On ''mw256'' you may run either flavor of MPI with the appropriate binaries.
  * On ''mwgpu'' you must use MVAPICH2 when running the GPU-enabled software (Amber, Gromacs, Lammps, Namd).
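As a rough sketch of a parallel submission (the wrapper script path below is a placeholder for whichever OpenMPI or MVAPICH2 wrapper applies, and the binary name is an example):

<code>
#!/bin/bash
#BSUB -q mw256         # either MPI flavor works here; GPU codes on mwgpu require MVAPICH2
#BSUB -n 16
#BSUB -o out.%J
#BSUB -e err.%J

# set up the MPI environment for mpirun via the appropriate wrapper script
# (the path and name below are placeholders, not the actual site scripts)
. /path/to/mpi_wrapper_env.sh

mpirun -np 16 ./my_mpi_program   # example binary
</code>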
  
Scratch:
Workshop:
  
  * We'll schedule one as soon as ''mw256fd'' has been deployed. Feb 26th, ST 509a, 4-5 PM.
  
==== What May Also Change? ====
  
There is a significant need to run many, many programs that require very little memory (on the order of 1-5 MB). When such programs run they each consume a job slot. When many of them consume many job slots, as on the large servers in the ''mw256'' or ''mw256fd'' queues, lots of memory remains idle and inaccessible to other programs.
  
  * if there is no ‘sharing’ required, the hyper-threaded node performs the same (that is, the operating system presents 16 cores but only up to 8 jobs are allowed to run, say by limiting the JL/H parameter of the queue)
  * if there is ‘sharing’, jobs take a 44% speed penalty; however, twice as many of them can run
  
So it appears that we could turn hyperthreading on and, despite the nodes presenting 16 cores, limit the number of jobs per node to 8 until the need arises to run many small jobs, and then reset the limit to 16.
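If we go that route, the per-host limit would be set in the queue definition; assuming the JL/H limit mentioned above corresponds to ''HJOB_LIMIT'' in ''lsb.queues'', a minimal sketch (values illustrative) could look like:

<code>
Begin Queue
QUEUE_NAME   = mw256fd
DESCRIPTION  = Microway 256 GB nodes
# per-host job slot limit (shown as JL/H by bqueues);
# raise to 16 when many small-memory jobs need to run
HJOB_LIMIT   = 8
End Queue
</code>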