  * mw256fd appears
  * on both mw256 (n33-n37) and mw256fd (n38-n45) exclusive use is disabled (#BSUB -x)
  * the max number of job slots per node is 32 on ''mw256fd'' but 28 on ''mw256'' because the GPUs also need access to cores (4 per node for now); that maximum may be lowered to 8 if too many jobs grab too many job slots, so you should benchmark your job to find out what is optimal.

Memory:

  * Since fewer and fewer of the nodes deployed in our cluster have large memory footprints, it becomes important to estimate how much memory your job needs (add 10-20%) and reserve that amount via the scheduler so your jobs do not crash.

<code>
#BSUB -R "rusage[mem=X]"
</code>
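
As a concrete sketch, a submission script reserving roughly 4 GB on ''mw256'' might look like the following; the queue name, job name, and memory value are only illustrative, and the unit of ''mem'' depends on the site's LSF configuration (commonly MB).

<code>
#!/bin/bash
# illustrative job script: reserve ~4 GB (4096, assuming mem is read as MB)
#BSUB -q mw256
#BSUB -n 1
#BSUB -J memjob
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -R "rusage[mem=4096]"

./my_program
</code>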
  
Gaussian:
  
  * We'll schedule one as soon as ''mw256fd'' has been deployed.

==== What May Change? ====

There is a significant need to run many, many programs that require very little memory (on the order of 1-5 MB).  When such a program runs it consumes a job slot.  When many such programs consume many job slots, as on the large servers in the ''mw256'' or ''mw256fd'' queues, lots of memory remains idle and inaccessible to other programs.
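
For example, the standard LSF commands show whether the job slots on the large-memory hosts are filled while memory sits idle (the host names below are the ''mw256'' nodes mentioned above):

<code>
# job slot usage on the mw256 hosts
bhosts n33 n34 n35 n36 n37

# current load and available memory on one host
lsload -l n33
</code>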

So we could enable hyperthreading on the nodes of the ''hp12'' queue and double the job slots (from 256 to 512).  Testing reveals that when hyperthreading is on:

  * if there is no 'sharing' required, a hyper-threaded node performs the same as before (the operating system presents 16 cores but only up to 8 jobs are allowed to run, for example by limiting the JL/H parameter of the queue)
  * if there is 'sharing', jobs take a 44% speed penalty, but twice as many of them can run

So it appears that we could turn hyperthreading on and, despite the nodes presenting 16 cores, limit the number of jobs to 8 until the need arises to run many small jobs, at which point we reset the limit to 16.
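
A minimal sketch of what that could look like in ''lsb.queues'' (assuming LSF, where the per-host job slot limit reported as JL/H by ''bqueues'' is configured with ''HJOB_LIMIT''):

<code>
# sketch only: cap jobs at 8 per host (one per physical core) even though
# hyperthreading makes the OS present 16 cores; raise to 16 when the need
# for many small jobs arises
Begin Queue
QUEUE_NAME    = hp12
HJOB_LIMIT    = 8
DESCRIPTION   = hyperthreaded nodes, limited to 8 jobs per host
End Queue
</code>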
  
\\
**[[cluster:0|Back]]**