cluster:54 [2007/10/19 20:23] (current)
 +\\
 +**[[cluster:​0|Back]]**
  
 +
 +
 +====== Job Slots ======
 +
 +I was asked in the UUG meeting yesterday how one determines how many job slots are still available.  It turns out to be a tricky question.  In CluMon you might observe a host with only one JOBPID running that is nevertheless declared 'Full' by the scheduler.  That would be a parallel job claiming all of the host's job slots with the ''"bsub -n N ..."'' parameter.  Other hosts may list anywhere from 1 to 8 JOBPIDs.
 +
 +There may be a mix of serial and parallel jobs.  So how do you find out how many job slots are still available?
 +
 +
 +
 +
 +
 +
 +
 +===== Refresher 1 =====
 +
 +Current members of our "core" queues.  Our "core" queues are identified as the light weight nodes on the Infiniband switch, the light weight nodes on the gigE Ethernet switch, and the heavy weight nodes (also on the gigE Ethernet switch).
 +|||
 +^ Queue Name: 16-ilwnodes ^ light weight nodes, Infiniband + GigE switches ^
 +| Host Names: compute-1-1 ... compute-1-16 |  all in rack #1  |
 +|||
 +^ Queue Name: 16-lwnodes ^ light weight nodes, GigE switch only ^
 +| Host Names: compute-1-17 ... compute-1-27 |  all in rack #1  |
 +| Host Names: compute-2-28 ... compute-2-32 |  all in rack #2  |
 +|||
 +^ Queue Name: 04-hwnodes ^ heavy weight nodes, GigE switch only ^
 +| Host Names: nfs-2-1 ... nfs-2-4 |  all in rack #2   |
 +|||
 +
 +
 +
 +
 +===== Refresher 2 =====
 +
 +Our nodes contain dual quad core processors.  So each node has 2 physical processors, and each processor has 4 cores, which gives 2 x 4 = 8 cores per node.  These cores share access to either 4 GB (light weight) or 16 GB (heavy weight) of memory.  Although the terminology differs between applications, think of a core as a job slot.  The scheduler will schedule up to 8 jobs per node and then consider that node 'Full'.  It is assumed that each job (read: task) utilizes the core assigned by the operating system to its full potential.
 +
 +So whether it is Gaussian's parameter ''Nprocs'', the BSUB parameter ''-n'' (processors), or the mpirun parameter ''-np'' (number of processors), in all these cases they refer to cores, which implies job slots.  This is a result of chipsets developing faster than the software.  Basically, in a configuration file for the scheduler we can assign the number of job slots per host, or per type of host.
 +
 +__Side Bar:__  I'm temporarily overloading the queue 16-lwnodes by defining 12 job slots to be available per host.  The reason for this is that I'm observing jobs with very small memory requirements, particularly serial jobs.  So a host may have 8 jobs running but still have 3 GB of memory free.  As long as the host is only swapping memory lightly, we could increase the job throughput a bit.  I was told that some sites go as far as overloading the slots per node ratio 4:1, but I was advised not to push it beyond 2:1 unless our jobs are very homogeneous.  Which they are not.
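To make the slot arithmetic concrete, here is a small Python sketch that totals the job slots per "core" queue. The host counts are read off the queue table in Refresher 1, and the slots per host are the 8 cores per node plus the temporary 12 for 16-lwnodes from the side bar. The names ''QUEUES'' and ''queue_capacity'' are made up for this illustration and are not part of the scheduler.

```python
# Host counts come from the queue table in Refresher 1.
# Slots per host: 8 cores per node; 16-lwnodes is temporarily set to 12 (side bar).
QUEUES = {
    "16-ilwnodes": {"hosts": 16, "slots_per_host": 8},
    # compute-1-17 ... compute-1-27 (11 hosts) plus compute-2-28 ... compute-2-32 (5 hosts)
    "16-lwnodes": {"hosts": 11 + 5, "slots_per_host": 12},
    "04-hwnodes": {"hosts": 4, "slots_per_host": 8},
}


def queue_capacity(queues):
    """Return {queue name: total job slots} for a dict shaped like QUEUES."""
    return {name: q["hosts"] * q["slots_per_host"] for name, q in queues.items()}


capacity = queue_capacity(QUEUES)
for name, slots in capacity.items():
    print(f"{name}: {slots} job slots")

# 12 slots on 8 cores is a 1.5:1 overload, below the advised 2:1 ceiling.
overload_ratio = QUEUES["16-lwnodes"]["slots_per_host"] / 8
print("overload ratio on 16-lwnodes:", overload_ratio)
```

These are upper bounds on what the scheduler can hand out, not a count of what is free right now.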
 +
 +
 +
 +
 +===== Example 1 =====
 +
 +In this example, the node is flagged as 'Full' despite only showing one JOBPID.  When we query the scheduler for what is running, we find out that a single job has requested 8 job slots, thereby exhausting all job slots on the host.  Note the information under EXEC_HOST.
 +
 +{{:cluster:example1.jpg}}
 +
 +<code>
 +
 +[root@swallowtail ~]# bjobs -m compute-1-1 -u all
 +
 +JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST     JOB_NAME   SUBMIT_TIME
 +20548   qgu     RUN   idle       swallowtail 8*compute-1-1 run101     Oct 17 20:06
 +
 +</code>
 +
 +
 +
 +===== Example 2 =====
 +
 +In this example, the node is flagged as 'Normal' and lists 6 JOBPIDs.  When we query the scheduler for what is running, we find out that 6 serial jobs are running.  Hence 2 more job slots are still available in a standard configuration (see side bar).
 +
 +{{:cluster:example2.jpg}}
 +
 +<code>
 +
 +[root@swallowtail ~]# bjobs -m compute-1-18 -u all
 +
 +JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST    JOB_NAME                      SUBMIT_TIME
 +12771   chsu    RUN   16-lwnodes swallowtail compute-1-18 dna-dimer/m32/t0.103-4        Aug 21 11:34
 +19686   chsu    RUN   16-lwnodes swallowtail compute-1-18 grand-canonical/t0.170/mu0.54 Sep 30 11:26
 +19699   chsu    RUN   16-lwnodes swallowtail compute-1-18 grand-canonical/t0.140/mu0.48 Sep 30 12:03
 +19874   chsu    RUN   16-lwnodes swallowtail compute-1-18 dna-GC/m16/t0.0925/mu24.0     Oct  4 16:38
 +20107   chsu    RUN   16-lwnodes swallowtail compute-1-18 grand-canonical/t0.110/mu0.43 Oct 12 22:01
 +20116   chsu    RUN   16-lwnodes swallowtail compute-1-18 grand-canonical/t0.100/mu0.43 Oct 12 22:01
 +
 +</code>
 +
 +__Side Bar:__ Since we are currently defining 12 available job slots for this host, there are still 12 - 6 = 6 job slots available.  This is one of the experimental "overloaded" hosts I'm watching.  Its processors are currently working heavily with no swap activity, yet roughly 3.7 GB each of memory and swap are still available.  The disk I/O rate is very low at 131 KB/sec.  I'd like to see what happens with 12 identical serial jobs.
 +
 +<code>
 +
 +[root@swallowtail ~]# lsload -l compute-1-18
 +
 +HOST_NAME     status  r15s  r1m  r15m ut   pg   io  ls  it  tmp   swp   mem
 +compute-1-18     ok   6.0   6.0  6.0  75%  6.8  131 0   9   7088M 3782M 3712M
 +
 +</code>
 +
 +
 +===== Example 3 =====
 +
 +In this example, the node is flagged as 'Full' and lists 3 JOBPIDs.  When we query the scheduler for what is running, we find out that a mixture of serial and parallel jobs is running.
 +
 +{{:cluster:example3.jpg}}
 +
 +<code>
 +
 +[root@swallowtail ~]# bjobs -m compute-1-6 -u all
 +
 +JOBID   USER     STAT  QUEUE   FROM_HOST   EXEC_HOST                                  JOB_NAME      SUBMIT_TIME
 +20567   wpringle RUN   idle    swallowtail 6*compute-1-13:1*compute-1-6:1*compute-1-7 run101        Oct 18 10:43
 +20292   gng      RUN   idle    swallowtail compute-1-6                                poincare3.bat Oct 17 11:16
 +20308   qgu      RUN   idle    swallowtail 6*compute-1-6                              run101        Oct 17 13:05
 +
 +</code>
 +
 +There is a parallel job (20567, 1*compute-1-6) that takes up one job slot on this host.  Another parallel job (20308, 6*compute-1-6) takes up 6 job slots on this host, and one serial job (20292) is running as well.  That is a total of 1 + 6 + 1 = 8 job slots, hence the host has no more job slots available.
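The counting done by hand above can be sketched in Python.  This is only an illustration: it assumes EXEC_HOST always looks like the values in the three examples on this page (entries separated by '':'', each either ''host'' or ''N*host''), and the function name ''slots_on_host'' is made up here.

```python
def slots_on_host(exec_host, host):
    """Count the job slots one job holds on `host`, given its EXEC_HOST field."""
    total = 0
    for part in exec_host.split(":"):
        count, sep, name = part.partition("*")
        if not sep:
            # No "N*" prefix: the job holds a single slot on that host.
            count, name = "1", part
        if name == host:  # exact match, so compute-1-13 is not mistaken for compute-1-1
            total += int(count)
    return total


# EXEC_HOST values copied from the bjobs output for compute-1-6 (Example 3).
jobs = {
    20567: "6*compute-1-13:1*compute-1-6:1*compute-1-7",
    20292: "compute-1-6",
    20308: "6*compute-1-6",
}

used = sum(slots_on_host(field, "compute-1-6") for field in jobs.values())
print(used)  # 1 + 1 + 6 = 8
```

With 8 slots configured on compute-1-6, 8 - 8 = 0 slots remain, which matches the 'Full' flag.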
 +
 +====== Availability ======
 +
 +So is there a way to get the total number of job slots available on a queue basis?
 +
 +Not easily, from what I can tell.  We could write a script to calculate this for us; if that would be convenient, let me know.  However, that information would only be useful if no jobs were in a pending state, since pending jobs would presumably claim any free slots as soon as the scheduler dispatches them.
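A rough Python sketch of what such a script might compute, not a finished tool.  It assumes we already have, for every host, its queue, its configured maximum number of job slots, and the number of slots in use.  On our scheduler those two numbers could presumably be taken from the MAX and NJOBS columns of ''bhosts'', but that collection step is an assumption and is not shown.  The snapshot below is not a full cluster listing; it only holds the three hosts from the examples on this page, and ''free_slots_by_queue'' is a made-up name.

```python
from collections import defaultdict


def free_slots_by_queue(hosts):
    """Sum free job slots per queue.

    hosts: iterable of (queue, host, max_slots, used_slots) tuples.
    """
    free = defaultdict(int)
    for queue, _host, max_slots, used_slots in hosts:
        # Never count a negative number of free slots for a host.
        free[queue] += max(max_slots - used_slots, 0)
    return dict(free)


# Partial snapshot built from Examples 1-3; queue membership per Refresher 1.
snapshot = [
    ("16-ilwnodes", "compute-1-1", 8, 8),   # Example 1: one job holding 8 slots
    ("16-ilwnodes", "compute-1-6", 8, 8),   # Example 3: 1 + 1 + 6 slots
    ("16-lwnodes", "compute-1-18", 12, 6),  # Example 2: overloaded to 12 slots
]

free = free_slots_by_queue(snapshot)
print(free)
```

Note that the jobs in Examples 1 and 3 ran in the ''idle'' queue on hosts that also belong to 16-ilwnodes, so a per-queue total like this is at best an approximation.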
 +
 +FIXME\\
 +I should hack up the CluMon page some day so that the node icons line themselves up in a convenient order for us: 16-ilwnodes (1-16), 16-lwnodes (17-32), 04-hwnodes (1-4).\\
 +
 +Fixed !! \\  --- //[[hmeij@wesleyan.edu|Meij, Henk]] 2007/10/19 15:06// \\
 +
 +{{:cluster:example4.jpg|An ordered layout}}
 +
 +
cluster/54.txt · Last modified: 2007/10/19 20:23 (external edit)