
2015 Summer Expansion

Fourteen Supermicro 1U servers were purchased, each with dual 10-core processors. With hyper-threading turned on, that yields 40 logical cores per 1U of rack space, or a total of 560 new logical cores. However, we maximized cores and minimized our spending on memory. Each node has 32 GB of memory for 40 logical cores, an average of 0.8 GB/core. Tiny!
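The capacity figures in the previous paragraph follow from simple arithmetic. A minimal bash sketch, using only the numbers quoted there (14 nodes, 2 sockets of 10 cores, 2 hyper-threads per core, 32 GB per node); the variable names are illustrative:

```shell
#!/bin/bash
# Capacity of the 2015 tinymem hardware, from the figures quoted above.
nodes=14
sockets=2
cores_per_socket=10
threads_per_core=2   # hyper-threading on
node_mem_gb=32

logical_per_node=$(( sockets * cores_per_socket * threads_per_core ))
total_logical=$(( nodes * logical_per_node ))

# Bash arithmetic is integer-only, so let awk do the division.
gb_per_core=$(awk -v m="$node_mem_gb" -v c="$logical_per_node" \
  'BEGIN { printf "%.2f", m / c }')

echo "logical cores per node:  $logical_per_node"
echo "total new logical cores: $total_logical"
echo "memory per logical core: $gb_per_core GB"
```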

Hence we created the queue tinymem out of this hardware. The nodes also have a tiny 16 GB DOM (disk on module, a non-spinning flash disk), so do not use /localscratch. The other scratch area, /sanscratch, can be used; like /home, it is an NFS mount on these nodes.
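Inside a job, one way to follow this rule is to derive the working directory from the LSF job ID under /sanscratch and refuse /localscratch outright. This is a sketch only: the helper name `tinymem_scratch_dir` and the `/sanscratch/<jobid>` layout are hypothetical, not a documented convention of this cluster. `LSB_JOBID` is the variable LSF sets inside a running job.

```shell
#!/bin/bash
# Hypothetical helper: print a per-job working directory on /sanscratch.
# /localscratch is refused because the tinymem nodes only have a 16 GB DOM.
# The "<base>/<jobid>" layout is an assumption, not a site convention.
tinymem_scratch_dir() {
  local base="${1:-/sanscratch}"
  case "$base" in
    /localscratch|/localscratch/*)
      echo "refusing $base: tinymem nodes only have a 16 GB DOM" >&2
      return 1 ;;
  esac
  # LSB_JOBID is set by LSF inside a job; fall back to the shell PID.
  printf '%s/%s\n' "${base%/}" "${LSB_JOBID:-$$}"
}
```

In a job script this might be used as `WORKDIR=$(tinymem_scratch_dir) && mkdir -p "$WORKDIR" && cd "$WORKDIR"`.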

Below are instructions on how to monitor your jobs and make sure they fit the purpose of this queue. It was acquired to accommodate all the “swarming” serial jobs, thousands of them. But parallel jobs can also be run if they fit the small memory footprint.
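A job in this queue may land on either node type, so the safe per-slot budget is the smaller of the two: 12 GB / 16 logical cores = 0.75 GB on the hp12 nodes (0.8 GB on the new ones). A minimal bash sketch of that check, assuming 1 GB = 1024 MB; the function name `fits_tinymem` is hypothetical, and the memory figure would come from your own monitoring, for instance the MEM line of `bjobs -l`, converted to MB.

```shell
#!/bin/bash
# Hypothetical check: does a job's memory use fit the tinymem budget?
# Budget per job slot: 0.75 GB = 768 MB (12 GB / 16 cores on the hp12 nodes),
# the stricter of the two node types in the queue.
# Arguments: memory used in MB, number of job slots (#BSUB -n), default 1.
fits_tinymem() {
  local mem_mb="$1" slots="${2:-1}"
  local budget_mb=$(( slots * 768 ))
  if (( mem_mb <= budget_mb )); then
    echo "ok: ${mem_mb} MB <= ${budget_mb} MB for ${slots} slot(s)"
  else
    echo "too big: ${mem_mb} MB > ${budget_mb} MB for ${slots} slot(s)" >&2
    return 1
  fi
}
```

For example, `fits_tinymem 3000 4` passes (3000 MB is under the 3072 MB budget for four slots), while `fits_tinymem 1000` fails for a serial job.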

tinymem

Since the hp12 nodes also have a small memory footprint, we are merging them into the tinymem queue as an experiment. If it does not work, we'll restore the original configuration. With hyper-threading on, these nodes have 12 GB of memory for 16 logical cores, or 0.75 GB/core.

So the tinymem queue consists of two types of nodes; let's call them mwtmnodes for the new hardware (2015) and hptmnodes for the old hp12 nodes (2006). The new hardware will be faster (1.3x without hyper-threading and 1.35x with hyper-threading), and on top of that each new node can handle 2.5x as many jobs at once (40 versus 16 logical cores).
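The 2.5x figure is the ratio of job slots per node (40 / 16); the 1.3x and 1.35x per-core speed-ups are the author's figures and are not reproduced here. A bash sketch of the slot arithmetic for the merged queue, assuming every node in the ranges n46-n59 and n1-n32 listed below is in service:

```shell
#!/bin/bash
# Compare the two tinymem node types, using only figures from this page.
# mwtmnodes: n46-n59, 40 logical cores each
# hptmnodes: n1-n32, 16 logical cores and 12 GB each
mw_nodes=$(( 59 - 46 + 1 ))
hp_nodes=$(( 32 - 1 + 1 ))
mw_slots=40
hp_slots=16
hp_mem_gb=12

# awk handles the fractional results that bash arithmetic cannot.
hp_gb_per_slot=$(awk -v m="$hp_mem_gb" -v s="$hp_slots" 'BEGIN { printf "%.2f", m / s }')
slot_ratio=$(awk -v a="$mw_slots" -v b="$hp_slots" 'BEGIN { printf "%.1f", a / b }')

# Queue-wide slot count, assuming every node in both ranges is in service.
total_slots=$(( mw_nodes * mw_slots + hp_nodes * hp_slots ))

echo "hptmnode memory per slot: $hp_gb_per_slot GB"
echo "slots per node, mwtm vs hptm: ${slot_ratio}x"
echo "tinymem job slots: $total_slots ($mw_nodes + $hp_nodes nodes)"
```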

In light of that, I have created node-specific resources:

  • “tmfast” for the mwtmnodes (n46-n59)
  • “tmslow” for the hptmnodes (n1-n32)

In addition, I have set preferences within the tinymem queue to use the mwtmnodes first, then the hptmnodes. So if you do nothing and just submit to the queue, that is what will happen. But you can control this if you wish by “consuming” these node-specific resources. There are 40 consumables per mwtmnode and 16 per hptmnode, one per logical core. The syntax is:

  • #BSUB -R "rusage[tmfast=1]"
  • #BSUB -R "rusage[tmslow=1]"

You need to request one consumable per job slot, so if you use, say, #BSUB -n 4, the '1' becomes a '4'. Your job will go PENDing when the consumables are exhausted. When would you do this? For example, if you do not wish to run on the hptmnodes and are OK with waiting, or if the fabulous new hardware is clogged with jobs and you wish to bypass it immediately.
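The one-consumable-per-slot rule can be captured in a small helper so the slot count and the rusage value never drift apart. A minimal bash sketch; the function name `tinymem_rusage` is hypothetical, while the `tmfast`/`tmslow` resource names and the `rusage[...]` syntax are the ones given above.

```shell
#!/bin/bash
# Hypothetical helper: build the LSF resource string that reserves one
# tmfast or tmslow consumable per job slot.
tinymem_rusage() {
  local slots="$1" resource="$2"
  case "$resource" in
    tmfast|tmslow) ;;
    *) echo "resource must be tmfast or tmslow" >&2; return 1 ;;
  esac
  if ! [[ "$slots" =~ ^[1-9][0-9]*$ ]]; then
    echo "slots must be a positive integer" >&2
    return 1
  fi
  # One consumable per job slot: -n 4 needs tmfast=4 (or tmslow=4).
  printf 'rusage[%s=%s]\n' "$resource" "$slots"
}
```

On the command line this could be used as `bsub -q tinymem -n 4 -R "$(tinymem_rusage 4 tmfast)" ./myjob.sh`, where `./myjob.sh` stands in for your own job script; it is equivalent to putting `#BSUB -n 4` and `#BSUB -R "rusage[tmfast=4]"` in the script.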

Doing nothing, that is, not requesting consumables, is a perfect strategy too.

As of today, queue hp12 is closed while we wait for it to empty out. Then it will disappear. — Meij, Henk 2015/06/17 15:07



cluster/140.1434568085.txt.gz · Last modified: 2015/06/17 15:08 (external edit)