Done! — Meij, Henk 2014/02/21 09:54
Soon (Feb 2014) we'll have to power down the Dell racks, grab one of the L6-30 circuits supplying power to those racks, and use it to power up the new Microway servers.
That leaves some spare L6-30 circuits (the Dell racks use 4 each), so we could contemplate grabbing two of them and powering up two more shelves of the Blue Sky Studio hardware. That would double the Hadoop cluster and the bss24 queue when needed (a total of 100 job slots), and offer access to 1.2 TB of memory. This hardware is generally powered off when not in use.
The new Microway hardware is identical to the GPU-HPC hardware we bought previously, minus the GPUs. A total of 8 1U servers will offer 256 GB of memory each plus a fast local disk, hence the "fd" in the mw256fd queue name. It is to be used just like ehwfd (and bss24). /home and /sanscratch are served via IPoIB.
Queues: the per-node job slot limit on mw256fd is higher than the 28 on mw256, because on mw256 the GPUs also need access to cores (4 per node for now). For now, it may be that the max will be set to 8 if too many jobs grab too many job slots. You should benchmark your job to understand what is optimal.
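If you are unsure what the current limits are at any moment, the standard LSF query commands will show them, for example:

  bqueues -l mw256fd    # detailed queue settings, including job slot limits
  bhosts                # per-host MAX slots and how many are in use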
#BSUB -R "rusage[mem=X]"
Gaussian: use #BSUB -n X (where X is equal to or less than the max jobs per node) together with #BSUB -R "span[hosts=1]" so that all slots land on a single node.
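Putting those two directives together, a Gaussian submission might look roughly like the sketch below; the slot count, memory reservation, file names and the bare g09 invocation are illustrative, and the environment setup is site specific:

  #!/bin/bash
  #BSUB -q mw256fd
  #BSUB -n 8                     # equal to or less than the max jobs per node
  #BSUB -R "span[hosts=1]"       # keep all slots on one node
  #BSUB -R "rusage[mem=16384]"   # illustrative memory reservation
  #BSUB -o g09.out.%J
  #BSUB -e g09.err.%J
  # module loads / environment setup are site specific
  g09 < input.com > output.log

Keep %NProcShared in the Gaussian input file consistent with the -n value you request.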
MPI: on mw256fd, just like on hp12 or imw and mw256, you may run either flavor of MPI with the appropriate binaries. On mwgpu you must use MVApich2 when running the GPU-enabled software (Amber, Gromacs, Lammps, Namd).
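As a rough sketch of an MPI submission (the binary name is hypothetical, and how mpirun is invoked, including any wrapper scripts or module loads, depends on which MPI flavor your binaries were built against):

  #!/bin/bash
  #BSUB -q mw256fd
  #BSUB -n 32                    # total MPI ranks; may span several nodes
  #BSUB -o mpi.out.%J
  #BSUB -e mpi.err.%J
  # assumes a plain mpirun is on the PATH; adjust for your MPI flavor
  mpirun -np $LSB_DJOB_NUMPROC ./my_mpi_program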
Scratch: the mw256fd nodes sport a 15K hard disk and /localscratch is 175 GB (replacing the ehwfd functionality).
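A minimal sketch of using that local disk from inside a job, assuming you create and clean up your own per-job directory under /localscratch (file and program names are illustrative):

  # stage data to the node's fast local disk
  MYSCRATCH=/localscratch/$LSB_JOBID     # $LSB_JOBID is set by LSF
  mkdir -p $MYSCRATCH
  cp big_input.dat $MYSCRATCH/           # stage input in
  cd $MYSCRATCH
  ./my_program big_input.dat             # hypothetical executable
  cp results.dat $LS_SUBCWD/             # copy results back to the submission directory
  rm -rf $MYSCRATCH                      # clean up the local disk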
Savings:

Workshop: a workshop on the new mw256fd queue will be held once it has been deployed: Feb 26th, ST 509a, 4-5 PM.
There is a significant need to run many, many programs that require very little memory (on the order of 1-5 MB). When such programs run they each consume a job slot. When many such programs consume many job slots, like on the large servers in the mw256 or mw256fd queues, lots of memory remains idle and inaccessible to other programs.
So we could enable hyperthreading on the nodes of the hp12 queue and double the job slots (from 256 to 512). Testing reveals that when hyperthreading is on, the nodes present 16 cores.
So it appears that we could turn hyperthreading on and, despite the nodes presenting 16 cores, limit the number of jobs per node to 8 until the need arises to run many small jobs, and then reset the limit to 16.
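For reference, the per-node cap being discussed would normally live in LSF's lsb.hosts file (the MXJ column); a hypothetical fragment with a placeholder host name:

  # lsb.hosts fragment: cap a hyperthreaded hp12 node at 8 job slots
  Begin Host
  HOST_NAME   MXJ   r1m   pg   ls   tmp   DISPATCH_WINDOW
  n99         8     ()    ()   ()   ()    ()
  End Host

After editing lsb.hosts, badmin reconfig makes the new limit take effect; raising MXJ from 8 to 16 later would be the "reset the limit" step mentioned above.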