We will be bringing online 280 more physical cores (560 hyperthreads) using Haswell-EP E5-2650 v3 chips running at 2.3 GHz with a turbo boost speed of 3.0 GHz.
That's an 85% increase in job slots. Yea.
We need to address the problem of tens of thousands of small serial jobs swarming across our larger servers. In doing so, these jobs tie up large chunks of memory they do not use and interfere with the scheduling of large parallel jobs (small serial jobs satisfy job prerequisites easily).
So the idea is to assess what we could buy in terms of high core density hardware (max CPU cores per U of rack space) with small memory footprints (defined as 1 GB per physical core or less). Nodes can have tiny local disks for OS and local scratch (say 16-120 GB). /home may not be mounted on these systems, so input and output files need to be managed by the jobs and copied back and forth using scp. The scheduler will be SLURM. The OS will be the latest version of CentOS 6.x.
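Since /home may not be mounted, each job would have to stage its own files. Here is a minimal sketch of what that could look like, assuming a reachable file server and a per-job directory on local scratch. The function name, host name, and paths are all hypothetical; only the scp command form is standard.

```python
def build_stage_commands(file_host, remote_dir, scratch_dir, inputs, outputs):
    """Return (stage_in, stage_out): lists of scp argument lists.

    file_host, remote_dir and scratch_dir are hypothetical placeholders
    for wherever the data lives and wherever local scratch is mounted.
    """
    # Copy each input file from the file server to local scratch.
    stage_in = [
        ["scp", f"{file_host}:{remote_dir}/{name}", f"{scratch_dir}/"]
        for name in inputs
    ]
    # Copy each output file from local scratch back to the file server.
    stage_out = [
        ["scp", f"{scratch_dir}/{name}", f"{file_host}:{remote_dir}/"]
        for name in outputs
    ]
    return stage_in, stage_out


# Example with made-up names: print the commands a job would run.
stage_in, stage_out = build_stage_commands(
    "fileserver", "/home/user/run1", "/localscratch/42",
    inputs=["in.dat"], outputs=["out.dat"],
)
for cmd in stage_in + stage_out:
    print(" ".join(cmd))
```

A job would run the stage-in commands before the computation and the stage-out commands after it, for example with subprocess.run(cmd, check=True). Passwordless ssh keys between the nodes and the file server are assumed.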
Some testing results can be found here:
The "expansion:" lines below give an estimate of nr_nodes = int(expansion_budget/node_cost)
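The estimate can be sketched in a few lines of Python. This uses the same nr_nodes formula and adds the two selection criteria mentioned earlier: cores per U of rack space, and memory of 1 GB per physical core or less. The function name, the extra parameters, and the numbers in the example are made up for illustration; they are not vendor quotes.

```python
def estimate_expansion(expansion_budget, node_cost, cores_per_node,
                       mem_gb_per_node, rack_units_per_node):
    """Estimate what a budget buys for one hypothetical node type."""
    if node_cost <= 0:
        raise ValueError("node_cost must be positive")
    # Same formula as in the text: whole nodes only, remainder is unspent.
    nr_nodes = int(expansion_budget / node_cost)
    gb_per_core = mem_gb_per_node / cores_per_node
    return {
        "nr_nodes": nr_nodes,
        "total_cores": nr_nodes * cores_per_node,
        "cores_per_u": cores_per_node / rack_units_per_node,
        "gb_per_core": gb_per_core,
        # "Small memory footprint" as defined above: <= 1 GB per core.
        "small_memory": gb_per_core <= 1.0,
    }


# Illustrative numbers only.
result = estimate_expansion(
    expansion_budget=50000, node_cost=6000,
    cores_per_node=32, mem_gb_per_node=32, rack_units_per_node=1,
)
print(result)
```

Running this for each quoted node type gives a quick side-by-side of node count, total cores, and core density for the same budget.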
Customer shall be responsible for shipping charges (this is normal) and shall have its own insurance to cover risk during transit (this is definitely not!).