The next step in the evolution of our HPCC platform is a new primary login node (from cottontail to cottontail2, to be purchased in early 2021) and a migration to the OpenHPC platform and the Slurm scheduler. We are soliciting proposals for one head node plus two compute nodes as a test-and-learn setup. The two compute nodes are deliberately very different, so Slurm resource discovery and allocation can be tested, along with the scheduler's Fairshare policy. It is also a chance to test out the A100 GPU.
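For the Fairshare piece, a minimal slurm.conf sketch of the settings we would exercise (the weights and half-life are placeholder values, and fairshare assumes job accounting is running through slurmdbd):

```
# slurm.conf -- fairshare-related settings (sketch; values are placeholders)
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0           # historical usage decays with a 7-day half-life
PriorityWeightFairshare=100000      # make fairshare dominate job priority
PriorityWeightAge=1000
PriorityWeightPartition=0
PriorityWeightJobSize=0
# fairshare needs accounting records, stored via slurmdbd
AccountingStorageType=accounting_storage/slurmdbd
```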
We are switching to an RJ45 10GBase-T network in this migration and adopting CentOS 8 (possibly the Stream variant as events unfold: CentOS Stream or Rocky Linux).
Whoooo! Check this out https://almalinux.org/
We are also sticking to a single private network for scheduler and home-directory traffic, at 10G, for each node in the new environment. The second 10G interface (onboot=no) could be brought up for future use, perhaps on a second switch for network redundancy. Keeping the 192.168.x.x private network for openlava/warewulf6 traffic and the 10.10.x.x private network for slurm/warewulf8 traffic avoids conflicts.
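A minimal sketch of what the per-node interface files might look like under CentOS 8 (interface names and addresses here are placeholders, not final assignments):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0
# 10G private network for slurm/warewulf8 traffic
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.10.0.101
PREFIX=16
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1
# second 10G interface, kept down until needed (redundancy scenario)
DEVICE=eth1
BOOTPROTO=none
ONBOOT=no
```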
The storage network is on 1G; we may be able to upgrade it later as the 10G network grows (the quoted options were 6x1G or 4x10G). Alternatively, we move to 10G in about three years by adding a replication partner and switching roles between the TrueNAS/ZFS units. (In the meantime, we can LACP the 6x1G links into 3x2G bonds.)
Lots of old compute nodes will remain on the 1G network. Maybe the newest hardware (the n79-n90 nodes with RTX 2080S GPUs) could be upgraded to 10G using PCI cards?
| | VendorA | VendorB | VendorC | Notes |
|---|---|---|---|---|
| **Head Node** | | | | |
| Rack | 1U | 1U | 1U | |
| Power | 1+1 | 1+1 | 1+1 | 208V |
| NIC | 4x10G | 2x1G, 2x10G | 4x10G | B: 4x10G on PCI? |
| Rails | 26-33 | 25 | ? | |
| CPU | 2x5222 | 2x6226R | 2x5222 | Gold, Gold, Gold |
| Cores | 2x4 | 2x16 | 2x4 | physical |
| GHz | 3.8 | 2.9 | 3.8 | |
| DDR4 | 96 | 192 | 96 | GB |
| HDD | 2x960G | 2x480G | 2x480G | SSD, SSD, SSD (RAID1) |
| CentOS | 8 | 8 | no | |
| OpenHPC | no | yes | no | yes = "best effort" |
| **CPU Compute Node** | | | | |
| Rack | 1U | 2U | 1U | |
| Power | 1+1 | 1 | 1+1 | 208V |
| NIC | 2x10G | 2x1G, 2x10G | 2x10G | B: 4x10G on PCI? |
| Rails | 26-33 | ? | ? | |
| CPU | 2x6226R | 2x6226R | 2x6226R | Gold, Gold, Gold |
| Cores | 2x16 | 2x16 | 2x16 | physical |
| GHz | 2.9 | 2.9 | 2.9 | |
| DDR4 | 192 | 192 | 192 | GB |
| HDD | 2T | 480G | 2x2T | SATA, SSD, SATA |
| CentOS | 8 | 8 | no | |
| **CPU-GPU Compute Node** | | | | |
| Rack | 4U | 2U | 1U | |
| Power | 1+1 | 1 | 1+1 | 208V |
| NIC | 2x10G | 2x1G, 2x10G | 2x10G | B: 4x10G on PCI? |
| Rails | 26-36 | ? | ? | |
| CPU | 2x4210R | 2x4214R | 2x4210R | Silver, Silver, Silver |
| Cores | 2x10 | 2x12 | 2x10 | physical |
| GHz | 2.4 | 2.4 | 2.4 | |
| DDR4 | 192 | 192 | 192 | GB |
| HDD | 2T | 480G | 2x2T | SATA, SSD, SATA |
| CentOS | 8 | 8 | 8 | with GPU drivers, toolkit |
| GPU | 1xA100 | 1xA100 | 1xA100 | chassis can hold 4; passive |
| HBM2 | 40 | 40 | 40 | GB memory |
| MIG | yes | yes | yes | up to 7 vGPUs (sketch below) |
| SDK | ? | - | - | |
| NGC | ? | - | - | |
| Switch | add! | 8+1 | 16+2 | need 2 of them? |
| S&H | incl | tbd | tbd | |
| Δ | +2.4 | +4.4 | +1.6 | target budget, $k |
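Since MIG appears in the GPU rows above, here is a rough sketch of carving one A100 into the maximum seven instances with nvidia-smi (profile IDs vary by driver release; 19 should be the 1g.5gb profile on an A100 40GB):

```
# enable MIG mode on GPU 0 (may require a GPU reset or reboot)
nvidia-smi -i 0 -mig 1
# create seven 1g.5gb GPU instances plus matching compute instances
nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C
# list the resulting GPU instances to verify
nvidia-smi mig -lgi
```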
GFLOPS = #chassis * #nodes/chassis * #sockets/node * #cores/socket * GHz/core * FLOPs/cycle
Note that because clock speed is expressed in GHz, the formula yields theoretical performance in GFLOPS; divide GFLOPS by 1000 to get TeraFLOPS (TFLOPS).
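As a worked example, here is a small Python sketch applying the formula above to the quoted compute nodes. The FLOPs/cycle values are our assumptions about the AVX-512 FMA units on these parts (two FMA units on the Gold 6226R, one on the Silver 4210R), not vendor-quoted figures:

```python
# Theoretical peak GFLOPS, per the formula above (a sketch; FLOPs/cycle
# is an assumption that depends on the AVX-512 FMA configuration).

def gflops(chassis, nodes_per_chassis, sockets, cores, ghz, flops_per_cycle):
    """#chassis * #nodes/chassis * #sockets/node * #cores/socket * GHz * FLOPs/cycle"""
    return chassis * nodes_per_chassis * sockets * cores * ghz * flops_per_cycle

# CPU compute node: 2x Gold 6226R, 2x16 cores @ 2.9 GHz, dual FMA assumed (32 FLOPs/cycle)
print(gflops(1, 1, 2, 16, 2.9, 32))   # 2969.6 GFLOPS, i.e. ~3.0 TFLOPS
# CPU-GPU node host CPUs: 2x Silver 4210R, 2x10 cores @ 2.4 GHz, single FMA assumed (16)
print(gflops(1, 1, 2, 10, 2.4, 16))   # 768.0 GFLOPS
```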
http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2329