2018 GPU Expansion

Important notes … about GeForce GTX1080Ti

From the Nvidia web site: Warranted Product is intended for consumer end user purposes only, and is not intended for datacenter use and/or GPU cluster commercial deployments (“Enterprise Use”). Any use of Warranted Product for Enterprise Use shall void this warranty.

From the Exxact web site: Exxact AMBER Certified AMBER MD Workstations and Clusters, in addition to being numerically validated using custom GPU validation suites, also come with an optimized version of AMBER 16 that has been developed in a collaboration between lead AMBER developer Ross Walker (SDSC), NVIDIA and Exxact.

So it is a grey area whether GTX1080Ti use in a data center for research is covered under warranty. Most quotes include language stating that all warranty issues will be handled between the card owner and the card issuer. If we go with consumer-grade GTX cards we should add some self spares…dev budget? $700 apiece. The Tesla P100 is enterprise grade, $6,000 apiece.

Quotes then …

  • A1-C1 use GTX1080Ti consumer-grade GPUs; C2-D2 use Tesla P100 enterprise-grade GPUs
  • A1: most nodes, but not the maximum CPU cores or GPUs
  • A3: max CPU cores, max GPUs, n78 “like”
  • D1: max teraflops, cheapest per teraflop, not n78 “like”
Vendor / Quote  | A #1    | A #2   | A #3  | B #1   | B #2  | C #1  | C #2  | D #1  | D #2  | Notes
Nodes           | 11      | 9      | 8     | 9      | 7     | 8     | 5     | 6     | 6     | 5-22 U of rack space
CPUs            | 22      | 18     | 16    | 18     | 14    | 16    | 10    | 12    | 12    |
CPU cores       | 220     | 180    | 224   | 216    | 168   | 192   | 120   | 168   | 144   | physical
GPUs            | 22      | 36     | 32    | 18     | 28    | 32    | 10    | 12    | 12    |
GPU cores       | 79      | 129    | 115   | 66     | 100   | 115   | 36    | 43    | 43    | k physical
Teraflops       | 7.7+7.8 | 6.3+13 | 18+11 | 18+6.4 | 14+10 | 16+11 | 10+47 | 12+56 | 12+56 | cpu+gpu (dpfp)
$/TFlop         | 6,335   | 5,073  | 3,510 | 4,299  | 3,946 | 3,822 | 1,795 | 1,422 | 1,638 | hpc 38+25

Per node:

Vendor / Quote  | A #1     | A #2     | A #3     | B #1     | B #2     | C #1     | C #2       | D #1       | D #2       | Notes
Chassis         | 2U       | 2U       | 2U       | 2U       | 2U       | 1U       | 1U         | 1U         | 1U         | depth of rails?
CPUs            | 2        | 2        | 2        | 2        | 2        | 2        | 2          | 2          | 2          | "Skylake"
CPU model       | 4114     | 4114     | 5120     | 6126     | 6126     | 6126     | 6126       | 5120       | 6126       |
CPU cores       | 10+10    | 10+10    | 14+14    | 12+12    | 12+12    | 12+12    | 12+12      | 14+14      | 12+12      | physical+logical
CPU GHz         | 2.2      | 2.2      | 2.2      | 2.6      | 2.6      | 2.6      | 2.6        | 2.2        | 2.6        | base; max turbo 3-3.7
CPU watts       | 85       | 85       | 105      | 125      | 125      | 125      | 125        | 105        | 125        |
L3 cache (MB)   | 13.75    | 13.75    | 19.25    | 19.25    | 19.25    | 19.25    | 19.25      | 19.25      | 19.25      |
DP FLOPs/cycle  | 16       | 16       | 32       | 32       | 32       | 32       | 32         | 32         | 32         |
DDR4 (GB)       | 192      | 192      | 192      | 192      | 192      | 192      | 192        | 192        | 192        | memory
DDR4 (MHz)      | 2666     | 2666     | 2666     | 2666     | 2666     | 2666     | 2666       | 2666       | 2666       |
Drives (GB)     | 480      | 480      | 480      | 2×240    | 2×240    | 1024     | 1024       | 480        | 480        |
Drive type      | 2.5" SSD | 2.5" SSD | 2.5" SSD | 3.5" SSD | 3.5" SSD | 2.5" HDD | 2.5" HDD   | 2.5" SSD   | 2.5" SSD   |
Scratch (GB)    |          |          |          |          |          |          |            |            |            | 960 GB on two of the quotes
GPUs            | 2        | 4        | 4        | 2        | 4        | 4        | 2          | 2          | 2          |
GPU model       | GTX 1080 | GTX 1080 | GTX 1080 | GTX 1080 | GTX 1080 | GTX 1080 | Tesla P100 | Tesla P100 | Tesla P100 | see warranty note above
GPU memory (GB) | 11       | 11       | 11       | 11       | 11       | 8!       | 12         | 12         | 16!        |
GPU GHz         | 1.6      | 1.6      | 1.6      | 1.6      | 1.6      | 1.6      | 1.9        | 1.9        | 1.9        | max 1.9
GPU watts       | 250      | 250      | 250      | 250      | 250      | 250      | 250        | 250        | 250        |
Image           | 1        | 1        | 1        | ?        | ?        | ?        | ?          | 1          | 1          | MPI flavors
CentOS7         | y        | y        | y        | ?        | ?        | y        | y          | y          | y          | + all software
NICs            | 2        | 2        | 2        | 2        | 2        | 2        | 2          | 2          | 2          | gigabit ethernet
Warranty (yr)   | 3        | 3        | 3        | 3        | 3        | 3        | 3          | 3          | 3          | excludes GTX cards
n78 "like"?     | n        | y        | y        | n        | y        | y        | n          | n          | n          | matches?
Δ               | -1.8     | -3.1     | +1.8     | +4.9     | -5.8     | +4.7     | +2.3       | -3.3       | +11.4      |

Remember to add self-spare GTX cards! No spares are included in any quote.

GFLOPS = #chassis * #nodes/chassis * #sockets/node * #cores/socket * GHz/core * FLOPs/cycle

Note that using the clock rate in GHz in this formula yields theoretical peak performance in GFLOPS. Divide GFLOPS by 1,000 to get TeraFLOPS (TFLOPS).

http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2329

Find dpflops/cycle here https://www.aspsys.com/solutions/hpc-processors/intel-xeon-skylake/
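
As a sanity check on the Teraflops row in the table, here is a minimal Python sketch of the same formula. The inputs are the quote A1 figures from the table (11 nodes, dual Xeon 4114, 10 cores at 2.2 GHz, 16 DP FLOPs/cycle) plus an assumed ~0.35 DP TFLOPS per GTX1080Ti card, so treat the GPU figure as an estimate rather than a vendor number.

  # Theoretical peak double-precision throughput, using the GFLOPS formula above.
  def cpu_tflops(nodes, sockets_per_node, cores_per_socket, ghz, dp_flops_per_cycle):
      """GFLOPS = nodes * sockets * cores * GHz * FLOPs/cycle; divide by 1000 for TFLOPS."""
      gflops = nodes * sockets_per_node * cores_per_socket * ghz * dp_flops_per_cycle
      return gflops / 1000.0

  # Quote A1: 11 nodes, dual Xeon 4114 (10 cores, 2.2 GHz, 16 DP FLOPs/cycle)
  cpu = cpu_tflops(nodes=11, sockets_per_node=2, cores_per_socket=10,
                   ghz=2.2, dp_flops_per_cycle=16)

  # 22 GTX1080Ti cards at roughly 0.35 DP TFLOPS each (FP64 is ~1/32 of FP32 on this card)
  gpu = 22 * 0.35

  print(f"quote A1: {cpu:.1f} + {gpu:.1f} TFLOPS (cpu + gpu, dpfp)")  # close to the 7.7+7.8 cell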

Request

In early fall 2017 we purchased a 1U server from Exxact with four GTX1080Ti GPUs, 128 GB of memory, and dual 8-core E5-2620 v4 CPUs. These GPUs have been performing well for us, with speedups (compared to our K20s) of 5x for Amber, 2x for Gromacs, 11x for LAMMPS, and a whopping 118x for FSL's bedpostx. Our “sweet spot” CPU:GPU ratios for the types of jobs we run are:

  • Amber 1:1,
  • Gromacs 10:1,
  • LAMMPS 2-4:1, and
  • NAMD 13:1.

Most of our jobs use only one GPU at a time. We also have a dying CPU-only job queue of HP blade servers. This expansion would run GPU jobs but also CPU-only jobs, so we'd like more servers, which would allow a good mix of CPU-only and CPU/GPU jobs (see the core-budget sketch below).
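
To make that mix concrete, the following Python sketch budgets cores on a single hypothetical node (24 physical cores and 2 GPUs, roughly the dual Xeon 6126 quotes above) against the sweet-spot ratios listed earlier; the node size is an assumption for illustration, not a requirement.

  # Rough core budget for one assumed node: 24 physical cores, 2 GPUs.
  # For each application's sweet-spot CPU:GPU ratio, estimate how many cores
  # the GPU jobs tie up and how many remain for CPU-only jobs.
  NODE_CORES = 24   # e.g. dual 12-core Xeon Gold 6126 (assumed example)
  NODE_GPUS = 2

  sweet_spot = {"amber": 1, "gromacs": 10, "lammps": 4, "namd": 13}  # cores per GPU (lammps: upper end of 2-4)

  for app, cores_per_gpu in sweet_spot.items():
      used = min(NODE_CORES, NODE_GPUS * cores_per_gpu)  # cores consumed by GPU jobs
      free = NODE_CORES - used                           # cores left for CPU-only jobs
      print(f"{app:8s} GPU jobs use {used:2d} cores, leaving {free:2d} for CPU-only work")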

  • Budget? $?k. When? Near the end of Q3. Each server containing…
  • two (Amber-certified) GPUs, two CPUs (at least 10 cores each)
  • 128 GB memory
  • CentOS 7 with modules
  • latest NVIDIA and MPI (open as to which flavor; MPICH for Amber I think, OpenMPI for all others)
  • latest Amber (we will provide proof of purchase), Gromacs, LAMMPS, NAMD
  • at least two gigabit ethernet ports, starting at node n79: nic1 192.168.102.89, nic2 10.10.102.89, ipmi 192.168.103.89, netmask 255.255.0.0 for all (see the addressing sketch after this list)
  • image/configure at least one server (or do them all; I can image the rest using a Warewulf golden image)
  • leave the build environment and logs in place (I will upgrade our K20s following this setup)
  • install software in /usr/local
  • 3-year warranty, NBD (next business day)
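
Since we will image any remaining nodes ourselves, here is a minimal Python sketch of the addressing pattern described in the list above. It assumes each additional node simply increments the host octet from the n79 / .89 starting point; that numbering scheme is our assumption, not something the vendors have specified.

  # Hypothetical address plan: extend the stated starting point (node n79 -> .89)
  # by one host octet per additional node; netmask 255.255.0.0 for everything.
  def node_addresses(first_node=79, first_octet=89, count=6):
      plan = []
      for i in range(count):
          octet = first_octet + i
          plan.append({
              "node": f"n{first_node + i}",
              "nic1": f"192.168.102.{octet}",
              "nic2": f"10.10.102.{octet}",
              "ipmi": f"192.168.103.{octet}",
          })
      return plan

  for entry in node_addresses(count=6):   # e.g. one of the six-node quotes
      print(entry["node"], entry["nic1"], entry["nic2"], entry["ipmi"])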

We will supply:

  • standard 42U rack (rails at 30", up to 37" usable) with a 7k BTU AC unit (an experiment; quick power math after this list)
  • 2× vertical PDUs (24 A, 208 V) supplying 2×30 C13 outlets
  • OpenLava scheduler RPMs
  • two Ethernet switches
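
For context on the PDU and AC figures, here is a back-of-the-envelope Python check. The per-node wattage is an estimate assembled from the per-node table (2 GPUs at 250 W plus 2 CPUs at 125 W plus an assumed ~150 W for the rest of the chassis), and the 80% PDU derating is a common rule of thumb, so read the output as rough orders of magnitude only.

  # Back-of-the-envelope power/cooling check for the rack described above.
  BTU_PER_WATT = 3.412                    # 1 W of heat ~= 3.412 BTU/hr

  pdu_va = 208 * 24                       # volt-amps per PDU (208 V, 24 A)
  pdu_usable_w = pdu_va * 0.8             # common 80% continuous-load derating
  ac_capacity_w = 7000 / BTU_PER_WATT     # 7k BTU/hr AC expressed in watts of heat removal

  node_estimate_w = 2 * 250 + 2 * 125 + 150   # assumed 2-GPU, 2-CPU node

  print(f"per PDU:    ~{pdu_usable_w:.0f} W usable ({pdu_va} VA)")
  print(f"7k BTU AC:  ~{ac_capacity_w:.0f} W of heat removal")
  print(f"2-GPU node: ~{node_estimate_w} W estimated draw")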

We are open to suggestions, modifications, and substitutions. We'd prefer to go with the GTX1080Ti GPUs, which are still listed as certified for Amber 18.

