
2018 GPU Expansion

Important notes … about GeForce GTX1080Ti

From Nvidia web site: Warranted Product is intended for consumer end user purposes only, and is not intended for datacenter use and/or GPU cluster commercial deployments (“Enterprise Use”). Any use of Warranted Product for Enterprise Use shall void this warranty.

From Exxact web site: Exxact AMBER Certified AMBER MD Workstations and Clusters, in addition to being numerically validated using custom GPU validation suites, also come with an optimized version of AMBER 16 that has been developed in a collaboration between lead AMBER developer Ross Walker (SDSC), NVIDIA and Exxact.

So it is a grey area whether GTX1080Ti use in a data center for research is warranted or not. Most quotes come with language stating that all warranty issues will be handled between the card owner and the card issuer. If we go with consumer grade GTX cards we should add some self-spares … dev budget? They run $700 apiece; the Tesla P100 is enterprise grade at $6,000 apiece.

Quotes then …

  • A1-C1 use consumer grade GTX1080Ti GPUs; C2-D2 use enterprise grade Tesla P100 GPUs
  • A1: most nodes, but not the maximum CPU cores or GPUs
  • A3: maximum CPU cores, maximum GPUs, n78 "like"
  • D1: maximum teraflops, cheapest teraflops, not n78 "like"
Vendor               A      A      A      B      B      C      C      D      D
Quote               #1     #2     #3     #1     #2     #1     #2     #1     #2  Notes

Nodes               11      9      8      9      7      8      5      6      6  5-22 U
CPUs                22     18     16     18     14     16     10     12     12
CPU cores          220    180    224    216    168    192    120    168    144  physical
GPUs                22     36     32     18     28     32     10     12     12
GPU cores           79    129    115     66    100    115     36     43     43  k physical
Teraflops      7.7+7.8 6.3+13  18+11 18+6.4  14+10  16+11  10+47  12+56  12+56  cpu+gpu (dpfp)
$/TFlop          6,335  5,073  3,510  4,299  3,946  3,822  1,795  1,422  1,638  hpc 38+25

Per node:
Chassis             2U     2U     2U     2U     2U     1U     1U     1U     1U  depth of rails?
CPUs                 2      2      2      2      2      2      2      2      2  "Skylake"
CPU model         4114   4114   5120   6126   6126   6126   6126   5120   6126
Cores            10+10  10+10  14+14  12+12  12+12  12+12  12+12  14+14  12+12  physical+logical
GHz                2.2    2.2    2.2    2.6    2.6    2.6    2.6    2.2    2.6  max turbo 3-3.7
Watts               85     85    105    125    125    125    125    105    125
L3 cache MB      13.75  13.75  19.25  19.25  19.25  19.25  19.25  19.25  19.25
DP flops/cycle      16     16     32     32     32     32     32     32     32
Memory GB          192    192    192    192    192    192    192    192    192  DDR4
Memory MHz        2666   2666   2666   2666   2666   2666   2666   2666   2666
Drives GB          480    480    480  2x240  2x240   1024   1024    480    480
Drive type       2.5"s  2.5"s  2.5"s  3.5"s  3.5"s  2.5"h  2.5"h  2.5"s  2.5"s  s=SSD, h=HDD
Scratch GB         960 on two of the quotes (column placement unclear in the source)
GPUs                 2      4      4      2      4      4      2      2      2
GPU type           GTX    GTX    GTX    GTX    GTX    GTX  Tesla  Tesla  Tesla  see warranty note
GPU model         1080   1080   1080   1080   1080   1080   P100   P100   P100
GPU GB              11     11     11     11     11     8!     12     12    16!
GPU GHz            1.6    1.6    1.6    1.6    1.6    1.6    1.9    1.9    1.9  max 1.9
GPU watts          250    250    250    250    250    250    250    250    250
Image                1      1      1      ?      ?      ?      ?      1      1  MPI flavors
CentOS7              y      y      y      ?      ?      y      y      y      y  + all software
NICs                 2      2      2      2      2      2      2      2      2  gigabit ethernet
Warranty yrs         3      3      3      3      3      3      3      3      3  excludes GTX cards
n78?                 n      y      y      n      y      y      n      n      n  matches?
Δ                 -1.8   -3.1   +1.8   +4.9   -5.8   +4.7   +2.3   -3.3  +11.4
Remember to add self-spare GTX cards! No spares are included in the quotes.

GFLOPS = #chassis * #nodes/chassis * #sockets/node * #cores/socket * GHz/core * FLOPs/cycle

Note that because the clock rate is expressed in GHz, the formula yields GFLOPS of theoretical peak performance; divide GFLOPS by 1,000 to get TeraFLOPS (TFLOPS).

http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2329

Find DP FLOPs/cycle per CPU model here: https://www.aspsys.com/solutions/hpc-processors/intel-xeon-skylake/
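
As a sanity check, here is a minimal Python sketch of that formula applied to two of the quotes; the node counts, core counts, clock rates, and DP FLOPs/cycle come straight from the table above, and the results reproduce its CPU teraflops figures (the cpu_tflops helper name is just illustrative).

  # Theoretical peak CPU performance, per the formula above.
  def cpu_tflops(nodes, sockets_per_node, cores_per_socket, ghz, flops_per_cycle):
      gflops = nodes * sockets_per_node * cores_per_socket * ghz * flops_per_cycle
      return gflops / 1000.0  # GFLOPS -> TFLOPS

  # Quote A1: 11 nodes, dual Xeon 4114 (10 cores, 2.2 GHz, 16 DP FLOPs/cycle)
  print(cpu_tflops(11, 2, 10, 2.2, 16))  # ~7.7, matching the Teraflops row

  # Quote C2: 5 nodes, dual Xeon 6126 (12 cores, 2.6 GHz, 32 DP FLOPs/cycle)
  print(cpu_tflops(5, 2, 12, 2.6, 32))   # ~10, matching the Teraflops row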

Request

In early fall 2017 we purchased a 1U server from Exxact with four GTX1080Ti GPUs, 128 GB of memory, and dual 8-core E5-2620 v4 CPUs. These GPUs have been performing well for us, with speedups (compared to our K20s) of 5x for Amber, 2x for Gromacs, 11x for Lammps, and a whopping 118x for FSL's bedpostx. Our "sweet spot" cpu:gpu ratios for the types of jobs we run are (see the sketch after this list):

  • Amber 1:1,
  • Gromacs 10:1,
  • Lammps 2-4:1, and
  • NAMD 13:1.
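
A minimal sketch of how these ratios drive node sizing, assuming a hypothetical node with two 12-core CPUs and two GPUs (roughly the per-node configuration requested below); the node shape is my assumption, and only the ratios come from the list above.

  # Hypothetical node: 2 CPUs x 12 physical cores, 2 GPUs (assumed, not quoted).
  cores, gpus = 24, 2

  # Observed sweet-spot cores per GPU, from the list above (Lammps at its high end).
  ratios = {"amber": 1, "gromacs": 10, "lammps": 4, "namd": 13}

  for app, cores_per_gpu in ratios.items():
      jobs = min(cores // cores_per_gpu, gpus)  # single-GPU jobs that fit
      spare = cores - jobs * cores_per_gpu      # cores left for cpu-only jobs
      print(f"{app}: {jobs} concurrent GPU job(s), {spare} cores free for CPU-only work")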

Most of our jobs use only one GPU at a time. We also have a dying CPU-only job queue of HP blade servers. This expansion would run GPU jobs but also CPU-only jobs; hence we'd like to have more servers, which would allow a good mix of CPU-only and CPU/GPU jobs.

  • Budget? $?k. When? Near the end of Q3. Each server containing …
  • two (Amber certified) GPUs, two CPUs (at least 10 cores each)
  • 128 GB memory
  • CentOS 7 with modules
  • latest Nvidia driver and MPI (open as to flavor; MPICH for Amber I think, OpenMPI for all others)
  • latest Amber (we will provide proof of purchase), Gromacs, Lammps, NAMD
  • at least two gigabit ethernet ports, with addresses starting at
  • node n79: nic1 192.168.102.89, nic2 10.10.102.89, ipmi 192.168.103.89, netmask 255.255.0.0 for all (see the addressing sketch after this list)
  • image/configure at least one server (or do all; I can image using a Warewulf golden image)
  • leave the build environment and logs (I will upgrade our K20s following this setup)
  • install software in /usr/local
  • 3-year warranty, NBD
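
A minimal sketch of the addressing scheme implied above, assuming the last octet is the node number plus 10 (n79 → .89); that offset is my inference from the single example given, so confirm it before configuring.

  # Hypothetical address plan for new nodes, assuming last octet = node number + 10
  # (inferred from n79 -> .89 above; confirm before use).
  def node_addresses(n):
      octet = n + 10
      return {"nic1": f"192.168.102.{octet}",
              "nic2": f"10.10.102.{octet}",
              "ipmi": f"192.168.103.{octet}",
              "netmask": "255.255.0.0"}

  for n in range(79, 85):  # e.g. six new nodes, as in quote D1
      print(f"n{n}", node_addresses(n))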

We will supply:

  • standard 42U rack (rails at 30", up to 37" usable) with 7k BTU AC (an experiment)
  • 2x vertical PDUs (24A, 208V) supplying 2x30 C13 outlets (see the power sketch after this list)
  • Openlava scheduler RPMs
  • two ethernet switches
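
A rough power sanity check against those PDUs, assuming the quoted per-node parts (two 125 W CPUs, two 250 W GPUs) plus a guessed 150 W for board, drives, and fans, and the usual 80% continuous-load derating; the overhead figure and derating are my assumptions, not vendor numbers.

  # Per-PDU capacity vs. peak node draw (assumptions noted above).
  pdu_watts = 24 * 208 * 0.8                   # 24 A at 208 V, derated 80% => ~3994 W

  cpu_w, gpu_w, overhead_w = 125, 250, 150     # overhead_w is a guess
  node_w = 2 * cpu_w + 2 * gpu_w + overhead_w  # 900 W peak per node

  print(f"per-PDU budget: {pdu_watts:.0f} W")
  print(f"peak node draw: {node_w} W")
  print(f"nodes per PDU at peak: {int(pdu_watts // node_w)}")  # ~4 per PDU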

Open to suggestions, modifications, and substitutions. We'd prefer to go with the GTX1080Ti GPUs, which are still listed as certified for Amber 18.


