Notes

  • HP cluster goes off support 11/30/2013
  • We need greentail/disk array support for maybe 2 more years
    • Karen added it to the budget, Dave to approve ($2,200/year)
  • We need another disk array
    • For robust disk-to-disk (D2D) backup
  • Pressed the HP ProCurve ethernet backup switch into production
    • The Dell Force10 switch is failing, or traffic overwhelmed it
    • Need a file server separate from the login node
  • We need a new cluster with support
    • Power consumption versus computational power
    • GPU versus CPU
    • 6 of 36 Dell compute nodes have failed

GPU Specs

Round 3

Specs: MW - GPU

This is what we ended up buying in May 2013.

General: 10 CPUs (80 cores), 20 GPUs (45,000 CUDA cores), 256 GB RAM/node (1,280 GB total), plus head node (128 GB)
Head Node: 1x 4U rackmount system (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores total)
  16x 16 GB 240-pin DDR3 1600 MHz ECC (256 GB total, max 512 GB), ?x 10/100/1000 NIC (3 cables), 3x PCIe x16 full, 3x PCIe x8
  2x 1 TB 7200 RPM (RAID 1) + 16x 3 TB (RAID 6), Areca RAID controller
  low-profile graphics card, ConnectX-3 VPI adapter card, single-port FDR 56 Gb/s (1 cable)
  1400 W power supply, 1+1 redundant
Nodes: 5x 2U rackmountable chassis, 5x 2 Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores/node), Sandy Bridge series
  5x 16x 16 GB 240-pin DDR3 1600 MHz (256 GB/node memory, max 256 GB)
  5x 1x 120 GB SSD, 5x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  ?x 10/100/1000 NIC (1 cable), dedicated IPMI port, 5x 4 PCIe 3.0 x16 slots, 5x 8 PCIe 3.0 x8 slots
  5x ConnectX-3 VPI adapter card, single-port QDR/FDR 40/56 Gb/s (1 cable)
  5x 1620 W 1+1 redundant power supplies
Network: 1x 1U Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 3m QDR cable to the existing Voltaire switch
  1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack: 1x 42U rack with power distribution (14U used)
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA
  scheduler and GNU compilers installed and configured
  Amber12 (customer provides license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5
Warranty: 3 year parts and labor (lifetime technical support)
GPU Teraflops: 23.40 double, 70.40 single precision (see the worked numbers after the list below)
Quote: Arrived (estimated at $124,845); includes S&H and insurance
  Includes cluster pre-installation service
  • 16U used; estimated draw 6,900 Watts and 23,713 BTUs/hour of cooling, roughly $30K/year for power and cooling
  • 5 GPU shelves
  • 2 PDUs
  • 42 TB raw
  • FDR interconnects
  • 120GB SSD drives on nodes
  • 256 gb ram on nodes, 16gb/core
  • Areca hardware raid
  • Lifetime technical support
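
The quoted GPU teraflops are simply the per-card peak ratings of the Tesla K20 (about 1.17 TFLOPS double precision, 3.52 TFLOPS single precision) multiplied by the card count; for the 20 GPUs above:

  20 \times 1.17\ \text{TFLOPS} = 23.40\ \text{TFLOPS (double)}, \qquad 20 \times 3.52\ \text{TFLOPS} = 70.40\ \text{TFLOPS (single)}

The same per-card figures reproduce the 18.72/56.32 TFLOPS quoted for the 16-GPU configurations and the 21.06/63.36 TFLOPS for the 18-GPU HP configuration further down.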

Specs: EC GPU

General: 12 CPUs (96 cores), 20 GPUs (45,000 CUDA cores), 128 GB RAM/node (640 GB total), plus head node (128 GB)
Head Node: 1x 2U rackmount system, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8-core
  8x 16 GB 240-pin DDR3 1600 MHz ECC (128 GB, max 512 GB), 2x 10/100/1000 NIC, 1x PCIe x16 full, 6x PCIe x8 full
  2x 2 TB RAID 1 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  1920 W power supply, redundant
Nodes: 6x 2U rackmountable chassis, 6x 2 Xeon E5-2660 2.20 GHz 20 MB cache 8-core (16/node), Sandy Bridge series
  48x 16 GB 240-pin DDR3 1600 MHz (128 GB/node memory, 8 GB/core, max 256 GB)
  6x 1 TB 7200 RPM, 5x 4x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
  6x ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  6x 1800 W redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 9x 7' cables (2 uplink cables)
  1x 1U 16-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Rack & Power: 42U, 4x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
  Amber12 (cluster install), LAMMPS (shared filesystem), (no NAMD)
Storage: 3U 52 TB disk array (28x 2 TB, RAID 6), cascade cable (see the capacity sketch after the list below)
Warranty: 3 year parts and labor (EC technical support?)
GPU Teraflops: 23.40 double, 70.40 single precision
Quote: Arrived ($124,372 incl. $800 S&H)
  • 20U - estimated draw 7,400 Watts - $30K/year for cooling and power
  • 5 GPU shelves
  • 1 CPU shelf
  • 4 PDU - this could be a problem!
  • 56TB raw
  • QDR interconnects
  • 1 TB disk on node, makes for a large /localscratch
  • LSI hardware raid card
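
The storage line items are consistent with RAID 6 costing two drives' worth of parity per array; assuming the 28 drives form a single RAID 6 group:

  28 \times 2\ \text{TB} = 56\ \text{TB raw}, \qquad (28 - 2) \times 2\ \text{TB} = 52\ \text{TB usable}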

Round 2

Specs: MW - GPU

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 128 GB RAM/node, plus head node
Head Node: 1x 4U rackmount system (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores total)
  16x 16 GB 240-pin DDR3 1600 MHz ECC (256 GB total, max 512 GB), ?x 10/100/1000 NIC (3 cables), 3x PCIe x16 full, 3x PCIe x8
  2x 1 TB 7200 RPM (RAID 1) + 16x 3 TB (RAID 6), Areca RAID controller
  low-profile graphics card, ConnectX-3 VPI adapter card, single-port FDR 56 Gb/s (1 cable)
  1400 W power supply, 1+1 redundant
Nodes: 4x 2U rackmountable chassis, 4x 2 Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores/node), Sandy Bridge series
  4x 8x 16 GB 240-pin DDR3 1600 MHz (128 GB/node memory, max 256 GB)
  4x 1x 120 GB SSD, 4x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  ?x 10/100/1000 NIC (1 cable), dedicated IPMI port, 4x 4 PCIe 3.0 x16 slots, 4x 8 PCIe 3.0 x8 slots
  4x ConnectX-3 VPI adapter card, single-port QDR/FDR 40/56 Gb/s (1 cable)
  4x 1620 W 1+1 redundant power supplies
Network: 1x 1U Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 3m QDR cable to the existing Voltaire switch
  1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack: 1x 42U rack with power distribution (14U used)
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA
  scheduler and GNU compilers installed and configured
  Amber12 (customer provides license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5
Warranty: 3 year parts and labor (lifetime technical support)
GPU Teraflops: 18.72 double, 56.32 single precision
Quote: Arrived (estimated at $106,605); includes S&H and insurance
  Includes cluster pre-installation service
  • 5,900 Watts and 20,131 BTUs/hour (see the conversion after this list)
  • smaller InfiniBand switch (8 port) and ethernet switch (24 port)
    • the 18-port switch has been included; swap out for $2K of spare parts
  • Sandy Bridge chip E5-2660 and larger memory footprint (128 GB/node, 256 GB head node)
  • 120GB SSD drives on nodes
  • storage: 42TB usable Raid 6
  • Lifetime technical support
  • Spare parts
    • ?
  • Expand Storage
    • upgrade to 56TB usable Raid 6 ($5.3K using 16x4TB disks)
    • upgrade to 90TB usable Raid 60 ($10.3K using 34x3TB disks)
  • Alternate storage:
    • add storage server of 2.4 TB Usable 15K fast speed SAS disk ($9K-1K of 4U chassis)
    • leave 18TB local storage on head node
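
The vendors' BTU figures are just the electrical draw converted at roughly 3.412 BTU/hour per Watt; for the estimate above:

  5{,}900\ \text{W} \times 3.412\ \tfrac{\text{BTU/hr}}{\text{W}} \approx 20{,}131\ \text{BTU/hr}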

Specs: MW - CPU

General: 13 nodes, 26 CPUs (208 cores), 128 GB RAM/node (1,664 GB total), plus head node (256 GB)
Head Node: 1x 4U rackmount system (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores total)
  16x 16 GB 240-pin DDR3 1600 MHz ECC (256 GB total, max 512 GB), ?x 10/100/1000 NIC (3 cables), 3x PCIe x16 full, 3x PCIe x8
  2x 1 TB 7200 RPM (RAID 1) + 16x 3 TB (RAID 6), Areca RAID controller
  low-profile graphics card, ConnectX-3 VPI adapter card, single-port FDR 56 Gb/s (1 cable)
  1400 W power supply, 1+1 redundant
Nodes: 13x 2U rackmountable chassis, 13x 2 Xeon E5-2660 2.0 GHz 20 MB cache 8-core (16 cores/node), Sandy Bridge series
  13x 8x 16 GB 240-pin DDR3 1600 MHz (128 GB/node memory, max 256 GB)
  13x 1x 120 GB SSD
  ?x 10/100/1000 NIC (1 cable), dedicated IPMI port, 4x 4 PCIe 3.0 x16 slots, 4x 8 PCIe 3.0 x8 slots
  13x ConnectX-3 VPI adapter card, single-port QDR/FDR 40/56 Gb/s (1 cable)
  13x 600 W non-redundant power supplies
Network: 1x 1U Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 3m QDR cable to the existing Voltaire switch
  1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack: 1x 42U rack with power distribution (14U used)
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA
  scheduler and GNU compilers installed and configured
  Amber12 (customer provides license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5
Warranty: 3 year parts and labor (lifetime technical support)
Quote: Arrived (estimated at $104,035); includes S&H and insurance
  Includes cluster pre-installation service
  • 5,250 Watts and 17,913 BTUs/Hour
  • infiniband switch (18 port needed for IPoIB) and ethernet switch (24 port)
  • Sandy Bridge chip E5-2660 and larger memory footprint (128 GB/node, 256 GB head node)
  • 120GB SSD drives on nodes
  • storage: 42TB usable Raid 6
  • Lifetime technical support
  • Drop software install ($3.5K savings)
  • Spare parts
    • ?
  • Expand storage
    • upgrade to 56 TB usable RAID 6 ($5.3K using 16x 4 TB disks)
    • upgrade to 90 TB usable RAID 60 ($10.3K using 34x 3 TB disks; see the capacity sketch after this list)
  • Alternate storage:
    • add storage server of 2.4 TB Usable 15K fast speed SAS disk ($9K-1K of 4U chassis)
    • leave 18TB local storage on head node
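
These upgrade capacities check out against the usual RAID overheads (two parity drives per RAID 6 group, with RAID 60 striping across two such groups); assuming the 34-drive option is laid out as two 17-drive RAID 6 groups:

  16 \times 4\ \text{TB RAID 6}: \ (16 - 2) \times 4 = 56\ \text{TB usable}
  34 \times 3\ \text{TB RAID 60}: \ 2 \times (17 - 2) \times 3 = 90\ \text{TB usable}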

Specs: EC GPU

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 128 GB RAM/node, plus head node (256 GB)
Head Node: 1x 2U rackmount system, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8-core
  16x 16 GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 1x PCIe x16 full, 6x PCIe x8 full
  2x 2 TB RAID 1 7200 RPM, 8x 2 TB RAID 6 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  1920 W power supply, redundant
Nodes: 4x 2U rackmountable chassis, 4x 2 Xeon E5-2660 2.20 GHz 20 MB cache 8-core (16/node), Romley series
  32x 16 GB 240-pin DDR3 1600 MHz (128 GB/node memory, 32 GB/GPU, max 256 GB)
  4x 1 TB 7200 RPM, 4x 4x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
  4x ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  4x 1800 W redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (8 ports) & HCAs (single port) + 7x 7' cables (2 uplink cables)
  1x 1U 16-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Rack & Power: 42U, 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
  Amber12 (cluster install), LAMMPS (shared filesystem), (no NAMD)
Warranty: 3 year parts and labor (EC technical support?)
GPU Teraflops: 18.72 double, 56.32 single precision
Quote: Arrived ($103,150 incl. $800 S&H)
  • 16TB Raid6 storage (14 TB usable - tight for /home)
  • full height rack

Specs: EC CPU

General: 13 nodes, 26 CPUs (208 cores), 128 GB RAM/node (1,664 GB total), plus head node (256 GB)
Head Node: 1x 2U rackmount system, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8-core
  16x 16 GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 1x PCIe x16 full, 6x PCIe x8 full
  2x 2 TB RAID 1 7200 RPM, 8x 2 TB RAID 6 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  1920 W power supply, redundant
Nodes: 13x 1U rackmountable chassis, 13x 2 Xeon E5-2660 2.20 GHz 20 MB cache 8-core (16/node), Romley series
  104x 16 GB 240-pin DDR3 1600 MHz (128 GB/node memory, max ??? GB)
  13x 1 TB 7200 RPM
  2x 10/100/1000 NIC, dedicated IPMI port, 1x PCIe 3.0 x16 slot
  13x ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  13x 480 W non-redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 7x 7' cables (2 uplink cables)
  1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Rack & Power: 42U, 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
  Amber12 (cluster install), LAMMPS (shared filesystem), NAMD
Warranty: 3 year parts and labor (EC technical support?)
Quote: Arrived ($105,770 incl. $800 S&H)
  • 16TB Raid6 storage (14 TB usable - tight for /home)
  • 1TB on nodes is wasted (unless we make fast local /localscratch at 7.2K)

Round 1

ConfCall & Specs: AC

09nov12:

  • /home and /apps are mounted on the CPU side. How does the GPU access these? Or is the job on the CPU responsible for this?
  • Single versus double precision? Both needed, I assume.
  • The unit above is the Nvidia “Fermi” series, which is being phased out; the “Kepler” K10 and K20 series are coming out. Get an early-bird unit; Jim will find out.
  • Lava compatibility (almost certain, but need to check); AC uses SGE.
  • We do not really “know” if our current jobs would experience a boost in speed (hence one unit first, but there is a software problem here)
  • Intel Xeon Phi co-processors: Intel compilers work on this platform (which is huge!) and there is no programming learning curve (HP ProLiant servers with 50+ cores); Jim will find out.
  • Vendor states the scheduler sees GPUs directly (but how does it then get access to home dirs? check this out) … update: this is not true, a CPU job offloads work to the GPU (see the sketch below)
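
In this model the batch scheduler only hands out CPU job slots; the job's host process then selects a GPU and offloads kernels to it through the CUDA runtime. A minimal sketch of that offload pattern (the MY_GPU_INDEX environment variable is hypothetical, something a wrapper script or scheduler prolog would have to set; it is not part of any vendor stack discussed here):

  // gpu_offload.cu - minimal sketch: a CPU-side job selects a GPU and offloads a kernel.
  // MY_GPU_INDEX is a hypothetical variable a scheduler wrapper would set.
  #include <cstdio>
  #include <cstdlib>
  #include <cuda_runtime.h>

  __global__ void scale(float *x, float a, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= a;                           // trivial work done on the GPU
  }

  int main() {
      const char *env = getenv("MY_GPU_INDEX");       // which GPU this CPU job should use
      int dev = env ? atoi(env) : 0;
      cudaSetDevice(dev);                             // bind this host process to that GPU

      const int n = 1 << 20;
      float *d_x;
      cudaMalloc(&d_x, n * sizeof(float));
      cudaMemset(d_x, 0, n * sizeof(float));

      scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);  // offload the kernel
      cudaDeviceSynchronize();                        // the CPU job waits for the GPU to finish

      cudaFree(d_x);
      printf("ran on GPU %d\n", dev);
      return 0;
  }

Built with something like nvcc gpu_offload.cu -o gpu_offload; the point is only that the scheduler never needs to know the GPU exists.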

AC Specs

General: 2 CPUs (16 cores), 3 GPUs (7,500 CUDA cores), 32 GB RAM/node
Head Node: none
Nodes: 1x 4U rackmountable chassis, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8-core (16 cores/node), Romley series
  8x 4 GB 240-pin DDR3 1600 MHz memory (32 GB/node, 11 GB/GPU, max 256 GB)
  1x 120 GB SATA 2.5" solid state drive (OS drive), 7x 3 TB 7200 RPM
  3x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio
  2x 10/100/1000 NIC, 3x PCIe 3.0 x16 slots
  1x ConnectX-3 VPI adapter card, single-port 56 Gb/s
  2x 1620 W redundant power supplies
Network: 1x 36-port InfiniBand FDR (56 Gb/s) switch & 4x ConnectX-3 single-port FDR (56 Gb/s) IB adapters + 2x 2-meter cables (should be 4)
Power: rack power ready
Software: none
Warranty: 3 year parts and labor (AC technical support)
GPU Teraflops: 3.51 double, 10.56 single precision
Quote: Arrived ($33,067.43, S&H included)
  • In order to match the “benchmark option” we need 5 units
    • 8100 Watts, would still fit power wise but not rack wise (we'd need 20U)
  • Single rack, 21 TB of disk space (Raid 5/6)
  • The IB switch (plus 4 spare cards/cables) is roughly 1/3rd of the price
    • If we remove it, we need QDR Voltaire compliant HCAs and cables (3 ports free)
  • The config does not pack as much teraflops for the dollars; we'll see

ConfCall & Specs: EC

12nov12:

  • GPU hardware only
  • The scheduler never sees GPUs, just CPUs
  • CPU-to-GPU is one-to-one when using Westmere chips
  • Bright Cluster Management (image based); we can front-end it with Lava
  • What is the memory connection between CPU and GPU?
  • Home dirs: cascade via the Voltaire 4036; need to make sure this is compatible!
  • Software on local disk? Home dirs via InfiniBand IPoIB, yes, but self-install
  • Amber (they charge for this) and LAMMPS preinstalled - must be no problem, will be confirmed
  • 2 K20s per 2 CPUs per rack, 900-1,000 W; 1,200 W power supply on each node
  • PDU on the simcluster; each node has a power connection
  • Quote coming for a 4-node simcluster
  • Testing periods can be staged so we are testing exactly what we are buying, if the simcluster is within budget (see K20 above)

EC Specs

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 64 GB RAM/node, plus head node
Head Node: 1x 1U rackmount system, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8-core
  8x 8 GB 240-pin DDR3 1600 MHz ECC (max 256 GB), 2x 10/100/1000 NIC, 2x PCIe x16 full
  2x 2 TB 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  600 W power supply
Nodes: 4x 2U rackmountable chassis, 8x Xeon E5-2660 2.20 GHz 20 MB cache 8-core (16/node), Romley series
  32x 8 GB 240-pin DDR3 1600 MHz (64 GB/node memory, 16 GB/GPU, max 256 GB)
  4x 1 TB 7200 RPM, 16x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
  4x ConnectX-2 VPI adapter card, single-port QDR 40 Gb/s
  4x 1800 W redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (8 ports) & HCAs (single port) + 7' cables
  1x 1U 16-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Power: 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
  Amber12 (cluster install), LAMMPS (shared filesystem), (Barracuda for weirlab?)
Warranty: 3 year parts and labor (EC technical support?)
GPU Teraflops: 18.72 double, 56.32 single precision
Quote: Arrived ($93,600 + S&H)
  • Lets make this the “benchmark option” based on double precision
  • In order to match this with Xeon Phis we'd need 18 of them (probably 5 4U trays)
  • This is the (newest) simcluster design (that can be tested starting Jan 2013)
    • 24U cabinet
  • We could deprecate 50% of bss24 queue freeing two L6-30 connectors
  • Spare parts:
    • Add another HCA card to greentail and connect to Mellanox switch (long cable)
      • also isolates GPU traffic from other clusters
    • 1 8-port switch, 4 HCA cards, 4 long cables (for petal/swallow tails plus spare)
  • New head node
    • First let EC install Bright/Openlava (64 CPU cores implies 64 job slots)
      • 16 GPUs implies 16×2,500 or 40,000 cuda cores (625 per job slot on average)
    • Use as standalone cluster or move GPU queue to greentail
    • If so, turn this head node into a 16-job-slot, RAM-heavy compute node?
      • 256-512 GB (order?)
      • add local storage? (up to 10x 1 or 2 TB disks)
  • Compute nodes
    • add local storage? (up to 10x 1 or 2 TB disks)
  • Bright supports openlava and GPU monitoring (get installed)
  • EC software install
    • sander, sander.MPI, pmemd, pmemd.cuda (single-GPU version), pmemd.cuda.MPI (the multi-GPU version)
    • NVIDIA Toolkit v4.2; note that v5.0 is NOT currently supported (a version check is sketched below)
    • MVAPICH2 v1.8 or later / MPICH2 v1.4p1 or later recommended; OpenMPI is NOT recommended
    • make sure they do not clean the source tree, so we can analyze how they compiled
    • which compiler will they use? which MPI? (we prefer OpenMPI and have a wrapper script for that)
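
Since the application builds are tied to a specific toolkit (v4.2 here, with v5.0 explicitly unsupported), a quick check of what CUDA driver and runtime a node actually reports can catch mismatches before handover; a minimal sketch using two standard CUDA runtime calls:

  // cuda_version_check.cu - print the CUDA driver and runtime versions a node reports,
  // so application builds (e.g. pmemd.cuda against Toolkit 4.2) can be matched to them.
  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      int driverVer = 0, runtimeVer = 0;
      cudaDriverGetVersion(&driverVer);    // highest CUDA version the installed driver supports
      cudaRuntimeGetVersion(&runtimeVer);  // CUDA runtime this binary was built against

      // versions are encoded as 1000*major + 10*minor, e.g. 4020 for CUDA 4.2
      printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
             driverVer / 1000, (driverVer % 100) / 10,
             runtimeVer / 1000, (runtimeVer % 100) / 10);
      return 0;
  }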

ConfCall & Specs: HP

HP 19nov12: meeting notes

  • HP ProLiant SL270s Generation 8 (Gen8): 4U half-width with 2 CPUs + 8 (max) GPUs
    • The s6500 chassis is a 4U tray holding two SL270s servers
  • max 8 GPUs (20,000 CUDA cores) + 2 CPUs (16 cores total), dual drives, 256 GB max
    • K20 availability will be confirmed by Charlie
  • power
    • Charlie will crunch the numbers on the existing HPC and assess whether we can use the current rack
    • otherwise a stand-alone half-rack solution
  • one IB cable to Voltaire per chassis? get a new FDR InfiniBand switch, period
    • connect greentail with an additional HCA card, or Voltaire to Voltaire?
  • our software compilation problem is huge
    • but they have great connections with Nvidia for compilation help (how to qualify that?)
  • CMU for GPU monitoring, 3D rendering of what the GPU is doing
  • The SL270s can also support up to 8 Xeon Phi coprocessors
    • but expect very lengthy delays; Intel is not ready for delivery (1 Phi = 1 double-precision teraflop)

HP Specs

http://h18004.www1.hp.com/products/quickspecs/14405_div/14405_div.HTML

  • First unit: a single tray in the chassis
  • This hardware can be tested at ExxactCorp, so a single-tray purchase for testing is not a requirement
  • 2 chassis in 8U + 4 SL250s, each with 8 GPUs, would be a massive GPU cruncher
    • 8 CPUs, 32 GPUs = 64 CPU cores and 80,000 CUDA cores (avg 1,250 CUDA cores per CPU core)
    • peak performance: 37.44 double, 112.64 single precision (twice the “benchmark option”)
  • 1 chassis in 4U + 2 SL250s, each with 8 GPUs, would be the “benchmark option”
General: 6 CPUs (48 cores total), 18 GPUs (45,000 CUDA cores), 64 GB RAM/node, no head node
Head Node: none
Chassis: 2x s6500 chassis (4U); each holds 2 half-width SL250s (Gen8, 4U) servers, rackmounted, 4x 1200 W power supplies, 1x 4U rack blank
Nodes: 3x SL250s (Gen8), 3x 2x Xeon E5-2650 2.0 GHz 20 MB cache 8-core (16 cores/node), Romley series
  3x 16x 8 GB 240-pin DDR3 1600 MHz (64 GB/node, 10+ GB/GPU, max 256 GB)
  3x 2x 500 GB 7200 RPM, 3x 6x NVIDIA Tesla K20 5 GB GPUs (6 GPUs/node), 1 CPU to 3 GPU ratio
  3x 2x 10/100/1000 NIC, dedicated IPMI port, 3x 8x PCIe 3.0 x16 slots (GPU), 3x 2x PCIe 3.0 x8
  3x 2x IB interconnect, QDR 40 Gb/s, FlexibleLOM goes into a PCIe3 x8 slot
  chassis-supplied power; 3x 1x one PDU power cord (416151-B21)? - see below
Network: 1x Voltaire QDR 36-port InfiniBand 40 Gb/s switch + 6x 5m QSFP IB cables
  no ethernet switch, 17x 7' CAT5 RJ45 cables
Power: rack PDU ready; what is the "1x HP 40A HV Core Only Corded PDU"?
Software: RHEL, CMU GPU-enabled (1 year support) - not on the quote?
Warranty: 3 year parts and labor (HP technical support?)
GPU Teraflops: 21.06 double, 63.36 single precision
Quote: Arrived ($128,370; for a 1x s6500 + 2x SL250s setup the estimate is $95,170) (S&H and insurance?)
  • Compared with the “benchmark option” price-wise: 37% higher (25% fewer CPU cores)
  • Compared with the “benchmark option” performance-wise: 12.5% higher (double precision peak)
  • When the quote is reduced to 1x s6500 chassis and 2x SL250s:
    • compared with the “benchmark option” price-wise: 1.6% higher (50% fewer CPU cores)
    • compared with the “benchmark option” performance-wise: 25% lower (double precision peak)
  • HP on site install
  • we have 9U in HP rack available (1U for new switch)
    • L6-30 at 7,500 Watts x 3 PDUs (non-UPS) = 22,500 Watts; the HP cluster draws 10,600 Watts
    • leaves 11,898 Watts, which should be sufficient for 4 SL270s (redundant power supplies) - see the power budget after this list
  • a new InfiniBand switch isolates GPU cluster traffic from the rest of the HPC
    • a 36-port IB switch is overkill
    • still need an IB connection from greentail to the new switch (home dirs over IPoIB)
  • 1 TB local storage per node
  • our software install problem remains, so is the 12.5% performance gain worth it? (with 3 trays)
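
The rack headroom figure is straight subtraction of the HP cluster's estimated draw from the three non-UPS L6-30 feeds, taking each feed at the quoted 7,500 W:

  3 \times 7{,}500\ \text{W} = 22{,}500\ \text{W}, \qquad 22{,}500\ \text{W} - 10{,}600\ \text{W} \approx 11{,}900\ \text{W remaining}

(The notes quote 11,898 W, so the underlying cluster draw figure is presumably closer to 10,602 W.)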

ConfCall & Specs: AX

  • Cluster management is ROCKS (we'll pass)
  • No scheduler (that's OK, we'll use OpenLava)
  • They do not install application software, only the operating system plus CUDA driver setup and installation

AX Specs

http://www.amax.com/hpc/productdetail.asp?product_id=simcluster Fremont, CA

General: 8 CPUs (48 cores), 12 GPUs (30,000 CUDA cores), 64 GB RAM/node, plus head node
Head Node: 1x 1U rackmount system, 2x Intel Xeon E5-2620 2.0 GHz (12 cores total)
  64 GB DDR3 1333 MHz (max 256 GB), 2x 10/100/1000 NIC, 2x PCIe x16 full
  2x 1 TB (RAID 1) 7200 RPM, InfiniBand adapter card, single-port QSFP 40 Gb/s
  ??? W power supply, CentOS
Nodes: 4x 1U, 4x 2x Intel Xeon E5-2650 2.0 GHz with 6 cores (12 cores/node), Romley series
  4x 96 GB 240-pin DDR3 1600 MHz (96 GB/node memory, 8 GB/GPU, max 256 GB)
  4x 1 TB 7200 RPM, 12x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio
  2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
  4x InfiniBand adapter card, single-port QSFP 40 Gb/s
  4x ??00 W redundant power supplies
Network: 1x InfiniBand switch (18 ports) & HCAs (single port) + ?' cables
  1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (+ ?' cables)
Power: there are 3 rack PDUs? What are the connectors, L6-30?
Software: CUDA only
Warranty: 3 year parts and labor (AX technical support?)
GPU Teraflops: 14.04 double, 42.96 single precision
Quote: Arrived ($73,965, $800 S&H included)
  • 22U cabinet
  • Insurance during shipping is our problem (non-returnable)
  • Compared with the “benchmark option” price-wise: 21% lower (25% fewer CPU cores)
  • Compared with the “benchmark option” performance-wise: 22% lower (double precision peak)
  • If we go with turnkey systems, having the software installed is huge

ConfCall & Specs: MW

  • sells both individual racks and turn-key systems
    • racks are 4U with 2 CPUs and 8 GPUs, 2200 Watts, K20X GPUs
    • turn-key units are per customer specifications
  • they will install all software components (if license keys are provided)
    • includes CUDA drivers and setup, Amber (pmemd.cuda & pmemd.cuda.MPI, check) and Lammps
    • but also Matlab and Mathematica if needed (wow!)
  • standard 2 year warranty though (no biggie)

MW Specs

http://www.microway.com/tesla/clusters.html Plymouth, MA

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 32 GB RAM/node, plus head node
Head Node: 1x 2U rackmount system, 2x Xeon E5-2650 2.0 GHz 20 MB cache 8-core
  8x 4 GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 3x PCIe x16 full, 3x PCIe x8
  2x 1 TB 7200 RPM (RAID 1) + 6x 2 TB (RAID 6), Areca RAID controller
  low-profile graphics card, ConnectX-3 VPI adapter card, single-port FDR 56 Gb/s
  740 W power supply, 1+1 redundant
Nodes: 4x 1U rackmountable chassis, 4x 2 Xeon E5-2650 2.0 GHz 20 MB cache 8-core (16/node), Sandy Bridge series
  4x 8x 4 GB 240-pin DDR3 1600 MHz (32 GB/node memory, 8 GB/GPU, max 256 GB)
  4x 1x 120 GB SSD, 4x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio
  2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
  4x ConnectX-3 VPI adapter card, single-port FDR 56 Gb/s
  4x 1800 W (non-)redundant power supplies
Network: 1x Mellanox InfiniBand FDR switch (36 ports) & HCAs (single port) + 3m FDR cable to the existing Voltaire switch
  1x 1U 48-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack: (not included in quote)
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA 5
  scheduler and GNU compilers installed and configured
  Amber12, LAMMPS, Barracuda (for weirlab?), and others if desired … bought through MW
Warranty: 3 year parts and labor (lifetime technical support)
GPU Teraflops: 18.72 double, 56.32 single precision
Quote: Arrived (estimated at $95,800); includes S&H and insurance
Upgrades: cluster pre-installation service
  5x 2 E5-2660 2.20 GHz 8-core CPUs
  5x upgrade to 64 GB per node
  • At full load, 5,900 Watts and 20,131 BTUs/hour
  • 2% more expensive than the “benchmark option” (as described above, with the Upgrades), otherwise identical
    • but a new rack (advantageous for the data center)
    • with lifetime technical support
    • solid state drives on the compute nodes
    • 12 TB local storage (8 TB usable)

Then

  • Replace the 36-port FDR switch with an 8-port QDR switch for savings (40 vs 56 Gb/s)
    • and change all server adapter cards to QDR (with one hook-up to the existing Voltaire switch)
  • Expand the memory footprint
    • go to 124 GB memory/node to beef up the CPU HPC side of things
    • 16 CPU cores/node minus 4 cores/node devoted to the GPUs leaves 12 CPU cores sharing roughly 104 GB, about 8 GB per CPU core
  • Online testing available (K20; do this)
    • then decide on the PGI compiler at purchase time
    • maybe all the LAPACK libraries too
  • Make the head node a compute node (in/for the future, and beef it up too, 256 GB RAM?)
  • Leave the 6x 2 TB disk space (for backup)
    • 2U, 8 drives, up to 6x4 = 24 TB, possible?
  • Add an entry-level InfiniBand/Lustre solution
    • for parallel file locking
  • Spare parts
    • 8-port switch, HCAs and cables, drives …
    • or get 5 years total warranty
  • Testing notes
    • Amber, LAMMPS, NAMD
    • CUDA v4 & v5
    • install/config dirs
    • use GNU compilers … with OpenMPI
    • make deviceQuery (a device-enumeration sketch along those lines follows below)
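
deviceQuery is one of the CUDA SDK samples; for acceptance testing, the same information can be pulled with a few runtime calls. A minimal sketch along those lines (not the SDK sample itself) that lists the Tesla cards a node exposes:

  // list_gpus.cu - minimal deviceQuery-style check: enumerate GPUs and their key properties.
  // Build with: nvcc list_gpus.cu -o list_gpus
  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      int count = 0;
      if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
          printf("no CUDA devices visible\n");
          return 1;
      }
      for (int d = 0; d < count; ++d) {
          cudaDeviceProp p;
          cudaGetDeviceProperties(&p, d);
          printf("GPU %d: %s, compute capability %d.%d, %.1f GB memory, %d multiprocessors\n",
                 d, p.name, p.major, p.minor,
                 p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0), p.multiProcessorCount);
      }
      return 0;
  }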

