**[[cluster:0|Back]]**

===== Notes =====

  * HP cluster goes off support 11/30/2013
  * We need greentail/disk array support for maybe 2 more years
    * Karen added it to the budget, Dave to approve ($2,200/year)
  * We need another disk array
    * For robust disk-to-disk (D2D) backup
  * Pressed the HP ProCurve ethernet backup switch into production
    * The Dell Force10 switch was failing, or traffic overwhelmed it
    * Need a file server away from the login node
  * We need a new cluster with support
    * Power consumption versus computational power
    * GPU versus CPU
    * 6 of 36 Dell compute nodes have failed
  
===== GPU Specs =====

===== Round 3 =====

==== Specs: MW - GPU ====

This is what we ended up buying in May 2013.

^  Topic^Description  ^
|  General| 10 CPUs (80 cores), 20 GPUs (45,000 CUDA cores), 256 GB RAM/node (1,280 GB total), plus head node (128 GB)|
|  Head Node|1x42U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores)|
|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256 GB, max 512 GB), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
|  |2x1TB 7200RPM (RAID 1) + 16x3TB (RAID 6), Areca RAID controller|
|  |Low-profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
|  |1400W Power Supply, 1+1 redundant|
|  Nodes|5x 2U Rackmountable Chassis, 5x 2 Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
|  |5x 16x16GB 240-Pin DDR3 1600 MHz (256 GB/node memory, max 256 GB)|
|  |5x 1x120GB SSD 7200RPM, 5x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 5x 4 PCIe 3.0 x16 Slots, 5x 8 PCIe 3.0 x8 Slots|
|  |5xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
|  |5x1620W 1+1 Redundant Power Supplies|
|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m QDR cable to existing Voltaire switch|
|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
|  Rack|1x42U rack with power distribution units (14U used)|
|  Power|2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P)|
|  Software| CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA|
|  | Scheduler and GNU compilers installed and configured|
|  | Amber12 (customer-provided license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5|
|  Warranty|3 Year Parts and Labor (lifetime technical support)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- estimated at $124,845 --></html>Arrived, includes S&H and insurance|
|  Includes|Cluster pre-installation service|

  * 16U - estimated draw 6,900 Watts and 23,713 BTU/hr of cooling - $30K/year (see the arithmetic sketch below)
  * 5 GPU shelves
  * 2 PDUs
  * 42 TB raw
  * FDR interconnects
  * 120 GB SSD drives on the nodes
  * 256 GB RAM on the nodes, 16 GB/core
  * Areca hardware RAID
  * Lifetime technical support

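As a quick sanity check of the aggregate figures above, here is a minimal Python sketch. The per-K20 peak numbers (1.17 double / 3.52 single teraflops) are NVIDIA's published specs rather than figures from the quote, and the watts-to-BTU/hr factor of 3.412 is the standard conversion.

<code python>
# Sanity check of the aggregate numbers in the MW quote.
# Assumes NVIDIA's published Tesla K20 peaks: 1.17 TFLOPS double, 3.52 TFLOPS single.
K20_TFLOPS_DOUBLE = 1.17
K20_TFLOPS_SINGLE = 3.52

gpus = 5 * 4                  # 5 GPU nodes x 4 K20s per node
ram_total_gb = 5 * 256        # 5 nodes x 256 GB/node
ram_per_core_gb = 256 / 16    # 256 GB/node over 16 cores/node

print(f"GPU teraflops: {gpus * K20_TFLOPS_DOUBLE:.2f} double, "
      f"{gpus * K20_TFLOPS_SINGLE:.2f} single")                             # 23.40 / 70.40
print(f"Node RAM: {ram_total_gb} GB total, {ram_per_core_gb:.0f} GB/core")  # 1280 / 16

# Cooling load: 1 W of draw is roughly 3.412 BTU/hr of heat to remove.
watts = 6900
print(f"Cooling: ~{watts * 3.412:,.0f} BTU/hr")      # ~23,500; the quote lists 23,713
</code>
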
==== Specs: EC GPU ====

^  Topic^Description  ^
|  General| 12 CPUs (96 cores), 20 GPUs (45,000 CUDA cores), 128 GB RAM/node (640 GB total), plus head node (128 GB)|
|  Head Node|1x2U Rackmount System, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores|
|  |8x16GB 240-Pin DDR3 1600 MHz ECC (128 GB, max 512 GB), 2x10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full|
|  |2x2TB RAID 1 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |1920W Power Supply, redundant|
|  Nodes|6x2U Rackmountable Chassis, 6x2 Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Sandy Bridge series|
|  |48x16GB 240-Pin DDR3 1600 MHz (128 GB/node memory, 8 GB/core, max 256 GB)|
|  |6x1TB 7200RPM, 5x4xNVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |2x10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots|
|  |6xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |6x1800W Redundant Power Supplies|
|  Network|1x Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 9x7' cables (2 uplink cables)|
|  |1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables)|
|  Rack & Power|42U, 4xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU|
|  Software| CentOS, Bright Cluster Management (1 year support)|
|  | Amber12 (cluster install), LAMMPS (shared filesystem), (no NAMD)|
|  Storage|3U 52TB Disk Array (28x2TB) RAID 6, cascade cable|
|  Warranty|3 Year Parts and Labor (EC technical support?)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- $124,372 incl $800 S&H --></html>Arrived|

  * 20U - estimated draw 7,400 Watts - $30K/year for cooling and power
  * 5 GPU shelves
  * 1 CPU shelf
  * 4 PDUs - this could be a problem! (see the comparison sketch below)
  * 56 TB raw
  * QDR interconnects
  * 1 TB disk per node, which makes for a large /localscratch
  * LSI hardware RAID card

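To weigh the two Round 3 quotes on the power-versus-compute point raised in the Notes, here is a minimal comparison sketch using only the estimated draws, GPU teraflops, rack space, and PDU circuits listed above. The 80% continuous-load derating on the 30 A circuits is an assumed rule of thumb (the usual NEC guideline), not a figure from either quote.

<code python>
# Compare the two quotes on watts per unit of peak GPU compute, and check the
# estimated draw against the quoted 30 A / 208 V L6-30 PDU circuits.
# The 0.8 continuous-load derating is an assumption (NEC rule of thumb).
DERATE = 0.8
CIRCUIT_WATTS = 30 * 208 * DERATE      # usable watts per L6-30 circuit (~4,992 W)

quotes = {
    "MW": {"watts": 6900, "tflops_double": 23.40, "rack_u": 16, "pdus": 2},
    "EC": {"watts": 7400, "tflops_double": 23.40, "rack_u": 20, "pdus": 4},
}

for name, q in quotes.items():
    w_per_tflop = q["watts"] / q["tflops_double"]
    usable = q["pdus"] * CIRCUIT_WATTS
    print(f"{name}: {w_per_tflop:.0f} W per double-precision teraflop in {q['rack_u']}U, "
          f"{q['pdus']} x L6-30 outlets = {usable:,.0f} W usable vs {q['watts']:,} W draw")

# MW: ~295 W/TFLOP; 2 outlets (~9,984 W) comfortably cover the 6,900 W draw.
# EC: ~316 W/TFLOP; capacity is fine, but it needs 4 L6-30 outlets in the room.
</code>
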
  
===== Round 2 =====