===== Notes =====

  * HP cluster off support 11/30/2013
  * We need greentail/disk array support for maybe 2 more years
    * Karen added it to the budget, Dave to approve ($2,200/year)
  * We need another disk array
    * For robust D2D backup
  * Pressed the HP ProCurve ethernet backup switch into production
    * Dell Force 10 switch failing, or traffic overwhelmed it
    * Need a file server away from the login node
  * We need a new cluster with support
    * power consumption versus computational power
    * GPU versus CPU (see the sketch below)
    * 6 of 36 Dell compute nodes have failed
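
A rough sketch of the GPU-versus-CPU consideration above. The per-device numbers are public peak figures, not taken from the quotes, so treat this as an illustration only (a minimal sketch, assuming a Tesla K20 peaks at about 1.17 double-precision teraflops at 225 W, and a Xeon E5-2660 at 2.2 GHz does 8 AVX double-precision flops per cycle per core at 95 W TDP):

<code python>
# Back-of-the-envelope GPU vs CPU comparison, peak double precision.
# All per-device figures below are assumed vendor peak numbers, not measurements.
k20_tflops, k20_watts = 1.17, 225     # NVIDIA Tesla K20 peak DP, TDP
cpu_tflops = 8 * 2.2e9 * 8 / 1e12     # 8 cores x 2.2 GHz x 8 DP flops/cycle (AVX)
cpu_watts = 95                        # Xeon E5-2660 TDP

print(f"K20     : {k20_tflops:.2f} TF, {1000 * k20_tflops / k20_watts:.1f} GF/W")
print(f"E5-2660 : {cpu_tflops:.2f} TF, {1000 * cpu_tflops / cpu_watts:.1f} GF/W")
# K20     : 1.17 TF, 5.2 GF/W
# E5-2660 : 0.14 TF, 1.5 GF/W
</code>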
  
===== GPU Specs =====
===== Round 3 =====

==== Specs: MW - GPU ====

This is what we ended up buying in May 2013.

^  Topic^Description  ^
|  General| 10 CPUs (80 cores), 20 GPUs (45,000 CUDA cores), 256 GB RAM/node (1,280 GB total), plus head node (128 GB)|
|  Head Node|1x42U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores)|
|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256 GB, max 512 GB), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
|  |2x1TB 7200 RPM (RAID 1) + 16x3TB (RAID 6), Areca RAID controller|
|  |Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
|  |1400W Power Supply, 1+1 redundant|
|  Nodes|5x 2U Rackmountable Chassis, 5x 2 Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
|  |5x 16x16GB 240-Pin DDR3 1600 MHz (256 GB/node memory, max 256 GB)|
|  |5x 1x120GB SSD, 5x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 5x 4 PCIe 3.0 x16 Slots, 5x 8 PCIe 3.0 x8 Slots|
|  |5xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
|  |5x1620W 1+1 Redundant Power Supplies|
|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m QDR cable to existing Voltaire switch|
|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
|  Rack|1x42U rack with power distribution (14U used)|
|  Power|2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P)|
|  Software| CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA|
|  | scheduler and GNU compilers installed and configured|
|  | Amber12 (customer provides license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5|
|  Warranty|3 Year Parts and Labor (lifetime technical support)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- estimated at $124,845 --></html>Arrived, includes S&H and Insurance|
|  Includes|Cluster pre-installation service|

  * 16U - estimated draw 6,900 watts and 23,713 BTU/hr cooling - $30K/year (arithmetic sketched below)
  * 5 GPU shelves
  * 2 PDUs
  * 42 TB raw
  * FDR interconnects
  * 120GB SSD drives on nodes
  * 256 GB RAM on nodes, 16 GB/core
  * Areca hardware RAID
  * Lifetime technical support
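
A minimal sketch of where the headline numbers above come from, assuming the usual Tesla K20 peak figures (1.17 teraflops double, 3.52 teraflops single per card) and the standard 1 W ≈ 3.412 BTU/hr conversion; the per-card figures are not stated in the quote itself:

<code python>
# MW quote: 5 compute nodes x 4 Tesla K20 each, 256 GB RAM per node.
gpus = 5 * 4
print(f"{gpus * 1.17:.2f} TF double, {gpus * 3.52:.2f} TF single")  # 23.40 / 70.40
print(f"{5 * 256} GB RAM across the compute nodes")                 # 1280
print(f"{6900 * 3.412:.0f} BTU/hr for a 6,900 W draw")              # 23543, close to the quoted 23,713
</code>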
==== Specs: EC GPU ====

^  Topic^Description  ^
|  General| 12 CPUs (96 cores), 20 GPUs (45,000 CUDA cores), 128 GB RAM/node (640 GB total), plus head node (128 GB)|
|  Head Node|1x2U Rackmount System, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores|
|  |8x16GB 240-Pin DDR3 1600 MHz ECC (128 GB, max 512 GB), 2x10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full|
|  |2x2TB RAID 1 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |1920W Power Supply, redundant|
|  Nodes|6x2U Rackmountable Chassis, 6x2 Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Sandy Bridge series|
|  |48x16GB 240-Pin DDR3 1600 MHz (128 GB/node memory, 8 GB/core, max 256 GB)|
|  |6x1TB 7200 RPM, 5x4xNVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |2x10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots|
|  |6xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |6x1800W Redundant Power Supplies|
|  Network|1x Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 9x7' cables (2 uplink cables)|
|  |1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables)|
|  Rack & Power|42U, 4xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU|
|  Software| CentOS, Bright Cluster Management (1 year support)|
|  | Amber12 (cluster install), LAMMPS (shared filesystem), (no NAMD)|
|  Storage|3U 52TB Disk Array (28x2TB) RAID 6, cascade cable|
|  Warranty|3 Year Parts and Labor (EC technical support?)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- $124,372 incl $800 S&H --></html>Arrived|

  * 20U - estimated draw 7,400 watts - $30K/year for cooling and power
  * 5 GPU shelves
  * 1 CPU shelf
  * 4 PDUs - this could be a problem!
  * 56TB raw (see the sketch below)
  * QDR interconnects
  * 1 TB disk on nodes, makes for a large /localscratch
  * LSI hardware RAID card
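
A small sketch reconciling the storage figures above; the assumption that the 52TB table entry is the RAID 6 usable capacity (two drives' worth reserved for parity) is mine, not stated in the quote:

<code python>
# EC quote: 3U disk array with 28 x 2 TB drives in RAID 6.
drives, size_tb = 28, 2
print(drives * size_tb)           # 56 TB raw, matching the bullet above
print((drives - 2) * size_tb)     # 52 TB if RAID 6 reserves two drives' capacity
print(5 * 4)                      # 20 Tesla K20 GPUs across the five GPU nodes
</code>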
  
===== Round 2 =====