**[[cluster:0|Back]]**

===== Notes =====

  * HP cluster off support 11/30/2013
  * We need greentail/disk array support for maybe 2 more years
    * Karen added it to the budget, Dave to approve ($2,200/year)
  * We need another disk array
    * For robust D2D backup
  * Pressed the HP ProCurve ethernet backup switch into production
    * Dell Force10 switch failing, or traffic overwhelmed it
    * Need a file server away from the login node
  * We need a new cluster with support
    * power consumption versus computational power
    * GPU versus CPU
    * 6 of 36 Dell compute nodes have failed
  
===== GPU Specs =====

===== Round 3 =====

==== Specs: MW - GPU ====

This is what we ended up buying in May 2013.

^  Topic^Description  ^
|  General| 10 CPUs (80 cores), 20 GPUs (45,000 cuda cores), 256 gb ram/node (1,280 gb total), plus head node (128 gb)|
|  Head Node|1x 42U Rackmount System (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores)|
|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
|  |2x1TB 7200RPM (Raid 1) + 16x3TB (Raid 6), Areca Raid Controller|
|  |Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
|  |1400w Power Supply 1+1 redundant|
|  Nodes|5x 2U Rackmountable Chassis, 5x 2 Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
|  |5x 16x16GB 240-Pin DDR3 1600 MHz (256gb/node memory, max 256gb)|
|  |5x 1x120GB SSD, 5x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 5x 4 PCIe 3.0 x16 Slots, 5x 8 PCIe 3.0 x8 Slots|
|  |5x ConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
|  |5x 1620W 1+1 Redundant Power Supplies|
|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m cable QDR to existing Voltaire switch|
|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
|  Rack|1x 42U rack with power distribution (14U used)|
|  Power|2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P)|
|  Software| CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA|
|  | scheduler and GNU compilers installed and configured|
|  | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5|
|  Warranty|3 Year Parts and Labor (lifetime technical support)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- estimated at $124,845 --></html>Arrived, includes S&H and Insurance|
|  Includes|Cluster pre-installation service|

  * 16U - estimated draw 6,900 Watts and 23,713 BTUs/hour cooling - $30K/year
  * 5 GPU shelves
  * 2 PDUs
  * 42 TB raw
  * FDR interconnects
  * 120GB SSD drives on nodes
  * 256 gb ram on nodes, 16gb/core
  * Areca hardware raid
  * Lifetime technical support

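The teraflops and cooling figures above can be sanity-checked with a short script. This is a minimal sketch, assuming the Tesla K20's published peak rates (1.17 TFLOPS double precision, 3.52 TFLOPS single) and the standard conversion of 1 watt ≈ 3.412 BTU/hour; the quoted 23,713 BTU figure is the vendor's own estimate and differs slightly from this plain conversion.

```python
# Sanity-check the quoted GPU teraflops and cooling load.
# Assumes Tesla K20 peak rates: 1.17 TFLOPS double, 3.52 TFLOPS single.

K20_DOUBLE_TFLOPS = 1.17
K20_SINGLE_TFLOPS = 3.52
WATT_TO_BTU_PER_HOUR = 3.412  # standard conversion factor

def cluster_teraflops(n_gpus, per_gpu_tflops):
    """Aggregate peak teraflops across identical GPUs."""
    return n_gpus * per_gpu_tflops

def cooling_btu_per_hour(watts):
    """Cooling load (BTU/hour) matching an electrical draw in watts."""
    return watts * WATT_TO_BTU_PER_HOUR

print(round(cluster_teraflops(20, K20_DOUBLE_TFLOPS), 2))  # 23.4, matches the table
print(round(cluster_teraflops(20, K20_SINGLE_TFLOPS), 2))  # 70.4, matches the table
print(round(cooling_btu_per_hour(6900)))  # 23543, close to the quoted 23,713
```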
==== Specs: EC GPU ====

^  Topic^Description  ^
|  General| 12 CPUs (96 cores), 20 GPUs (45,000 cuda cores), 128 gb ram/node (640 gb total), plus head node (128gb)|
|  Head Node|1x 2U Rackmount System, 2x Xeon E5-2660 2.20 GHz 20MB Cache 8 cores|
|  |8x16GB 240-Pin DDR3 1600 MHz ECC (128gb, max 512gb), 2x10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full|
|  |2x2TB RAID1 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |1920w Power Supply, redundant|
|  Nodes|6x 2U Rackmountable Chassis, 6x 2 Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Sandy Bridge series|
|  |48x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, 8gb/core, max 256gb)|
|  |6x1TB 7200RPM, 5x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU : 2 GPU ratio|
|  |2x10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots|
|  |6x ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
|  |6x 1800W Redundant Power Supplies|
|  Network|1x Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 9x 7' cables (2 uplink cables)|
|  |1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables)|
|  Rack & Power|42U, 4xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU|
|  Software| CentOS, Bright Cluster Management (1 year support)|
|  | Amber12 (cluster install), Lammps (shared filesystem), (no NAMD)|
|  Storage|3U 52TB Disk Array (28x2TB) Raid 6, cascade cable|
|  Warranty|3 Year Parts and Labor (EC technical support?)|
|  GPU Teraflops|23.40 double, 70.40 single|
|  Quote|<html><!-- $124,372 incl $800 S&H --></html>Arrived|

  * 20U - estimated draw 7,400 Watts - $30K/year for cooling and power
  * 5 GPU shelves
  * 1 CPU shelf
  * 4 PDUs - this could be a problem!
  * 56TB raw
  * QDR interconnects
  * 1 TB disk on each node, makes for a large /localscratch
  * LSI hardware raid card

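To put the $30K/year figure in context, a rough annual cost estimate from the 7,400 W draw can be sketched as below. The electricity rate and the cooling multiplier are illustrative assumptions only, not figures from either vendor quote:

```python
# Rough annual power-and-cooling cost from a steady electrical draw.
# The $/kWh rate and cooling multiplier below are assumptions for
# illustration; they do not come from the vendor quotes.

HOURS_PER_YEAR = 24 * 365           # 8,760 hours
ASSUMED_USD_PER_KWH = 0.15          # assumed utility rate
ASSUMED_COOLING_MULTIPLIER = 2.0    # assume cooling roughly doubles the cost

def annual_cost(watts, rate=ASSUMED_USD_PER_KWH,
                cooling=ASSUMED_COOLING_MULTIPLIER):
    """Yearly cost in USD for a constant draw, including cooling."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * rate * cooling

print(round(annual_cost(7400)))  # roughly $19K/year under these assumptions
```

Under these assumed numbers the estimate lands well below $30K/year, so the page's figure presumably bakes in a higher utility rate or additional overhead.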
  
===== Round 2 =====
  
^  Topic^Description  ^
|  General|13 nodes, 26 CPUs (208 cores), 128 gb ram/node (total 1,664 gb), plus head node (256gb)|
|  Head Node|1x 4U Rackmount System (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores)|
|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
|  |Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
|  |1400w Power Supply 1+1 redundant|
|  Nodes|13x 2U Rackmountable Chassis, 13x 2 Xeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
|  |13x 8x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, max 256gb)|
|  |13x 1x120GB SSD|
|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 13x 4 PCIe 3.0 x16 Slots, 13x 8 PCIe 3.0 x8 Slots|
|  |13x ConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
|  |13x 600W non-redundant Power Supplies|
|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m cable QDR to existing Voltaire switch|
|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
|  | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5|
|  Warranty|3 Year Parts and Labor (lifetime technical support)|
|  Quote|<html><!-- estimated at $104,035 --></html>Arrived, includes S&H and Insurance|
|  Includes|Cluster pre-installation service|
  
  
  * 5,250 Watts and 17,913 BTUs/Hour
  * InfiniBand switch (18 port, needed for IPoIB) and ethernet switch (24 port)
  * Sandy Bridge chip E5-2660 and larger memory footprint (128gb node, 256gb head node)
  * 120GB SSD drives on nodes
  * storage: 42TB usable Raid 6
  * Lifetime technical support
  * Drop software install ($3.5K savings)
  
  * Spare parts
|  Quote|<html><!-- $103,150 incl $800 S&H --></html>Arrived|
  
  * 16TB Raid6 storage (14 TB usable - tight for /home)
  * full height rack
  
|  Quote|<html><!-- $105,770 incl $800 S&H --></html>Arrived|
  
  * 16TB Raid6 storage (14 TB usable - tight for /home)
  * 1TB on nodes is wasted (unless we make fast local /localscratch at 7.2K)
  
cluster/110.1361284062.txt.gz · Last modified: 2013/02/19 14:27 by hmeij