
Differences

This shows you the differences between two versions of the page.


cluster:110 [2013/02/19 09:26]
hmeij [Specs: MW - GPU]
cluster:110 [2013/05/24 09:39] (current)
hmeij [Specs: MW - GPU]
Line 2: Line 2:
 **[[cluster:0|Back]]**
  
 +
 +===== Notes =====
 +
 +  * HP cluster goes off support 11/30/2013
 +  * We need greentail/disk array support for maybe 2 more years?
 +    * Karen added it to the budget, Dave to approve ($2,200/year)
 +  * We need another disk array
 +    * For robust disk-to-disk (D2D) backup
 +  * Pressed the HP ProCurve ethernet backup switch into production
 +    * The Dell Force10 switch is failing, or traffic overwhelmed it
 +    * Need a file server separate from the login node
 +  * We need a new cluster with support
 +    * Power consumption versus computational power
 +    * GPU versus CPU
 +    * 6 of 36 Dell compute nodes have failed
  
 ===== GPU Specs =====
 +
 +===== Round 3 =====
 +
 +
 +
 +==== Specs: MW - GPU ====
 +
 +This is what we ended up buying in May 2013.
 +
 +^  Topic^Description  ^
 +|  General| 10 CPUs (80 cores), 20 GPUs (45,000 cuda cores), 256 gb ram/node (1,280 gb total), plus head node (256 gb)|
 +|  Head Node|1x42U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 Ghz 20MB Cache 8 cores (total 16 cores)|
 +|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
 +|  |2x1TB 7200RPM (Raid 1) + 16x3TB (Raid 6), Areca Raid Controller|
 +|  |Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
 +|  |1400w Power Supply 1+1 redundant|
 +|  Nodes|5x 2U Rackmountable Chassis, 5x 2 Xeon E5-2660 2.0 Ghz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
 +|  |5x 16x16GB 240-Pin DDR3 1600 MHz (256gb/node memory, max 256gb)|
 +|  |5x 1x120GB SSD 7200RPM, 5x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1CPU-2GPU ratio|
 +|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 5x 4 PCIE 3.0 x16 Slots, 5x 8 PCIE 3.0 x8 Slots|
 +|  |5xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
 +|  |5x1620W 1+1 Redundant Power Supplies|
 +|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m cable QDR to existing Voltaire switch|
 +|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
 +|Rack  |1x42U rack with power distributions (14U used)|
 +|  Power|2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P)|
 +|  Software| CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA|
 +|  | scheduler and gnu compilers installed and configured|
 +|  | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5|
 +|  Warranty|3 Year Parts and Labor (lifetime technical support)| 
 +|  GPU Teraflops|23.40 double, 70.40 single|
 +|  Quote|<html><!-- estimated at $124,845 --></html>Arrived, includes S&H and Insurance|
 +|Includes  |Cluster pre-installation service  |
 +
 +
 +  * 16U - estimated draw 6,900 Watts and 23,713 BTU/hr cooling - $30K/year (arithmetic sketch below this list)
 +  * 5 GPU shelves
 +  * 2 PDUs
 +  * 42 TB raw
 +  * FDR interconnects
 +  * 120GB SSD drives on nodes
 +  * 256 gb ram on nodes, 16gb/core
 +  * Areca hardware raid
 +  * Lifetime technical support
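 +
 +A quick sanity check of the power and teraflops figures above (a sketch only, not from the vendor quote; the 3.412 W-to-BTU/hr factor and the published Tesla K20 peak numbers are the assumptions):
 +
 +<code python>
 +# Back-of-envelope check of the MW GPU figures listed above
 +WATT_TO_BTU_HR = 3.412                  # 1 W of draw ~ 3.412 BTU/hr of heat to remove
 +print(6900 * WATT_TO_BTU_HR)            # ~23,543 BTU/hr, close to the quoted 23,713
 +
 +# Tesla K20 published peak: 1.17 TFLOPS double, 3.52 TFLOPS single precision
 +gpus = 5 * 4                            # 5 nodes x 4 K20s each
 +print(gpus * 1.17, gpus * 3.52)         # 23.4 / 70.4 TFLOPS, matching the table
 +
 +# Memory per core on the compute nodes
 +print(256 / 16)                         # 256 gb/node over 16 cores = 16 gb/core
 +</code>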
 +
 +==== Specs: EC GPU ====
 +
 +
 +^  Topic^Description  ^
 +|  General| 12 CPUs (96 cores), 20 GPUs (45,000 cuda cores), 128 gb ram/node (768 gb total), plus head node (128gb)|
 +|  Head Node|1x2U Rackmount System, 2xXeon E5-2660 2.20 Ghz 20MB Cache 8 cores|
 +|  |8x16GB 240-Pin DDR3 1600 MHz ECC (128gb, max 512gb), 2x10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full|
 +|  |2x2TB RAID1 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
 +|  |1920w Power Supply, redundant|
 +|  Nodes|6x2U Rackmountable Chassis, 6x2 Xeon E5-2660 2.20 Ghz 20MB Cache 8 cores (16/node), Sandy Bridge series|
 +|  |48x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, 8gb/core, max 256gb)|
 +|  |6x1TB 7200RPM, 5x4xNVIDIA Tesla K20 8 GB GPUs (4/node), 1CPU-2GPU ratio|
 +|  |2x10/100/1000 NIC, Dedicated IPMI Port, 4x PCIE 3.0 x16 Slots|
 +|  |6xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s|
 +|  |6x1800W Redundant Power Supplies|
 +|  Network|1x Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 9x 7' cables (2 uplink cables)|
 +|  |1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables)|
 +|  Rack & Power|42U, 4xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU|
 +|  Software| CentOS, Bright Cluster Management (1 year support)|
 +|  | Amber12 (cluster install), Lammps (shared filesystem), (no NAMD)|
 +|  Storage|3U Disk Array, 28x2TB (56TB raw, ~52TB usable in Raid 6), cascade cable|
 +|  Warranty|3 Year Parts and Labor (EC technical support?)| 
 +|  GPU Teraflops|23.40 double, 70.40 single|
 +|  Quote|<html><!-- $124,372 incl $800 S&H --></html>Arrived|
 +
 +
 +  * 20U - estimated draw 7,400 Watts - $30K/year for cooling and power
 +  * 5 GPU shelves
 +  * 1 CPU shelf
 +  * 4 PDUs - this could be a problem!
 +  * 56TB raw (~52TB usable in Raid 6; see the sketch below this list)
 +  * QDR interconnects
 +  * 1 TB disk on each node makes for a large /localscratch
 +  * LSI hardware raid card
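 +
 +A rough usable-capacity check for the disk array above (a sketch; assumes a single Raid 6 group, which sets aside two drives' worth of parity, and ignores filesystem overhead):
 +
 +<code python>
 +# 28 x 2TB drives in one Raid 6 group
 +drives, size_tb = 28, 2
 +raw = drives * size_tb                  # 56 TB raw, as listed above
 +usable = (drives - 2) * size_tb         # Raid 6 reserves two drives for parity
 +print(raw, usable)                      # 56 TB raw, ~52 TB usable
 +</code>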
 +
  
 ===== Round 2 =====
Line 55: Line 150:
  
 ==== Specs: MW - CPU ====
 +
 +^  Topic^Description  ^
 +|  General|13 nodes, 26 CPUs (208 cores), 128 gb ram/node (total 1,664 gb), plus head node (256gb)|
 +|  Head Node|1x4U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 Ghz 20MB Cache 8 cores (total 16 cores)|
 +|  |16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8|
 +|  |2x1TB 7200RPM (Raid 1) + 16x3TB (Raid 6), Areca Raid Controller|
 +|  |Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable)|
 +|  |1400w Power Supply 1+1 redundant|
 +|  Nodes|13x 2U Rackmountable Chassis, 13x 2 Xeon E5-2660 2.0 Ghz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series|
 +|  |13x 8x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, max 256gb)|
 +|  |13x 1x120GB SSD 7200RPM |
 +|  |?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 4x 4 PCIE 3.0 x16 Slots, 4x 8 PCIE 3.0 x8 Slots|
 +|  |13xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable)|
 +|  |13x600W non-redundant Power Supplies|
 +|  Network|1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m cable QDR to existing Voltaire switch|
 +|  |1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables)|
 +|Rack  |1x42U rack with power distributions (14U used)|
 +|  Power|2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P)|
 +|  Software| CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA|
 +|  | scheduler and gnu compilers installed and configured|
 +|  | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5|
 +|  Warranty|3 Year Parts and Labor (lifetime technical support)| 
 +|  Quote|<html><!-- estimated at $104,035 --></html>Arrived, includes S&H and Insurance|
 +|Includes  |Cluster pre-installation service  |
 +
 +
 +  * 5,250 Watts and 17,913 BTU/hr
 +  * infiniband switch (18 port needed for IPoIB) and ethernet switch (24 port)
 +  * Sandy Bridge chip (E5-2660) and larger memory footprint (128gb/node, 256gb head node)
 +  * 120GB SSD drives on nodes
 +  * storage: 42TB usable Raid 6
 +  * Lifetime technical support
 +  * Drop software install ($3.5K savings)
 +
 +  * Spare parts
 +    * ?
 +  * Expand Storage (usable-capacity arithmetic sketched after this list)
 +    * upgrade to 56TB usable Raid 6 ($5.3K using 16x4TB disks)
 +    * upgrade to 90TB usable Raid 60 ($10.3K using 34x3TB disks)
 +
 +  * Alternate storage:
 +    * add a storage server with 2.4 TB usable of fast 15K RPM SAS disk ($9K-1K of 4U chassis)
 +    * leave 18TB local storage on head node
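 +
 +A usable-capacity sketch for the storage options above (assumptions: Raid 6 sets aside two drives' worth of parity per group, Raid 60 stripes two Raid 6 groups, and filesystem overhead is ignored):
 +
 +<code python>
 +def raid6_usable(drives, size_tb):
 +    # Raid 6 reserves two drives' worth of capacity for parity
 +    return (drives - 2) * size_tb
 +
 +print(raid6_usable(16, 3))              # 42 TB usable - the base 16x3TB Raid 6 config
 +print(raid6_usable(16, 4))              # 56 TB usable - the 16x4TB upgrade
 +print(2 * raid6_usable(17, 3))          # 90 TB usable - Raid 60 over 34x3TB (two groups of 17)
 +</code>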
  
  
Line 81: Line 219:
 |  Quote|<html><!-- $103,150 incl $800 S&H --></html>Arrived|
  
-  * 16TB Raid6 storage (10 TB usable - tight for /home)
+  * 16TB Raid6 storage (14 TB usable - tight for /home)
   * full height rack
  
Line 107: Line 245:
 |  Quote|<html><!-- $105,770 incl $800 S&H --></html>Arrived|
  
-  * 16TB Raid6 storage (10 TB usable - tight for /home)
+  * 16TB Raid6 storage (14 TB usable - tight for /home)
   * 1TB on nodes is wasted (unless we make fast local /localscratch at 7.2K)
  