cluster:110
Notes
- HP cluster off support 11/30/2013
- We need greentail/disk array support maybe 2 more years?
- Karen added to budget, Dave to approve ($2200/year)
- We need another disk array
- For robust D2D backup
- Pressed HP Procurve ethernet backup switch into production
- Dell Force 10 switch failing or traffic overwhelmed it
- Need a file server away from the login node
- We need a new cluster with support
- power consumption versus computational power
- gpu versus cpu
- 6 of 36 dell compute nodes have failed
GPU Specs
Round 2
Specs: MW - GPU
| Topic | Description |
|---|---|
| General | 8 CPUs (64 cores), 16 GPUs (40,000 cuda cores), 128 gb ram/node, plus head node |
| Head Node | 1x4U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores) |
| | 16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8 |
| | 2x1TB 7200RPM (Raid 1) + 16x3TB (Raid 6), Areca Raid Controller |
| | Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable) |
| | 1400W Power Supply, 1+1 redundant |
| Nodes | 4x 2U Rackmountable Chassis, 4x 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series |
| | 4x 8x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, max 256gb) |
| | 4x 1x120GB SSD, 4x 4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1CPU-2GPU ratio |
| | ?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 4x 4 PCIe 3.0 x16 Slots, 4x 8 PCIe 3.0 x8 Slots |
| | 4xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable) |
| | 4x1620W 1+1 Redundant Power Supplies |
| Network | 1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m QDR cable to existing Voltaire switch |
| | 1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables) |
| Rack | 1x42U rack with power distribution (14U used) |
| Power | 2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P) |
| Software | CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA |
| | scheduler and gnu compilers installed and configured |
| | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5 |
| Warranty | 3 Year Parts and Labor (lifetime technical support) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | Estimated at $106,605; arrived, includes S&H and insurance |
| Includes | Cluster pre-installation service |
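A quick sanity check of the "40,000 cuda cores" and teraflop rows in the table above; a minimal sketch assuming the nominal Tesla K20 figures (2496 cuda cores, 1.17 / 3.52 TFLOPS double/single per card), which are not stated in the quote itself:

```python
# Per-card K20 figures are assumed nominal values, not taken from the MW quote.
k20_cores, k20_double, k20_single = 2496, 1.17, 3.52
gpus = 16

print(gpus * k20_cores)    # 39936  -> the "40,000 cuda cores" above
print(gpus * k20_double)   # 18.72  TFLOPS double precision
print(gpus * k20_single)   # 56.32  TFLOPS single precision
```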
- 5,900 Watts and 20,131 BTUs/Hour
- smaller infiniband switch (8 port) and ethernet switch (24 port)
- the 18 port switch has been included, swap out for $2K spare parts
- Sandy Bridge chip E5-2660 and larger memory footprint (128gb node, 256gb head node)
- 120GB SSD drives on nodes
- storage: 42TB usable Raid 6
- Lifetime technical support
- Spare parts
- ?
- Expand Storage (usable-capacity sketch below)
- upgrade to 56TB usable Raid 6 ($5.3K using 16x4TB disks)
- upgrade to 90TB usable Raid 60 ($10.3K using 34x3TB disks)
- Alternate storage:
- add storage server of 2.4 TB Usable 15K fast speed SAS disk ($9K-1K of 4U chassis)
- leave 18TB local storage on head node
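The usable figures in the storage options above follow from the RAID parity overhead; a rough check (marketing TB, and the RAID60 case assumes two 17-disk RAID6 groups, which is not spelled out in the quote):

```python
def raid6_usable(disks, size_tb):
    # RAID6 reserves the equivalent of two disks per group for parity
    return (disks - 2) * size_tb

def raid60_usable(disks, size_tb, groups):
    # RAID60 stripes over several RAID6 groups (group count assumed, not quoted)
    return groups * raid6_usable(disks // groups, size_tb)

print(raid6_usable(16, 3))       # 42 TB usable  (quoted config, 16x3TB)
print(raid6_usable(16, 4))       # 56 TB usable  ($5.3K upgrade, 16x4TB)
print(raid60_usable(34, 3, 2))   # 90 TB usable  ($10.3K upgrade, 34x3TB)
```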
Specs: MW - CPU
| Topic | Description |
|---|---|
| General | 13 nodes, 26 CPUs (208 cores), 128 gb ram/node (total 1,664 gb), plus head node (256gb) |
| Head Node | 1x4U Rackmount System (36 drive bays), 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (total 16 cores) |
| | 16x16GB 240-Pin DDR3 1600 MHz ECC (total 256gb, max 512gb), ?x10/100/1000 NIC (3 cables), 3x PCIe x16 Full, 3x PCIe x8 |
| | 2x1TB 7200RPM (Raid 1) + 16x3TB (Raid 6), Areca Raid Controller |
| | Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s (1 cable) |
| | 1400W Power Supply, 1+1 redundant |
| Nodes | 13x 2U Rackmountable Chassis, 13x 2xXeon E5-2660 2.0 GHz 20MB Cache 8 cores (16 cores/node), Sandy Bridge series |
| | 13x 8x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, max 256gb) |
| | 13x 1x120GB SSD |
| | ?x10/100/1000 NIC (1 cable), Dedicated IPMI Port, 4x 4 PCIe 3.0 x16 Slots, 4x 8 PCIe 3.0 x8 Slots |
| | 13xConnectX-3 VPI adapter card, Single-Port, QDR/FDR 40/56 Gb/s (1 cable) |
| | 13x600W non-redundant Power Supplies |
| Network | 1x 1U Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 3m QDR cable to existing Voltaire switch |
| | 1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (cables) |
| Rack | 1x42U rack with power distribution (14U used) |
| Power | 2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P) |
| Software | CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA |
| | scheduler and gnu compilers installed and configured |
| | Amber12 (customer provides license), Lammps, NAMD, Cuda 4.2 (for apps) & 5 |
| Warranty | 3 Year Parts and Labor (lifetime technical support) |
| Quote | Estimated at $104,035; arrived, includes S&H and insurance |
| Includes | Cluster pre-installation service |
- 5,250 Watts and 17,913 BTUs/Hour
- infiniband switch (18 port needed for IPoIB) and ethernet switch (24 port)
- Sandy Bridge chip E5-2660 and larger memory footprint (128gb node, 256gb head node)
- 120GB SSD drives on nodes
- storage: 42TB usable Raid 6
- Lifetime technical support
- Drop software install ($3.5K savings)
- Spare parts
- ?
- Expand Storage
- upgrade to 56TB usable Raid 6 ($5.3K using 16x4TB disks)
- upgrade to 90TB usable Raid 60 ($10.3K using 34x3TB disks)
- Alternate storage:
- add storage server of 2.4 TB Usable 15K fast speed SAS disk ($9K-1K of 4U chassis)
- leave 18TB local storage on head node
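For the "power consumption versus computational power, gpu versus cpu" question in the Notes, a rough peak estimate for this 13-node CPU option; only a sketch, assuming 8 double-precision FLOPs per clock per Sandy Bridge core (AVX, no turbo), plus a check of the BTU figure above:

```python
# Peak double-precision estimate for the 13-node CPU option (assumed 8 DP FLOPs/cycle/core).
nodes, cores_per_node, ghz, flops_per_cycle = 13, 16, 2.0, 8
print(nodes * cores_per_node * ghz * flops_per_cycle / 1000)  # ~3.3 TFLOPS double,
                                                              # vs 18.72 for the 16-GPU option

# 1 watt is about 3.412 BTU/hour
print(5250 * 3.412)   # 17913.0 BTU/hour, matching the figure above
```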
Specs: EC GPU
| Topic | Description |
|---|---|
| General | 8 CPUs (64 cores), 16 GPUs (40,000 cuda cores), 128 gb ram/node, plus head node (256gb) |
| Head Node | 1x2U Rackmount System, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores |
| | 16x16GB 240-Pin DDR3 1600 MHz ECC (max 512gb), 2×10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full |
| | 2x2TB RAID1 7200RPM, 8x2TB RAID6 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 1920W Power Supply, redundant |
| Nodes | 4x2U Rackmountable Chassis, 4×2 Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Romley series |
| | 32x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, 32gb/gpu, max 256gb) |
| | 4x1TB 7200RPM, 4x4xNVIDIA Tesla K20 8 GB GPUs (4/node), 1CPU-2GPU ratio |
| | 2×10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots |
| | 4xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 4x1800W Redundant Power Supplies |
| Network | 1x Mellanox InfiniBand QDR Switch (8 ports) & HCAs (single port) + 7×7' cables (2 uplink cables) |
| | 1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables) |
| Rack & Power | 42U, 2xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU |
| Software | CentOS, Bright Cluster Management (1 year support) |
| | Amber12 (cluster install), Lammps (shared filesystem), (no NAMD) |
| Warranty | 3 Year Parts and Labor (EC technical support?) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | $103,150 incl $800 S&H; arrived |
- 16TB Raid6 storage (14 TB usable - tight for /home)
- full height rack
Specs: EC CPU
| Topic | Description |
|---|---|
| General | 13 nodes, 26 CPUs (208 cores), 128 gb ram/node (total 1,664 gb), plus head node (256gb) |
| Head Node | 1x2U Rackmount System, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores |
| | 16x16GB 240-Pin DDR3 1600 MHz ECC (max 512gb), 2×10/100/1000 NIC, 1x PCIe x16 Full, 6x PCIe x8 Full |
| | 2x2TB RAID1 7200RPM, 8x2TB RAID6 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 1920W Power Supply, redundant |
| Nodes | 13x1U Rackmountable Chassis, 13×2 Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Romley series |
| | 104x16GB 240-Pin DDR3 1600 MHz (128gb/node memory, max ???gb) |
| | 13x1TB 7200RPM |
| | 2×10/100/1000 NIC, Dedicated IPMI Port, 1x PCIe 3.0 x16 Slots |
| | 13xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 13x480W non-redundant Power Supplies |
| Network | 1x Mellanox InfiniBand QDR Switch (18 ports) & HCAs (single port) + 7×7' cables (2 uplink cables) |
| | 1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables) |
| Rack & Power | 42U, 2xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU |
| Software | CentOS, Bright Cluster Management (1 year support) |
| | Amber12 (cluster install), Lammps (shared filesystem), NAMD |
| Warranty | 3 Year Parts and Labor (EC technical support?) |
| Quote | $105,770 incl $800 S&H; arrived |
- 16TB Raid6 storage (14 TB usable - tight for /home)
- 1TB on nodes is wasted (unless we make fast local /localscratch at 7.2K)
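A quick side-by-side of the four Round 2 quotes above on simple per-unit metrics, using the prices from the Quote rows and the core/teraflop counts from the General rows; a sketch only, since it ignores storage, network, and software differences:

```python
# Round 2 quotes: GPU options compared on $/TFLOPS (double), CPU options on $/core.
quotes = {
    "MW GPU": {"price": 106605, "tflops_double": 18.72},
    "EC GPU": {"price": 103150, "tflops_double": 18.72},
    "MW CPU": {"price": 104035, "cores": 208},
    "EC CPU": {"price": 105770, "cores": 208},
}
for name, q in quotes.items():
    if "tflops_double" in q:
        print(name, round(q["price"] / q["tflops_double"]), "$/TFLOPS double")
    else:
        print(name, round(q["price"] / q["cores"]), "$/core")
```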
Round 1
ConfCall & Specs: AC
09nov12:
- /home and /apps mounted on CPU side. How does GPU access these? Or is job on CPU responsible for this?
- Single versus double precision? Both needed I assume.
- Unit above is Nvidia “Fermi” series, being phased out. “Kepler” K10 and K20 series coming out. Get an earlybird unit, Jim will find out.
- Lava compatibility (almost certain but need to check) AC uses SGE.
- We do not really “know” if our current jobs would experience a boost in speed (hence one unit first - but there is a software problem here)
- Intel Xeon Phi Co-Processors: Intel compilers will work on this platform (which is huge!) and no programming learning curve. (HP Proliant servers with 50+ cores), Jim will find out.
- Vendor states scheduler sees GPUs directly (but how does it then get access to home dirs? check this out)… update: this is not true, CPU job offloads to GPU
AC Specs
- Early 2013 product line up
- Quote coming for single 4U unit, which could be a one off test unit (compare to HP)
| Topic | Description |
|---|---|
| General | 2 CPUs (16 cores), 3 GPUs ( 7,500 cuda cores), 32 gb ram/node |
| Head Node | None |
| Nodes | 1x4U Rackmountable Chassis, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores (16 cores/node), Romley series |
| | 8x4GB 240-Pin DDR3 1600 MHz memory (32gb/node, 11gb/gpu, max 256gb) |
| | 1x120GB SATA 2.5" Solid State Drive (OS drive), 7x3TB 7200RPM |
| | 3xNVIDIA Tesla K20 8 GB GPUs (3/node), 1CPU-1.5GPU ratio |
| | 2×10/100/1000 NIC, 3x PCIe 3.0 x16 Slots |
| | 1xConnectX-3 VPI adapter card, single-port 56Gb/s |
| | 2x1620W Redundant Power Supplies |
| Network | 1×36 port Infiniband FDR (56Gb/s) switch & 4xConnectX-3 single port FDR (56Gb/s) IB adapter + 2x 2 meter cables (should be 4) |
| Power | Rack power ready |
| Software | None |
| Warranty | 3 Year Parts and Labor (AC technical support) |
| GPU Teraflops | 3.51 double, 10.56 single |
| Quote | $33,067.43, S&H included; arrived |
- In order to match the “benchmark option” we need 5 units
- 8100 Watts, would still fit power wise but not rack wise (we'd need 20U)
- Single rack, 21 TB of disk space (Raid 5/6)
- The IB switch (plus 4 spare cards/cables) is roughly 1/3rd of the price
- If we remove it, we need QDR Voltaire compliant HCAs and cables (3 ports free)
- The config does not pack as much teraflops for the dollars; we'll see
ConfCall & Specs: EC
12nov12:
- GPU hardware only
- scheduler never sees gpus just cpus
- cpu to gpu ratio is one-to-one when using Westmere chips
- bright cluster management (image based) - we can front end with lava
- what's the memory connection cpu/gpu???
- home dirs - cascade via voltaire 4036, need to make sure this is compatible!
- software on local disk? home dirs via infiniband ipoib, yes, but self install
- amber (charge for this) and lammps preinstalled - must be no problem, will be confirmed
- 2 K20 per 2 CPUs per rack 900-1000W, 1200 W power supply on each node
- PDU on simcluster, each node has power connection
- quote coming for 4 node simcluster
- testing periods can be staged so you are testing exactly what we're buying, if simcluster is within budget (see K20 above)
EC Specs
| Topic | Description |
|---|---|
| General | 8 CPUs (64 cores), 16 GPUs (40,000 cuda cores), 64 gb ram/node, plus head node |
| Head Node | 1x1U Rackmount System, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores |
| | 8x8GB 240-Pin DDR3 1600 MHz ECC (max 256gb), 2×10/100/1000 NIC, 2x PCIe x16 Full |
| | 2x2TB 7200RPM (can hold 10), ConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 600W Power Supply |
| Nodes | 4x2U Rackmountable Chassis, 8xXeon E5-2660 2.20 GHz 20MB Cache 8 cores (16/node), Romley series |
| | 32x8GB 240-Pin DDR3 1600 MHz (64gb/node memory, 16gb/gpu, max 256gb) |
| | 4x1TB 7200RPM, 16xNVIDIA Tesla K20 8 GB GPUs (4/node), 1CPU-2GPU ratio |
| | 2×10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots |
| | 4xConnectX-2 VPI adapter card, Single-Port, QDR 40Gb/s |
| | 4x1800W Redundant Power Supplies |
| Network | 1x Mellanox InfiniBand QDR Switch (8 ports) & HCAs (single port) + 7' cables |
| | 1x 1U 16 Port Rackmount Switch, 10/100/1000, Unmanaged (+ 7' cables) |
| Power | 2xPDU, Basic, 1U, 30A, 208V, (10) C13, Requires 1x L6-30 Power Outlet Per PDU |
| Software | CentOS, Bright Cluster Management (1 year support) |
| | Amber12 (cluster install), Lammps (shared filesystem), (Barracuda for weirlab?) |
| Warranty | 3 Year Parts and Labor (EC technical support?) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | $93,600 + S&H; arrived |
- Let's make this the “benchmark option” based on double precision
- In order to match this with Xeon Phis we'd need 18 of them (probably 5 4U trays)
- This is the (newest) simcluster design (that can be tested starting Jan 2013)
- 24U cabinet
- We could deprecate 50% of bss24 queue freeing two L6-30 connectors
- Spare parts:
- Add another HCA card to greentail and connect to Mellanox switch (long cable)
- also isolates GPU traffic from other clusters
- 1 8-port switch, 4 HCA cards, 4 long cables (for petal/swallow tails plus spare)
- New head node
- First let EC install Bright/Openlava (64 CPU cores implies 64 job slots)
- 16 GPUs implies 16×2,500 or 40,000 cuda cores (625 per job slot on average)
- Use as standalone cluster or move GPU queue to greentail
- If so, turn this head node into a 16 job slot ram heavy compute node?
- 256-512gb (Order?)
- add local storage? (up to 10 1or2 TB disks)
- Compute nodes
- add local storage? (up to 10 1or2 TB disks)
- Bright supports openlava and GPU monitoring (get installed)
- EC software install
- sander, sander.MPI, pmemd, pmemd.cuda (single GPU version), pmemd.cuda.MPI (the multi-GPU version)
- NVIDIA Toolkit v4.2. Please note that v5.0 is NOT currently supported
- MVAPICH2 v1.8 or later / MPICH2 v1.4p1 or later recommended; OpenMPI is NOT recommended
- make sure they do not clean source, analyze how they compiled
- which compiler will they use? which MPI? (prefer OpenMPI, have wrapper script for that)
ConfCall & Specs: HP
HP 19nov12: meeting notes
- HP ProLiant SL270s Generation 8 (Gen8); 4U half width with 2 CPUs + 8 (max) GPUs
- The s6500 Chassis is a 4U tray holding two SL270s servers
- max 8 GPUs (20,000 cuda cores) + 2 CPUs (total 16 cores), dual drives, 256gb max
- K20 availability will be confirmed by Charlie
- power
- Charlie will crunch numbers of existing HPC and assess if we can use the current rack
- otherwise a stand alone half rack solution
- one IB cable to Voltaire per chassis? get new FDR infiniband switch, period
- connect greentail with additional HCA card, or voltaire to voltaire?
- our software compilation problem, huge
- but they have great connections with Nvidia for compilation help (how to qualify that?)
- CMU for GPU monitoring, 3D rendering of what the GPU is doing
- This SL270s can also support up to 8 Xeon Phi coprocessors
- but expect very lengthy delays, Intel is not ready for delivery (1 Phi = 1 double teraflop)
HP Specs
http://h18004.www1.hp.com/products/quickspecs/14405_div/14405_div.HTML
- First unit: single tray in chassis. This hardware can be tested at ExxactCorp, so a single-tray purchase for testing is not a requirement
- 2 chassis in 8U + 4 SL250s, each with 8 GPUs, would be a massive GPU cruncher
- 8 CPUs, 32 GPUs = 64 cpu cores and 80,000 cuda cores (avg 1,250 cuda cores/cpu core)
- peak performance: 37.44 double, 112.64 single precision (twice the “benchmark option”)
- 1 chassis in 4U + 2 SL250s, each with 8 GPUs, would be the “benchmark option”
| Topic | Description |
|---|---|
| General | 6 CPUs (total 48 cores), 18 GPUs (45,000 cuda cores), 64 gb ram/node, no head node |
| Head Node | None |
| Chassis | 2xs6500 Chassis (4U) can each hold 2 half-width SL250s(gen8, 4U) servers, rackmounted, 4x1200W power supplies, 1x4U rack blank |
| Nodes | 3xSL250s(gen8), 3x2xXeon E5-2650 2.0 GHz 20MB Cache 8 cores (total 16 cores/node), Romley series |
| | 3x16x8GB 240-Pin DDR3 1600 MHz (64gb/node, 10+ gb/gpu, max 256gb) |
| | 3x2x500GB 7200RPM, 3x6xNVIDIA Tesla K20 5 GB GPUs (6 gpu/node), 1CPU-to-3GPU ratio |
| | 3x2x10/100/1000 NIC, Dedicated IPMI Port, 3x8x PCIe 3.0 x16 Slots (GPU), 3x2x PCIe 3.0 x8 |
| | 3x2xIB interconnect, QDR 40Gb/s, FlexibleLOM goes into PCIe3 x8 slot |
| | chassis supplied power; 3x1x one PDU power cord (416151-B21)? - see below |
| Network | 1xVoltaire QDR 36-port infiniband 40 Gb/s switch, + 6x 5M QSFP IB cables |
| | No ethernet switch, 17x 7' CAT5 RJ45 cables |
| Power | rack PDU ready, what is 1x HP 40A HV Core Only Corded PDU??? |
| Software | RHEL, CMU GPU enabled (1 year support) - not on quote??? |
| Warranty | 3 Year Parts and Labor (HP technical support?) |
| GPU Teraflops | 21.06 double, 63.36 single |
| Quote | $128,370; for a 1x s6500 + 2x SL250s setup the estimate is $95,170. Arrived (S&H and insurance?) |
- To compare with “benchmark option” price-wise: 37% higher (25% fewer CPU cores)
- To compare with “benchmark option” performance: 12.5% higher (double precision peak) - arithmetic checked in the sketch after this list
- When the quote is reduced to 1x s6500 chassis and 2x SL250s:
- To compare with “benchmark option” price-wise: 1.6% higher (50% fewer CPU cores)
- To compare with “benchmark option” performance: 25% lower (double precision peak)
- HP on site install
- we have 9U in HP rack available (1U for new switch)
- L6-30 7,500 Watts x3 PDUs (non-UPS) = 22,500 Watts - HP cluster 10,600 Watts
- leaves 11,898 Watts, should be sufficient for 4 SL270s(redundant power supplies)
- new infiniband switch, isolates GPU cluster traffic from rest of HPC
- 36 port IB switch overkill
- still need IB connection greentail to new switch (home dirs IPoIB)
- 1 TB local storage per node
- our software install problem, so is the 12.5% worth it? (with 3 trays)
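The percentage comparisons earlier in this list can be reproduced from the quoted figures; a minimal check, taking the EC Round 1 quote ($93,600, 18.72 TFLOPS double) as the “benchmark option”:

```python
# "Benchmark option" = EC Round 1 quote: $93,600 and 18.72 TFLOPS double precision peak.
benchmark_price, benchmark_tflops = 93600, 18.72

print(round((128370 / benchmark_price - 1) * 100))      # 37   -> "37% higher" price (3-tray HP quote)
print(round((95170 / benchmark_price - 1) * 100, 1))    # ~1.7 -> the "1.6% higher" 2-tray estimate
print(round((21.06 / benchmark_tflops - 1) * 100, 1))   # 12.5 -> "12.5% higher" peak (3 trays)
print(round((1 - 14.04 / benchmark_tflops) * 100))      # 25   -> "25% lower" peak (2 trays, 12 K20s)
```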
ConfCall & Specs: AX
- Cluster management is ROCKS (we'll pass)
- No scheduler (that's OK, we'll use OpenLava)
- They do not install software, only the operating system and CUDA driver setup and installation
AX Specs
http://www.amax.com/hpc/productdetail.asp?product_id=simcluster Fremont, CA
| Topic | Description |
|---|---|
| General | 8 CPUs (48 cores), 12 GPUs (30,000 cuda cores), 64 gb ram/node, plus head node |
| Head Node | 1x1U Rackmount System, 2x Intel Xeon E5-2620 2.0GHz (12 cores total) |
| | 64GB DDR3 1333MHz (max 256gb), 2×10/100/1000 NIC, 2x PCIe x16 Full |
| | 2x1TB (Raid 1) 7200RPM, InfiniBand adapter card, Single-Port, QSFP 40Gb/s |
| | ???W Power Supply, CentOS |
| Nodes | 4x1U, 4x2xIntel Xeon E5-2650 2.0GHz, with 6 cores (12 cores/node), Romley series |
| | 4x96GB 240-Pin DDR3 1600 MHz (96gb/node memory, 8gb/gpu, max 256gb) |
| | 4x1TB 7200RPM, 12xNVIDIA Tesla K20 8 GB GPUs (3/node), 1CPU-1.5GPU ratio |
| | 2×10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots |
| | 4xInfiniband adapter card, Single-Port, QSFP 40Gb/s |
| | 4x??00W Redundant Power Supplies |
| Network | 1x Infiniband Switch (18 ports) & HCAs (single port) + ?' cables |
| | 1x 1U 24 Port Rackmount Switch, 10/100/1000, Unmanaged (+ ?' cables) |
| Power | there are 3 rack PDUs? What are the connectors, L6-30? |
| Software | CUDA only |
| Warranty | 3 Year Parts and Labor (AX technical support?) |
| GPU Teraflops | 14.04 double, 42.96 single |
| Quote | $73,965 (S&H $800 included); arrived |
- 22U cabinet
- Insurance during shipping is our problem (non-returnable)
- To compare with “benchmark option” price wise; 21% lower (25% less CPU cores)
- To compare with “benchmark option” performance; 22% lower (double precision peak)
- If we go turnkey systems having software installed is huge
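With the AC, EC, HP and AX quotes in, a rough dollars-per-teraflop comparison using the quoted prices and the double-precision peaks from the tables above (the MW quote follows below); only a sketch, since it ignores CPU cores, storage, and software differences:

```python
# Round 1 $/TFLOPS (double precision peak) from the quotes received so far.
round1 = {
    "AC": (33067, 3.51),      # single 4U test unit
    "EC": (93600, 18.72),     # the "benchmark option"
    "HP": (128370, 21.06),    # 3-tray configuration
    "AX": (73965, 14.04),
}
for vendor, (price, tflops) in round1.items():
    print(vendor, round(price / tflops), "$/TFLOPS double")
# roughly: AC ~9400, EC ~5000, HP ~6100, AX ~5300
```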
ConfCall & Specs: MW
- sells both individual racks and turn-key systems
- racks are 4U with 2 CPUs and 8 GPUs, 2200 Watts, K20X GPUs
- turn-key units are per customer specifications
- they will install all software components (if license keys are provided)
- includes CUDA drivers and setup, Amber (pmemd.cuda & pmemd.cuda.MPI, check) and Lammps
- but also Matlab and Mathematica if needed (wow!)
- standard 2 year warranty though (no biggie)
MW Specs
http://www.microway.com/tesla/clusters.html Plymouth, MA
| Topic | Description |
|---|---|
| General | 8 CPUs (64 cores), 16 GPUs (40,000 cuda cores), 32 gb ram/node, plus head node |
| Head Node | 1x2U Rackmount System, 2xXeon E5-2650 2.0 GHz 20MB Cache 8 cores |
| | 8x4GB 240-Pin DDR3 1600 MHz ECC (max 512gb), 2×10/100/1000 NIC, 3x PCIe x16 Full, 3x PCIe x8 |
| | 2x1TB 7200RPM (Raid 1) + 6x2TB (Raid 6), Areca Raid Controller |
| | Low profile graphics card, ConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s |
| | 740W Power Supply, 1+1 redundant |
| Nodes | 4x1U Rackmountable Chassis, 4×2 Xeon E5-2650 2.0 GHz 20MB Cache 8 cores (16/node), Sandy Bridge series |
| | 4x8x4GB 240-Pin DDR3 1600 MHz (32gb/node memory, 8gb/gpu, max 256gb) |
| | 4x1x120GB SSD, 4x4xNVIDIA Tesla K20 5 GB GPUs (4/node), 1CPU-2GPU ratio |
| | 2×10/100/1000 NIC, Dedicated IPMI Port, 4x PCIe 3.0 x16 Slots |
| | 4xConnectX-3 VPI adapter card, Single-Port, FDR 56Gb/s |
| | 4x1800W (non) Redundant Power Supplies |
| Network | 1x Mellanox InfiniBand FDR Switch (36 ports) & HCAs (single port) + 3m FDR cable to existing Voltaire switch |
| | 1x 1U 48 Port Rackmount Switch, 10/100/1000, Unmanaged (cables) |
| Rack | |
| Power | 2xPDU, Basic rack, 30A, 208V, Requires 1x L6-30 Power Outlet Per PDU (NEMA L6-30P) |
| Software | CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA 5 |
| | scheduler and gnu compilers installed and configured |
| | Amber12, Lammps, Barracuda (for weirlab?), and others if desired … bought through MW |
| Warranty | 3 Year Parts and Labor (lifetime technical support) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | Estimated at $95,800; arrived, includes S&H and insurance |
| Upgrades | Cluster pre-installation service |
| | 5×2 E5-2660 2.20 GHz 8 core CPUs |
| | 5x upgrade to 64 GB per node |
- At full load 5,900 Watts and 20,131 BTUs/hour
- 2% more expensive than “benchmark option” (as described above with Upgrades), else identical
- But a new rack (advantageous for data center)
- With lifetime technical support
- solid state drives on compute nodes
- 12 TB local storage (8TB usable)
Then
- 36 port FDR switch replace with 8 port QDR switch for savings (40 vs 56 Gbps)
- and all server adapter cards to QDR (with one hook up to existing Voltaire switch)
- Expand memory footprint
- Go to 124 GB memory/node to beef up the CPU HPC side of things (worked sketch at the end of this list)
- 16 cpu cores/node minus 4 gpu-dedicated cpu cores/node = 12 cpu cores using 104gb, which is about 8 GB/cpu core
- Online testing available (K20, do this)
- then decide on PGI compiler at purchase time
- maybe all Lapack libraries too
- Make the head node a compute node (in/for the future and beef it up too, 256 GB ram?)
- Leave the 6x2TB disk space (for backup)
- 2U, 8 drives up to 6×4=24 TB, possible?
- Add an entry level Infiniband/Lustre solution
- for parallel file locking
- Spare parts
- 8 port switch, HCAs and cables, drives …
- or get 5 years total warranty
- Testing notes
- Amber, LAMMPS, NAMD
- cuda v4&5
- install/config dirs
- use gnu … with openmpi
- make deviceQuery
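A worked version of the memory-footprint bullet above (the sketch referenced there). The split is not spelled out in the notes; one reading that reproduces the 104gb figure is to dedicate one CPU core and set aside ~5gb of host RAM per K20 - an assumption for illustration, not MW's statement:

```python
# Hypothetical per-node budget: 124gb RAM, 16 cores, 4 K20s;
# assume 1 core and ~5gb host RAM reserved per GPU (not from the quote).
node_ram_gb, cores, gpus, gb_per_gpu = 124, 16, 4, 5

cpu_cores_left = cores - gpus                  # 12 cores left for CPU-only work
ram_left = node_ram_gb - gpus * gb_per_gpu     # 104 gb left for those cores
print(cpu_cores_left, ram_left, round(ram_left / cpu_cores_left, 1))  # 12 104 8.7 (~"8 GB/cpu core")
```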