GPU Specs

Round 2

Specs: MW - GPU

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 128 GB RAM/node, plus head node
Head Node: 1x 4U rackmount system (36 drive bays), 2x Xeon E5-2660 2.0 GHz 20MB cache, 8 cores (total 16 cores)
16x 16GB 240-pin DDR3 1600 MHz ECC (total 256 GB, max 512 GB), ?x 10/100/1000 NIC (3 cables), 3x PCIe x16 full, 3x PCIe x8
2x 1TB 7200 RPM (RAID 1) + 16x 3TB (RAID 6), Areca RAID controller
Low-profile graphics card, ConnectX-3 VPI adapter card, single-port, FDR 56 Gb/s (1 cable)
1400W power supply, 1+1 redundant
Nodes: 4x 2U rackmountable chassis, 4x 2 Xeon E5-2660 2.0 GHz 20MB cache, 8 cores (16 cores/node), Sandy Bridge series
4x 8x16GB 240-pin DDR3 1600 MHz (128 GB/node memory, max 256 GB)
4x 1x120GB SSD, 4x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio
?x 10/100/1000 NIC (1 cable), dedicated IPMI port, 4x 4 PCIe 3.0 x16 slots, 4x 8 PCIe 3.0 x8 slots
4x ConnectX-3 VPI adapter card, single-port, QDR/FDR 40/56 Gb/s (1 cable)
4x 1620W 1+1 redundant power supplies
Network: 1x 1U Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 3m QDR cable to existing Voltaire switch
1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack: 1x 42U rack with power distribution (14U used)
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA
scheduler and GNU compilers installed and configured
Amber12 (customer provides license), LAMMPS, NAMD, CUDA 4.2 (for apps) & 5
Warranty: 3 year parts and labor (lifetime technical support)
GPU Teraflops: 18.72 double, 56.32 single
Quote: arrived; estimated at $106,605, includes S&H and insurance
Includes Cluster pre-installation service
  • 5,900 Watts and 20,131 BTUs/Hour
  • smaller infiniband switch (8 port) and ethernet switch (24 port)
    • the 18 port switch has been included, swap out for $2K spare parts
  • Sandy Bridge chip E5-2660 and larger memory footprint (128 GB node, 256 GB head node)
  • 120GB SSD drives on nodes
  • storage: 42TB usable Raid 6
  • Lifetime technical support
  • Spare parts
    • ?
  • Expand Storage
    • upgrade to 56TB usable Raid 6 ($5.3K using 16x4TB disks)
    • upgrade to 90TB usable Raid 60 ($10.3K using 34x3TB disks)
  • Alternate storage:
    • add storage server of 2.4 TB Usable 15K fast speed SAS disk ($9K-1K of 4U chassis)
    • leave 18TB local storage on head node
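The quoted peak-teraflops figures can be sanity-checked from NVIDIA's published per-K20 peaks. Note the per-GPU numbers below (1.17 TFlops double, 3.52 TFlops single) are from NVIDIA's K20 spec sheet, not from the quote itself, so this is only a plausibility check:

```python
# Sanity check of the quoted cluster-wide GPU peak performance.
# Per-GPU peaks are NVIDIA's published Tesla K20 figures (assumption:
# the vendors derived their totals from the same spec-sheet values).
K20_DOUBLE_TF = 1.17   # TFlops, double precision, per K20
K20_SINGLE_TF = 3.52   # TFlops, single precision, per K20
gpus = 16

print(round(gpus * K20_DOUBLE_TF, 2))  # 18.72, matches the quoted double peak
print(round(gpus * K20_SINGLE_TF, 2))  # 56.32, matches the quoted single peak
```

The same per-GPU figures reproduce the totals in the EC, HP, and AX quotes as well (e.g. 18 GPUs x 1.17 = 21.06 for HP).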

Specs: MW - CPU

Specs: EC GPU

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 128 GB RAM/node, plus head node (256 GB)
Head Node: 1x 2U rackmount system, 2x Xeon E5-2660 2.20 GHz 20MB cache, 8 cores
16x 16GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 1x PCIe x16 full, 6x PCIe x8 full
2x 2TB RAID 1 7200 RPM, 8x 2TB RAID 6 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
1920W power supply, redundant
Nodes: 4x 2U rackmountable chassis, 4x 2 Xeon E5-2660 2.20 GHz 20MB cache, 8 cores (16/node), Romley series
32x 16GB 240-pin DDR3 1600 MHz (128 GB/node memory, 32 GB/GPU, max 256 GB)
4x 1TB 7200 RPM, 4x 4x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio
2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
4x ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
4x 1800W redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (8 ports) & HCAs (single port) + 7x 7' cables (2 uplink cables)
1x 1U 16-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Rack & Power: 42U, 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
Amber12 (cluster install), LAMMPS (shared filesystem), (no NAMD)
Warranty: 3 year parts and labor (EC technical support?)
GPU Teraflops: 18.72 double, 56.32 single
Quote: arrived; $103,150 incl $800 S&H
  • 16TB Raid6 storage (10 TB usable - tight for /home)
  • full height rack

Specs: EC CPU

General: 13 nodes, 26 CPUs (208 cores), 128 GB RAM/node (total 1,664 GB), plus head node (256 GB)
Head Node: 1x 2U rackmount system, 2x Xeon E5-2660 2.20 GHz 20MB cache, 8 cores
16x 16GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 1x PCIe x16 full, 6x PCIe x8 full
2x 2TB RAID 1 7200 RPM, 8x 2TB RAID 6 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
1920W power supply, redundant
Nodes: 13x 1U rackmountable chassis, 13x 2 Xeon E5-2660 2.20 GHz 20MB cache, 8 cores (16/node), Romley series
104x 16GB 240-pin DDR3 1600 MHz (128 GB/node memory, max ??? GB)
13x 1TB 7200 RPM
2x 10/100/1000 NIC, dedicated IPMI port, 1x PCIe 3.0 x16 slot
13x ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
13x 480W non-redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (18 ports) & HCAs (single port) + 7x 7' cables (2 uplink cables)
1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Rack & Power: 42U, 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
Amber12 (cluster install), LAMMPS (shared filesystem), NAMD
Warranty: 3 year parts and labor (EC technical support?)
Quote: arrived; $105,770 incl $800 S&H
  • 16TB Raid6 storage (10 TB usable - tight for /home)
  • 1TB on nodes is wasted (unless we make fast local /localscratch at 7.2K)

Round 1

ConfCall & Specs: AC

09nov12:

  • /home and /apps mounted on CPU side. How does the GPU access these? Or is the job on the CPU responsible for this?
  • Single versus double precision? Both needed, I assume.
  • Unit above is the NVIDIA “Fermi” series, being phased out; “Kepler” K10 and K20 series coming out. Get an earlybird unit, Jim will find out.
  • Lava compatibility (almost certain, but need to check); AC uses SGE.
  • We do not really “know” if our current jobs would experience a boost in speed (hence one unit first - but there is a software problem here)
  • Intel Xeon Phi coprocessors: Intel compilers will work on this platform (which is huge!) and no programming learning curve. (HP ProLiant servers with 50+ cores), Jim will find out.
  • Vendor states the scheduler sees GPUs directly (but how does it then get access to home dirs? check this out) … update: this is not true, the CPU job offloads to the GPU

AC Specs

General: 2 CPUs (16 cores), 3 GPUs (7,500 CUDA cores), 32 GB RAM/node
Head Node: none
Nodes: 1x 4U rackmountable chassis, 2x Xeon E5-2660 2.20 GHz 20MB cache, 8 cores (16 cores/node), Romley series
8x 4GB 240-pin DDR3 1600 MHz memory (32 GB/node, 11 GB/GPU, max 256 GB)
1x 120GB SATA 2.5" solid state drive (OS drive), 7x 3TB 7200 RPM
3x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio
2x 10/100/1000 NIC, 3x PCIe 3.0 x16 slots
1x ConnectX-3 VPI adapter card, single-port 56 Gb/s
2x 1620W redundant power supplies
Network: 1x 36-port InfiniBand FDR (56 Gb/s) switch & 4x ConnectX-3 single-port FDR (56 Gb/s) IB adapters + 2x 2-meter cables (should be 4)
Power: rack power ready
Software: none
Warranty: 3 year parts and labor (AC technical support)
GPU Teraflops: 3.51 double, 10.56 single
Quote: arrived; $33,067.43, S&H included
  • In order to match the “benchmark option” we need 5 units
    • 8,100 Watts would still fit power-wise but not rack-wise (we'd need 20U)
  • Single rack, 21 TB of disk space (Raid 5/6)
  • The IB switch (plus 4 spare cards/cables) is roughly 1/3rd of the price
    • If we remove it, we need QDR Voltaire compliant HCAs and cables (3 ports free)
  • The config does not pack as many teraflops per dollar; we'll see

ConfCall & Specs: EC

12nov12:

  • GPU hardware only
  • scheduler never sees GPUs, just CPUs
  • CPU to GPU is one-to-one when using Westmere chips
  • Bright Cluster Management (image based) - we can front-end with Lava
  • what's the memory connection CPU/GPU???
  • home dirs - cascade via Voltaire 4036, need to make sure this is compatible!
  • software on local disk? home dirs via InfiniBand IPoIB, yes, but self-install
  • Amber (charge for this) and LAMMPS preinstalled - must be no problem, will be confirmed
  • 2 K20s per 2 CPUs per rack, 900-1000W; 1200W power supply on each node
  • PDU on simcluster, each node has a power connection
  • quote coming for 4-node simcluster
  • testing periods can be staged so you are testing exactly what we're buying, if the simcluster is within budget (see K20 above)

EC Specs

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 64 GB RAM/node, plus head node
Head Node: 1x 1U rackmount system, 2x Xeon E5-2660 2.20 GHz 20MB cache, 8 cores
8x 8GB 240-pin DDR3 1600 MHz ECC (max 256 GB), 2x 10/100/1000 NIC, 2x PCIe x16 full
2x 2TB 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
600W power supply
Nodes: 4x 2U rackmountable chassis, 8x Xeon E5-2660 2.20 GHz 20MB cache, 8 cores (16/node), Romley series
32x 8GB 240-pin DDR3 1600 MHz (64 GB/node memory, 16 GB/GPU, max 256 GB)
4x 1TB 7200 RPM, 16x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio
2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
4x ConnectX-2 VPI adapter card, single-port, QDR 40 Gb/s
4x 1800W redundant power supplies
Network: 1x Mellanox InfiniBand QDR switch (8 ports) & HCAs (single port) + 7' cables
1x 1U 16-port rackmount switch, 10/100/1000, unmanaged (+ 7' cables)
Power: 2x PDU, basic, 1U, 30A, 208V, (10) C13, requires 1x L6-30 power outlet per PDU
Software: CentOS, Bright Cluster Management (1 year support)
Amber12 (cluster install), LAMMPS (shared filesystem), (Barracuda for weirlab?)
Warranty: 3 year parts and labor (EC technical support?)
GPU Teraflops: 18.72 double, 56.32 single
Quote: arrived; $93,600 + S&H
  • Let's make this the “benchmark option”, based on double precision
  • In order to match this with Xeon Phis we'd need 18 of them (probably 5 4U trays)
  • This is the (newest) simcluster design (that can be tested starting Jan 2013)
    • 24U cabinet
  • We could deprecate 50% of bss24 queue freeing two L6-30 connectors
  • Spare parts:
    • Add another HCA card to greentail and connect to Mellanox switch (long cable)
      • also isolates GPU traffic from other clusters
    • 1 8-port switch, 4 HCA cards, 4 long cables (for petal/swallow tails plus spare)
  • New head node
    • First let EC install Bright/Openlava (64 CPU cores implies 64 job slots)
      • 16 GPUs implies 16×2,500 or 40,000 cuda cores (625 per job slot on average)
    • Use as standalone cluster or move GPU queue to greentail
    • If so, turn this head node into a 16 job slot ram heavy compute node?
      • 256-512gb (Order?)
      • add local storage? (up to 10 1or2 TB disks)
  • Compute nodes
    • add local storage? (up to 10 1or2 TB disks)
  • Bright supports openlava and GPU monitoring (get installed)
  • EC software install
    • sander, sander.MPI, pmemd, pmemd.cuda (single GPU version), pmemd.cuda.MPI (the multi-GPU version)
    • NVIDIA Toolkit v4.2. Please note that v5.0 is NOT currently supported
    • MVAPICH2 v1.8 or later / MPICH2 v1.4p1 or later recommended; OpenMPI is NOT recommended
    • make sure they do not clean source, analyze how they compiled
    • which compiler will they use? which MPI (prefer OpenMPI, have wrapper script for that)
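The job-slot arithmetic in the head-node notes above (64 job slots vs 40,000 CUDA cores) works out as follows. The 2,500 cores/GPU figure is the document's rounding; an actual K20 has 2,496 CUDA cores:

```python
# CUDA cores available per scheduler job slot, using the document's
# rounded figure of 2,500 CUDA cores per K20 (actual K20: 2,496).
gpus = 16
cuda_cores = gpus * 2500         # 40,000 CUDA cores total
job_slots = 64                   # one job slot per CPU core

print(cuda_cores)                # 40000
print(cuda_cores // job_slots)   # 625 CUDA cores per job slot on average
```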

ConfCall & Specs: HP

HP 19nov12: meeting notes

  • HP ProLiant SL270s Generation 8 (Gen8); 4U half width with 2 CPUs + 8 (max) GPUs
    • The s6500 Chassis is 4U tray holding two S270s servers
  • max 8 GPUs (20,000 cuda cores) + 2 CPUs (total 16 cores), dual drives, 256gb max
    • K20 availability will be confirmed by Charlie
  • power
    • Charlie will crunch numbers of existing HPC and assess if we can use the current rack
    • otherwise a stand alone half rack solution
  • one IB cable to Voltaire per chassis? get new FDR infiniband switch, period.
    • connect greentail with additional HCA card, or voltaire to voltaire?
  • our software compilation problem, huge
    • but they have great connections with Nvidia for compilation help (how to qualify that?)
  • CMU for GPU monitoring, 3-D rendering of what the GPU is doing
  • This SL270s can also support up to 8 Xeon Phi coprocessors
    • but expect very lengthy delays, Intel is not ready for delivery (1 Phi = 1 double teraflop)

HP Specs

http://h18004.www1.hp.com/products/quickspecs/14405_div/14405_div.HTML

  • First unit, single tray in chassis
  • This hardware can be tested at ExxactCorp, so a single-tray purchase for testing is not a requirement
  • 2 chassis in 8U + 4 SL250s + each with 8 GPUs would be a massive GPU cruncher
    • 8 CPUs, 32 GPUs = 64 cpu cores and 80,000 cuda cores (avg 1,250cuda/core)
    • peak performance: 37.44 double, 112.64 single precision (twice the “benchmark option”)
  • 1 chassis in 4U + 2 SL250s, each with * GPUs, would be the “benchmark option”
General: 6 CPUs (total 48 cores), 18 GPUs (45,000 CUDA cores), 64 GB RAM/node, no head node
Head Node: none
Chassis: 2x s6500 chassis (4U), each can hold 2 half-width SL250s (Gen8, 4U) servers, rackmounted, 4x 1200W power supplies, 1x 4U rack blank
Nodes: 3x SL250s (Gen8), 3x 2x Xeon E5-2650 2.0 GHz 20MB cache, 8 cores (total 16 cores/node), Romley series
3x 16x8GB 240-pin DDR3 1600 MHz (64 GB/node, 10+ GB/GPU, max 256 GB)
3x 2x500GB 7200 RPM, 3x 6x NVIDIA Tesla K20 5 GB GPUs (6 GPUs/node), 1 CPU to 3 GPU ratio
3x 2x10/100/1000 NIC, dedicated IPMI port, 3x 8x PCIe 3.0 x16 slots (GPU), 3x 2x PCIe 3.0 x8
3x 2x IB interconnect, QDR 40 Gb/s, FlexibleLOM goes into a PCIe3 x8 slot
chassis-supplied power; 3x 1x PDU power cord (416151-B21)? - see below
Network: 1x Voltaire QDR 36-port InfiniBand 40 Gb/s switch + 6x 5m QSFP IB cables
No ethernet switch, 17x 7' CAT5 RJ45 cables
Power: rack PDU ready; what is "1x HP 40A HV Core Only Corded PDU"???
Software: RHEL, CMU GPU-enabled (1 year support) - not on quote???
Warranty: 3 year parts and labor (HP technical support?)
GPU Teraflops: 21.06 double, 63.36 single
Quote: arrived; $128,370; for a 1x s6500 + 2x SL250s setup the estimate is $95,170 (S&H and insurance?)
  • To compare with the “benchmark option” price-wise: 37% higher (25% fewer CPU cores)
  • To compare with the “benchmark option” performance-wise: 12.5% higher (double precision peak)
  • When the quote is reduced to 1x s6500 chassis and 2x SL250s:
    • To compare with the “benchmark option” price-wise: 1.6% higher (50% fewer CPU cores)
    • To compare with the “benchmark option” performance-wise: 25% lower (double precision peak)
  • HP on site install
  • we have 9U in HP rack available (1U for new switch)
    • L6-30 7,500 Watts x3 PDUs (non-UPS) = 22,500 Watts - HP cluster 10,600 Watts
    • leaves 11,898 Watts, should be sufficient for 4 SL270s(redundant power supplies)
  • new infiniband switch, isolates GPU cluster traffic from rest of HPC
    • 36 port IB switch overkill
    • still need IB connection greentail to new switch (home dirs IPoIB)
  • 1 TB local storage per node
  • our software install problem, so is the 12.5% worth it? (with 3 trays)
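The rack power headroom above is a simple back-of-envelope calculation, assuming three non-UPS L6-30 PDUs at 7,500 Watts each; the quote's figure of 11,898 Watts is essentially the same result:

```python
# Back-of-envelope check of the HP rack power headroom described above.
# Assumption: three non-UPS L6-30 PDUs at 7,500 W each, and the
# HP cluster drawing the quoted 10,600 W.
pdu_watts = 7500
pdus = 3
hp_cluster_watts = 10600

available = pdus * pdu_watts         # total wall power available
headroom = available - hp_cluster_watts
print(available)   # 22500
print(headroom)    # 11900 (notes say 11,898; same to within rounding)
```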

ConfCall & Specs: AX

  • Cluster management is ROCKS (we'll pass)
  • No scheduler (that's OK, we'll use OpenLava)
  • They do not install software, only the operating system and CUDA driver setup and installation

AX Specs

http://www.amax.com/hpc/productdetail.asp?product_id=simcluster Fremont, CA

General: 8 CPUs (48 cores), 12 GPUs (30,000 CUDA cores), 64 GB RAM/node, plus head node
Head Node: 1x 1U rackmount system, 2x Intel Xeon E5-2620 2.0 GHz (12 cores total)
64GB DDR3 1333 MHz (max 256 GB), 2x 10/100/1000 NIC, 2x PCIe x16 full
2x 1TB (RAID 1) 7200 RPM, InfiniBand adapter card, single-port, QSFP 40 Gb/s
???W power supply, CentOS
Nodes: 4x 1U, 4x 2x Intel Xeon E5-2650 2.0 GHz with 6 cores (12 cores/node), Romley series
4x 96GB 240-pin DDR3 1600 MHz (96 GB/node memory, 8 GB/GPU, max 256 GB)
4x 1TB 7200 RPM, 12x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio
2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
4x InfiniBand adapter card, single-port, QSFP 40 Gb/s
4x ??00W redundant power supplies
Network: 1x InfiniBand switch (18 ports) & HCAs (single port) + ?' cables
1x 1U 24-port rackmount switch, 10/100/1000, unmanaged (+ ?' cables)
Power: there are 3 rack PDUs? What are the connectors, L6-30?
Software: CUDA only
Warranty: 3 year parts and labor (AX technical support?)
GPU Teraflops: 14.04 double, 42.96 single
Quote: arrived; $73,965 (S&H $800 included)
  • 22U cabinet
  • Insurance during shipping is our problem (non-returnable)
  • To compare with the “benchmark option” price-wise: 21% lower (25% fewer CPU cores)
  • To compare with the “benchmark option” performance-wise: 22% lower (double precision peak)
  • If we go turnkey systems having software installed is huge
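The price and core-count comparisons above can be reproduced from the two quotes (taking the EC GPU quote at $93,600 as the "benchmark option" and the AX quote at $73,965):

```python
# Reproduce the "21% lower price, 25% fewer CPU cores" comparison above.
benchmark_price = 93600   # EC "benchmark option" quote (before S&H)
ax_price = 73965          # AX quote (S&H included)
benchmark_cores = 64
ax_cores = 48

price_delta = (benchmark_price - ax_price) / benchmark_price
core_delta = (benchmark_cores - ax_cores) / benchmark_cores
print(round(price_delta * 100))  # 21 (percent lower price)
print(round(core_delta * 100))   # 25 (percent fewer CPU cores)
```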

ConfCall & Specs: MW

  • sells both individual racks and turn-key systems
    • racks are 4U with 2 CPUs and 8 GPUs, 2200 Watts, K20X GPUs
    • turn-key units are per customer specifications
  • they will install all software components (if license keys are provided)
    • includes CUDA drivers and setup, Amber (pmemd.cuda & pmemd.cuda.MPI, check) and Lammps
    • but also Matlab and Mathematica if needed (wow!)
  • standard 2 year warranty though (no biggie)

MW Specs

http://www.microway.com/tesla/clusters.html Plymouth, MA :!:

General: 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 32 GB RAM/node, plus head node
Head Node: 1x 2U rackmount system, 2x Xeon E5-2650 2.0 GHz 20MB cache, 8 cores
8x 4GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 NIC, 3x PCIe x16 full, 3x PCIe x8
2x 1TB 7200 RPM (RAID 1) + 6x 2TB (RAID 6), Areca RAID controller
Low-profile graphics card, ConnectX-3 VPI adapter card, single-port, FDR 56 Gb/s
740W power supply, 1+1 redundant
Nodes: 4x 1U rackmountable chassis, 4x 2 Xeon E5-2650 2.0 GHz 20MB cache, 8 cores (16/node), Sandy Bridge series
4x 8x4GB 240-pin DDR3 1600 MHz (32 GB/node memory, 8 GB/GPU, max 256 GB)
4x 1x120GB SSD, 4x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio
2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots
4x ConnectX-3 VPI adapter card, single-port, FDR 56 Gb/s
4x 1800W (non) redundant power supplies
Network: 1x Mellanox InfiniBand FDR switch (36 ports) & HCAs (single port) + 3m FDR cable to existing Voltaire switch
1x 1U 48-port rackmount switch, 10/100/1000, unmanaged (cables)
Rack:
Power: 2x PDU, basic rack, 30A, 208V, requires 1x L6-30 power outlet per PDU (NEMA L6-30P)
Software: CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA 5
scheduler and GNU compilers installed and configured
Amber12, LAMMPS, Barracuda (for weirlab?), and others if desired … bought through MW
Warranty: 3 year parts and labor (lifetime technical support)
GPU Teraflops: 18.72 double, 56.32 single
Quote: arrived; estimated at $95,800, includes S&H and insurance
Upgrades: cluster pre-installation service
5x 2 E5-2660 2.20 GHz 8-core CPUs
5x upgrade to 64 GB per node
  • At full load 5,900 Watts and 20,131 BTUs/hour
  • 2% more expensive than the “benchmark option” (as described above with upgrades), otherwise identical
    • But a new rack (advantageous for data center)
    • With lifetime technical support
    • solid state drives on compute nodes
    • 12 TB local storage (8TB usable)

Then

  • 36 port FDR switch replace with 8 port QDR switch for savings (40 vs 56 Gbps)
    • and all server adapter cards to QDR (with one hook up to existing Voltaire switch)
  • Expand memory footprint
    • Go to 124 GB memory/node to beef up the CPU HPC side of things
    • 16 CPU cores/node minus 4 CPU cores/node driving GPUs = 12 CPU cores sharing 104 GB, which is roughly 8.7 GB/CPU core
  • Online testing available (K20, do this)
    • then decide on PGI compiler at purchase time
    • maybe all Lapack libraries too
  • Make the head node a compute node (in/for the future and beef it up too, 256 GB ram?)
  • Leave the 6x2TB disk space (for backup)
    • 2U, 8 drives up to 6×4=24 TB, possible?
  • Add an entry level Infiniband/Lustre solution
    • for parallel file locking
  • Spare parts
    • 8 port switch, HCAs and cables, drives …
    • or get 5 years total warranty
  • Testing notes
    • Amber, LAMMPS, NAMD
    • cuda v4&5
    • install/config dirs
    • use gnu … with openmpi
    • make deviceQuery
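The memory-per-core estimate in the "Expand memory footprint" item above works out as follows, assuming (as the notes do) that 4 of the 16 cores per node are dedicated to driving the 4 GPUs and that about 104 GB remains for the rest:

```python
# Reproduce the GB-per-CPU-core estimate from the notes above.
# Assumptions (from the notes): 16 cores/node, 4 cores reserved for
# feeding the 4 GPUs, ~104 GB of the proposed memory left for CPU jobs.
cores_per_node = 16
gpu_driver_cores = 4
cpu_cores = cores_per_node - gpu_driver_cores   # cores left for CPU jobs
memory_gb = 104

print(cpu_cores)                         # 12
print(round(memory_gb / cpu_cores, 1))   # 8.7 GB per CPU core
```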



cluster/110.1361283999.txt.gz · Last modified: 2013/02/19 14:26 by hmeij