cluster:107 - last revised 2013/09/11 13:18 by hmeij [Nvidia Tesla K20]
  * Amber preinstall (there is a charge; it's very complicated)
  * 3 year warranty
  * picture [[http://exxactcorp.com/index.php/solution/solu_detail/84]]
  * quote coming (see below)
  * Can I plug the power cords into the HP rack PDU (PDU to PDU)? Answer: no, unless you just buy the rack servers.
GPU cluster management (includes Platform LSF, so identical to Lava, which we use)\\
[[https://

Webinars\\
[[http://
==== GPU Programming ====
...
</code>

More examples using the PGI compiler and OpenACC from Microway.
==== Phi Programming ====
  * Has examples of how to expose GPUs across nodes
==== ConfCall & Quote: AC ====

09nov12:

  * /home and /apps mounted on the CPU side. How does the GPU access these? Or is the job on the CPU responsible for this?
  * Single versus double precision? Both needed, I assume.
  * Unit above is Nvidia "
  * Lava compatibility (almost certain, but need to check); AC uses SGE.
  * We do not really "
  * Intel Xeon Phi co-processors:
  * <

**AC Quote**

  * Early 2013 product line-up\\
  * [[http://
  * Quote coming for a single 4U unit, which could be a one-off test unit (compare to HP)
^ Topic ^ Description ^
| General | 2 CPUs (16 cores), 3 GPUs (7,500 CUDA cores), 32 GB RAM/node |
| Head Node | None |
| Nodes | 1x 4U rackmountable chassis, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8 cores (16 cores/node) |
| | 8x 4GB 240-pin DDR3 1600 MHz memory (32 GB/node) |
| | 1x 120GB SATA 2.5" solid state drive (OS drive), 7x 3TB 7200 RPM |
| | 3x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio |
| | 2x 10/100/1000 Ethernet ports |
| | 1x ConnectX-3 VPI adapter card, single-port 56 Gb/s |
| | 2x 1620W redundant power supplies |
| Network | 1x 36-port InfiniBand FDR (56 Gb/s) switch & 4x ConnectX-3 single-port FDR (56 Gb/s) IB adapters |
| Power | Rack power ready |
| Software | None |
| Warranty | 3 Year Parts and Labor (AC technical support) |
| GPU Teraflops | 3.51 double, 10.56 single |
| Quote | < |

  * In order to match the "
  * 8,100 Watts would still fit power-wise but not rack-wise (we'd need 20U)

  * Single rack, 21 TB of disk space (RAID 5/6)
  * The IB switch (plus 4 spare cards/
  * If we remove it, we need QDR Voltaire-compliant HCAs and cables (3 ports free)
  * The config does not pack as many teraflops for the dollars; we'll see

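The teraflops row in the quote is just the per-card peak multiplied out. A quick sanity check, assuming the commonly cited K20 peaks of 1.17 TF double / 3.52 TF single per card (not stated in the quote itself, but implied by its totals):

```python
# Sanity-check the AC quote's "GPU Teraflops" row.
# Assumed per-K20 peaks: 1.17 TF double, 3.52 TF single.
K20_DOUBLE_TF = 1.17
K20_SINGLE_TF = 3.52

gpus = 3  # 3x NVIDIA Tesla K20 in the single 4U node
print(round(gpus * K20_DOUBLE_TF, 2))  # 3.51, matches the quote
print(round(gpus * K20_SINGLE_TF, 2))  # 10.56, matches the quote
```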
==== ConfCall & Quote: EC ====

12nov12:

  * GPU hardware only
  * scheduler never sees GPUs, just CPUs
  * CPU to GPU is one-to-one when using Westmere chips
  * Bright cluster management (image based) - we can front-end with Lava
  * what's the memory connection cpu/
  * home dirs - cascade via Voltaire 4036; need to make sure this is compatible!
  * software on local disk? home dirs via InfiniBand IPoIB, yes, but self-install
  * Amber (charge for this) and LAMMPS preinstalled - must be no problem, will be confirmed
  * 2 K20s per 2 CPUs per rack, 900-1000W; 1200W power supply on each node
  * PDU on simcluster; each node has a power connection
  * quote coming for a 4-node simcluster
  * testing periods can be staged so you are testing exactly what we're buying, if the simcluster is within budget (see K20 above)

**EC Quote**

  * [[http://
^ Topic ^ Description ^
| General | 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 64 GB RAM/node, plus head node |
| Head Node | 1x 1U rackmount system, 2x Xeon E5-2660 2.20 GHz 20 MB cache 8 cores |
| | 8x 8GB 240-pin DDR3 1600 MHz ECC (max 256 GB), 2x 10/100/1000 Ethernet ports |
| | 2x 2TB 7200 RPM (can hold 10), ConnectX-2 VPI adapter card, single-port |
| | 600W power supply |
| Nodes | 4x 2U rackmountable chassis, 8x Xeon E5-2660 2.20 GHz 20 MB cache 8 cores (16/node), Romley series |
| | 32x 8GB 240-pin DDR3 1600 MHz (64 GB/node memory, 16 GB/GPU, max 256 GB) |
| | 4x 1TB 7200 RPM, 16x NVIDIA Tesla K20 8 GB GPUs (4/node), 1 CPU to 2 GPU ratio |
| | 2x 10/100/1000 Ethernet ports |
| | 4x ConnectX-2 VPI adapter card, single-port |
| | 4x 1800W redundant power supplies |
| Network | 1x Mellanox InfiniBand QDR switch (8 ports) & HCAs (single-port) + 7' cables |
| | 1x 1U 16-port rackmount switch, 10/100/1000 |
| Power | 2x PDU |
| Software | CentOS, Bright Cluster Management (1 year support) |
| | Amber12 (cluster install), LAMMPS (shared filesystem) |
| Warranty | 3 Year Parts and Labor (EC technical support?) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | < |

  * Let's make this the "benchmark option"
  * In order to match this with Xeon Phis we'd need 18 of them (probably 5 4U trays)

  * This is the (newest) simcluster design (that can be tested starting Jan 2013)
  * 24U cabinet
  * We could deprecate 50% of the bss24 queue, freeing two L6-30 connectors
  * Spare parts:
  * Add another HCA card to greentail and connect to the Mellanox switch (long cable)
    * also isolates GPU traffic from other clusters
    * 1 8-port switch, 4 HCA cards, 4 long cables (for petal/
  * New head node
    * First let EC install Bright/
    * 16 GPUs implies 16x 2,500 or 40,000 CUDA cores (625 per job slot on average)
    * Use as a standalone cluster, or move the GPU queue to greentail
    * If so, turn this head node into a 16-job-slot RAM-heavy compute node?
      * 256-512 GB (Order?)
      * add local storage? (up to 10 1 or 2 TB disks)
  * Compute nodes
    * add local storage? (up to 10 1 or 2 TB disks)
  * Bright supports OpenLava and GPU monitoring (get installed)
    * [[http://
    * [[http://
  * EC software install
    * sander, sander.MPI, pmemd, pmemd.cuda (single-GPU version), pmemd.cuda.MPI (the multi-GPU version)
    * NVIDIA Toolkit v4.2. Please note that v5.0 is NOT currently supported
    * MVAPICH2 v1.8 or later / MPICH2 v1.4p1 or later recommended
    * make sure they do not clean the source; analyze how they compiled
    * which compiler will they use? which MPI <
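The "benchmark option" figures above can be checked with the same per-card assumptions as before (roughly 2,500 CUDA cores per K20 - the card actually has 2,496 - and peaks of 1.17 TF double / 3.52 TF single):

```python
# Sanity-check the EC "benchmark option" arithmetic from the notes.
gpus = 16       # 4 nodes x 4 K20s
job_slots = 64  # 8 CPUs x 8 cores
print(gpus * 2500)               # 40000 CUDA cores total
print(gpus * 2500 // job_slots)  # 625 CUDA cores per job slot on average
print(round(gpus * 1.17, 2))     # 18.72 TF double, matches the quote
print(round(gpus * 3.52, 2))     # 56.32 TF single, matches the quote
```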
==== ConfCall & Quote: HP ====

HP 19nov12: meeting notes

  * HP ProLiant SL270s Generation 8 (Gen8); 4U half-width with 2 CPUs + 8 (max) GPUs
  * The s6500 chassis is a 4U tray holding two SL270s servers
  * max 8 GPUs (20,000 CUDA cores) + 2 CPUs (16 cores total), dual drives, 256 GB max
  * K20 availability will be confirmed by Charlie
  * power
  * Charlie will crunch the numbers on the existing HPC and assess if we can use the current rack
    * otherwise a standalone half-rack solution
  * <
  * connect greentail with an additional HCA card, or Voltaire to Voltaire?
  * our software compilation problem, huge
    * but they have great connections with Nvidia for compilation help (how to qualify that?)
  * CMU for GPU monitoring, 3D rendering of what the GPU is doing
  * This SL270s can also support up to 8 Xeon Phi coprocessors
    * but expect very lengthy delays; Intel is not ready for delivery (1 Phi = 1 double-precision teraflop)

**HP Quote**

[[http://

  * <
  * This hardware can be tested at Exxact Corp, so a single-tray purchase for testing is not a requirement

  * 2 chassis in 8U + 4x SL270s, each with 8 GPUs, would be a massive GPU cruncher
    * 8 CPUs, 32 GPUs = 64 CPU cores and 80,000 CUDA cores (avg 1,250 per job slot)
    * peak performance:
  * 1 chassis in 4U + 2x SL270s, each with 6 GPUs, would be the "
^ Topic ^ Description ^
| General | 6 CPUs (48 cores total), 18 GPUs (45,000 CUDA cores), 64 GB RAM/node, no head node |
| Head Node | None |
| Chassis | 2x s6500 chassis (4U), each can hold 2 half-width SL270s (Gen8) |
| Nodes | 3x SL270s (Gen8), |
| | 3x 16x 8GB 240-pin DDR3 1600 MHz (64 GB/node, 10+ GB/GPU, max 256 GB) |
| | 3x 2x 500GB 7200 RPM, 3x 6x NVIDIA Tesla K20 5 GB GPUs (6 GPUs/node), 1 CPU to 3 GPU ratio |
| | 3x 2x 10/100/1000 Ethernet ports |
| | 3x 2x IB interconnect, |
| | chassis-supplied power; 3x 1x one PDU power cord (416151-B21)? |
| Network | 1x Voltaire QDR 36-port InfiniBand 40 Gb/s switch + 6x 5M QSFP IB cables |
| | No Ethernet switch; 17x 7' CAT5 RJ45 cables |
| Power | rack PDU ready; what is "1x HP 40A HV Core Only Corded PDU"??? |
| Software | RHEL, CMU GPU enabled (1 year support) - not on quote??? |
| Warranty | 3 Year Parts and Labor (HP technical support?) |
| GPU Teraflops | 21.06 double, 63.36 single |
| Quote | < |

  * To compare with the "benchmark option" price-wise: 37% higher (25% fewer CPU cores)
  * To compare with the "benchmark option" performance-wise:

  * When the quote is reduced to 1x s6500 chassis and 2x SL270s:
    * To compare with the "benchmark option" price-wise: 1.6% higher (50% fewer CPU cores)
    * To compare with the "benchmark option" performance-wise:

  * HP on-site install
  * we have 9U in the HP rack available (1U for a new switch)
  * L6-30: 7,500 Watts x 3 PDUs (non-UPS) = 22,500 Watts; HP cluster 10,600 Watts
    * leaves 11,898 Watts, which should be sufficient for 4 SL270s (redundant power supplies)
  * new InfiniBand switch isolates GPU cluster traffic from the rest of the HPC
    * 36-port IB switch is overkill
    * still need an IB connection from greentail to the new switch (home dirs IPoIB)
  * 1 TB local storage per node
  * our software install problem, so is the 12.5% worth it? (with 3 trays)

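The rack power budget above is simple arithmetic; with the rounded numbers the headroom comes out to 11,900 W, so the 11,898 W in the notes implies the actual quoted cluster draw is about 10,602 W rather than the rounded 10,600 W:

```python
# HP rack power budget from the meeting notes (rounded figures).
available = 3 * 7500   # three non-UPS L6-30 PDUs at 7,500 W each
hp_cluster = 10600     # quoted HP cluster draw in Watts (rounded)
print(available)               # 22500 W total
print(available - hp_cluster)  # 11900 W headroom for additional trays
```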
==== ConfCall & Quote: AX ====

  * Cluster management is ROCKS (we'll pass)
  * No scheduler (that'
  * They do **not** install software, only the operating system and
  * CUDA driver setup and installation

**AX Quote**

[[http://

^ Topic ^ Description ^
| General | 8 CPUs (48 cores), 12 GPUs (30,000 CUDA cores), 64 GB RAM/node, plus head node |
| Head Node | 1x 1U rackmount system, 2x Intel Xeon E5-2620 2.0 GHz (12 cores total) |
| | 64GB DDR3 1333 MHz (max 256 GB), 2x 10/100/1000 Ethernet ports |
| | 2x 1TB (RAID 1) 7200 RPM, InfiniBand |
| | ???W power supply, CentOS |
| Nodes | 4x 1U, 4x 2x Intel Xeon E5-2650 2.0 GHz with 6 cores (12 cores/node) |
| | 4x 96GB 240-pin DDR3 1600 MHz (96 GB/node memory, 8 GB/GPU, max 256 GB) |
| | 4x 1TB 7200 RPM, 12x NVIDIA Tesla K20 8 GB GPUs (3/node), 1 CPU to 1.5 GPU ratio |
| | 2x 10/100/1000 Ethernet ports |
| | 4x InfiniBand adapter card, single-port |
| | 4x ??00W redundant power supplies |
| Network | 1x InfiniBand switch (18 ports) & HCAs (single-port) + ?' cables |
| | 1x 1U 24-port rackmount switch, 10/100/1000 |
| Power | there are 3 rack PDUs? What are the connectors, L6-30? |
| Software | CUDA only |
| Warranty | 3 Year Parts and Labor (AX technical support?) |
| GPU Teraflops | 14.04 double, 42.24 single |
| Quote | < |

  * 22U cabinet
  * Insurance during shipping is our problem (non-returnable)
  * To compare with the "
  * To compare with the "
  * If we go turnkey, having the software installed is huge

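The AX numbers follow the same per-card rates as the other quotes; the single-precision figure (cut off in the quote table) works out to 42.24 TF under the same assumptions (1.17 TF double / 3.52 TF single, ~2,500 CUDA cores per K20):

```python
# Sanity-check the AX quote: 12 K20s across 4 nodes.
gpus = 12
print(gpus * 2500)            # 30000 CUDA cores, matches the quote
print(round(gpus * 1.17, 2))  # 14.04 TF double, matches the quote
print(round(gpus * 3.52, 2))  # 42.24 TF single (derived)
```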
==== ConfCall & Quote: MW ====

  * sells both individual racks and turn-key systems
  * racks are 4U with 2 CPUs and 8 GPUs, 2,200 Watts, K20X GPUs
  * turn-key units are per customer specifications
  * they will install **all** software components (if license keys are provided)
    * includes CUDA drivers and setup, Amber (pmemd.cuda & pmemd.cuda.MPI,
    * but also Matlab and Mathematica if needed (wow!)
  * standard 2-year warranty though (no biggie)

**MW Quote**

[[http://

^ Topic ^ Description ^
| General | 8 CPUs (64 cores), 16 GPUs (40,000 CUDA cores), 32 GB RAM/node, plus head node |
| Head Node | 1x 2U rackmount system, 2x Xeon E5-2650 2.0 GHz 20 MB cache 8 cores |
| | 8x 4GB 240-pin DDR3 1600 MHz ECC (max 512 GB), 2x 10/100/1000 Ethernet ports |
| | 2x 1TB 7200 RPM (RAID 1) + 6x 2TB (RAID 6), Areca RAID controller |
| | low-profile graphics card, ConnectX-3 VPI adapter card, single-port |
| | 740W power supply, 1+1 redundant |
| Nodes | 4x 1U rackmountable chassis, 4x 2x Xeon E5-2650 2.0 GHz 20 MB cache 8 cores (16/node), Sandy Bridge series |
| | 4x 8x 4GB 240-pin DDR3 1600 MHz (32 GB/node memory, 8 GB/GPU, max 256 GB) |
| | 4x 1x 120GB SSD, 4x 4x NVIDIA Tesla K20 5 GB GPUs (4/node), 1 CPU to 2 GPU ratio |
| | 2x 10/100/1000 Ethernet ports |
| | 4x ConnectX-3 VPI adapter card, single-port |
| | 4x 1800W (non-)redundant power supplies |
| Network | 1x Mellanox InfiniBand FDR switch (36 ports) & HCAs (single-port) + 3m FDR cable to existing Voltaire switch |
| | 1x 1U 48-port rackmount switch, 10/100/1000 |
| Rack | 1x 42U rack with power distribution |
| Power | 2x PDU |
| Software | CentOS, Bright Cluster Management (1 year support), MVAPICH, OpenMPI, CUDA 5 |
| | scheduler and GNU compilers installed and configured |
| | Amber12, LAMMPS, Barracuda (for weirlab?), and others if desired ... bought through MW |
| Warranty | 3 Year Parts and Labor (lifetime technical support) |
| GPU Teraflops | 18.72 double, 56.32 single |
| Quote | < |
| Upgrades | 5x 2x E5-2660 2.20 GHz 8-core CPUs |
| | 5x upgrade to 64 GB per node |

  * At full load: 5,900 Watts and 20,131 BTU/hour

  * 2% more expensive than the "
  * But a new rack (advantageous for the data center)
  * With lifetime technical support
  * Solid state drives on the compute nodes
  * 12 TB local storage

Then

**[[cluster:

  * Expand the memory footprint
    * Go to 124 GB memory/node to beef up the CPU HPC side of things
    * 16 CPU cores/node minus 4 CPU-for-GPU cores/node = 12 CPU cores using 104 GB, which is about 8 GB/CPU core
  * Online testing available (K20; do this)
    * then decide on the PGI compiler at purchase time
    * maybe all the LAPACK libraries too
  * Make the head node a compute node (in/for the future, and beef it up too, 256 GB RAM?)
    * Leave the 6x 2TB disk space (for backup)
    * 2U, 8 drives, up to 6x4 = 24 TB, possible?
  * Spare parts
    * 8-port switch, HCAs and cables, drives ...
    * or get 5 years total warranty
  * Amber, LAMMPS, NAMD
    * CUDA v4 & v5
    * install/
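The memory-per-core estimate above can be checked directly; the split is an assumption from the notes (of the 16 cores per node, 4 are reserved to feed the 4 GPUs, and the remaining 12 CPU-only cores share the ~104 GB not earmarked for GPU work):

```python
# Memory-per-core estimate behind the MW upgrade notes.
cpu_cores = 16 - 4  # 12 cores left for plain CPU jobs
mem_gb = 104        # memory budgeted for those cores, per the notes
print(round(mem_gb / cpu_cores, 1))  # 8.7, i.e. "about 8 GB/cpu core"
```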
\\
**[[cluster: