==== HPC Users Meeting ====
  
  * Brief history
    * 2006 swallowtail (Dell PE1955, Infiniband, imw, emw)
    * 2010 greentail (HP gen6 blade servers, hp12)
    * 2013 sharptail (Microway storage, K20s, Infiniband, mw256/mwgpu)
    * 2014 mw256fd (Dell 2006 replacement with Supermicro nodes)
    * 2015 tinymem (Supermicro bare metal, expansion for serial jobs)
    * 2017 mw128 (first new faculty startup funds)
    * 2018 6/25 Today's meeting
  
    * Compute capacity over 60 teraflops (DPFP; 38 cpu side, 25 gpu side)
    * Total memory footprint is near 7.5 TB
    * About 500 accounts have been created (incl 22 collaborator and 100 class accounts)
      
  * Funding / charge scheme: is it working for you?
    
  * 2017 Benchmarks of some new hardware
    * Donation led to purchase of a commercial grade GPU server containing four GTX1080ti GPUs
    * Amber 16. Nucleosome bench runs 4.5x faster than on a K20
    * Gromacs 5.1.4. Colin's multidir bench runs about 2x faster than on a K20
    * Lammps 11Aug17. Colloid example runs about 11x faster than on a K20
    * FSL 5.0.10. BFT bedpostx tests run 16x faster on the new CPUs, and a whopping 118x faster on GPU vs CPU
    * Price of a 128gb node in 2017 was $8,250; price of a 256gb node in 2018 is $10,500
  
  * 2016 IBM bought Platform Inc (developers of LSF; Openlava is an LSF 4.2 open source branch)
    * IBM promptly accused Openlava of copyright infringement in v3/v4 (US DMCA law, no proof needed)
    * Fall back option: revert to v2.2 (definitely free of infringement, minor disruption)
    * Move forward option: adopt SLURM (LLNL developers, major disruption; see the job script sketch below)

  * If we adopt SLURM, should we transition to the OpenHPC Warewulf/SLURM recipe?
    * http://openhpc.community/
    * new login node and a couple of compute nodes to start?
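
To gauge the "disruption" for users, here is a minimal sketch of how a SLURM job script would map onto the Openlava/LSF directives in use today (the partition/queue name, file names, and program are hypothetical placeholders, not our actual configuration):

<code bash>
#!/bin/bash
# Minimal SLURM job script; each comment gives the Openlava/LSF equivalent.
#SBATCH --job-name=myjob         # LSF: #BSUB -J myjob
#SBATCH --partition=mw256        # LSF: #BSUB -q mw256
#SBATCH --ntasks=8               # LSF: #BSUB -n 8
#SBATCH --output=myjob_%j.out    # LSF: #BSUB -o myjob.%J.out

# Submit with: sbatch myjob.sh   (LSF: bsub < myjob.sh)
srun ./run_simulation
</code>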
  
  * New HPC Advisory Group Member
  * Tidbits
    * Bought deep U42 rack with AC cooling onboard and two PDUs
    * Pushed Angstrom rack (bss24) out of our area, ready to recycle it (done 06/20/2018)
    * Currently we have two U42 racks empty with power
    * Cooling needs to be provided with any new major purchases (provost, ITS, HPC?)
      * cottontail (03/2019)
      * ringtail & n78 (10/2020)
      * mw128_nodes & sharptaildr (06/2020)
    * All Infiniband ports are in use
  
===== Notes =====

  * First, make a page comparing CPU vs GPU usage, which may influence future purchases: [[cluster:167|CPU vs GPU]]
  * $100k quote, 3 to 5 vendors, data points mid-2018
  * One node (or all) should have the latest versions of amber, gromacs, lammps, and namd configured on it
  * Nvidia latest version; optimal cpu:gpu ratio configs (see the submit sketch after this list)
    * Amber 1:1 (may be 1:2 in future releases) - amber certified GPU!
    * Gromacs 10:1 (could ramp up to claiming all resources per node)
    * Namd 13:1 (could ramp up to claiming all resources per node)
    * Lammps 2-4:1
  * 128g nodes with enough CPU slots to take over ''hp12'': dual ten-core minimum
  * Anticipated target (also to manage heat exchange); see the config sketch below
    * 2x10-core Xeon CPUs (~100gb left) with 2x gtx1080ti GPUs (25gb memory required)
    * as many as fit the budget, but no more than 15 rack-wise
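
As an illustration of the ratios above, a minimal SLURM submit sketch for the Gromacs ~10:1 case (partition name, GRES type, and input file are hypothetical assumptions, and this presumes we adopt SLURM at all):

<code bash>
#!/bin/bash
# Hypothetical request: one GTX1080ti GPU fed by ten CPU cores (~10:1)
#SBATCH --partition=mwgpu        # hypothetical partition name
#SBATCH --gres=gpu:gtx1080ti:1   # one GPU of type gtx1080ti
#SBATCH --ntasks=10              # ten CPU cores to drive that GPU

# Gromacs invocation is a sketch; adjust to the installed module/version
srun gmx_mpi mdrun -s bench.tpr
</code>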
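
And a hedged sketch of how the anticipated 2x10-core, 128g, dual GTX1080ti nodes might be described to SLURM (node and partition names are placeholders, memory figure approximate):

<code>
# slurm.conf sketch: anticipated 2x10-core Xeon, 128g, 2x GTX1080ti nodes
NodeName=n[90-91] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=128000 Gres=gpu:gtx1080ti:2 State=UNKNOWN
PartitionName=test128 Nodes=n[90-91] MaxTime=INFINITE State=UP

# gres.conf sketch: expose the two GPUs on each node
NodeName=n[90-91] Name=gpu Type=gtx1080ti File=/dev/nvidia[0-1]
</code>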
  
\\ \\
**[[cluster:0|Back]]**