==== HPC Users Meeting ====

  * Brief history
    * 2006 swallowtail (Dell PE1955, Infiniband, imw, emw)
    * 2010 greentail (HP gen6 blade servers, hp12)
    
  * 2017 Benchmarks of some new hardware
    * Donation led to purchase of commercial grade GPU server containing four GTX1080ti GPUs
    * Amber 16. Nucleosome bench runs 4.5x faster than on a K20
    * Gromacs 5.1.4. Colin's multidir bench runs about 2x faster than on a K20
    * Lammps 11Aug17. Colloid example runs about 11x faster than on a K20
    * FSL 5.0.10. BFT bedpostx tests run 16x faster on CPU, a whopping 118x faster on GPU vs CPU.
    * Price of 128gb node in 2017 was $8,250...price of 256gb node in 2018 is $10,500
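
Not from the meeting itself, but as a rough illustration of how a GPU-vs-K20 comparison like the Lammps colloid case above is typically run: a minimal sketch, assuming a Lammps binary built with the GPU package; the binary name, rank count, and input path are assumptions, not our exact benchmark setup.

<code bash>
# Hypothetical invocation of the Lammps colloid example (binary name and paths assumed)
# CPU-only baseline across 20 MPI ranks:
mpirun -np 20 lmp_mpi -in examples/colloid/in.colloid

# Same input offloaded to one GPU via the Lammps GPU package suffix:
mpirun -np 20 lmp_mpi -sf gpu -pk gpu 1 -in examples/colloid/in.colloid
</code>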
  
  * 2016 IBM bought Platform Inc (developers of LSF; Openlava is an LSF 4.2 open source branch)
    * IBM promptly accused Openlava of copyright infringement in v3/v4 (US DMCA law, no proof needed).
    * Fall back option: revert to v2.2 (definitely free of infringement, minor disruption)
    * Move forward option: adopt SLURM (LLNL developers, major disruption)

  * If we adopt SLURM, should we transition to the OpenHPC Warewulf/SLURM recipe?
    * http://openhpc.community/
    * new login node and a couple of compute nodes to start?
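
To give the recipe question above some shape, here is a minimal ''slurm.conf'' sketch for a trial setup with one new login/controller node and two compute nodes; all hostnames, core counts, and memory figures are assumptions, not a tested OpenHPC configuration.

<code>
# Hypothetical minimal slurm.conf for a trial login node plus two compute nodes
ClusterName=ohpc-test
ControlMachine=login1                 # assumed new login node, also runs slurmctld
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SelectType=select/cons_res            # schedule by core/memory rather than whole nodes
SelectTypeParameters=CR_Core_Memory

# Assumed compute nodes (core and memory figures are placeholders)
NodeName=test[01-02] CPUs=20 RealMemory=128000 State=UNKNOWN
PartitionName=test Nodes=test[01-02] Default=YES MaxTime=INFINITE State=UP
</code>

The OpenHPC recipe would layer Warewulf provisioning of the compute images on top of this; for users, the visible change from Openlava would be roughly ''bsub''/''bjobs'' becoming ''sbatch''/''squeue''.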
  
  * New HPC Advisory Group Member
  * Tidbits
    * Bought deep U42 rack with AC cooling onboard and two PDUs
    * Pushed Angstrom rack (bss24) out of our area, ready to recycle that (Done. 06/20/2018)
    * Currently we have two U42 racks empty with power
    * Cooling needs to be provided with any new major purchases (provost, ITS, HPC?)
      * cottontail (03/2019),
      * ringtail & n78 (10/2020)
      * mw128_nodes & sharptaildr (06/2020)
    * All Infiniband ports are in use
  
===== Notes =====

  * First make a page comparing CPU vs GPU usage, which may influence future purchases: [[cluster:167|CPU vs GPU]]
  * $100k quote, 3 to 5 vendors, data points mid-2018
  * One node (or all) should have configured on it: amber, gromacs, lammps, namd, latest versions
  * Nvidia latest version, optimal cpu:gpu ratio configs (see the job sketch after this list)
    * Amber 1:1 (may be 1:2 in future releases) - amber certified GPU!
    * Gromacs 10:1 (could ramp up to claiming all resources per node)
    * Namd 13:1 (could ramp up to claiming all resources per node)
    * Lammps 2-4:1
  * 128gb with enough CPU slots to take over ''hp12'': dual ten-core minimum
  * Anticipated target (also to manage heat exchange)
    * 2x10-core Xeon CPUs (~100gb memory left over) with 2x GTX1080ti GPUs (25gb memory required)
    * as many as the budget fits, but no more than 15 rack-wise
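
As a rough illustration (not part of the notes), the cpu:gpu ratios above might translate into SLURM job requests like the one below on the anticipated 2x10-core, 2-GPU node; the partition name, gres setup, and module name are assumptions.

<code bash>
#!/bin/bash
# Hypothetical Gromacs job at roughly the 10:1 cpu:gpu ratio noted above
# (partition, gres, and module names are assumptions)
#SBATCH --job-name=gmx-bench
#SBATCH --partition=mwgpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:1

module load gromacs
gmx mdrun -ntomp $SLURM_CPUS_PER_TASK -deffnm bench
</code>

Two such jobs per node would pair the 20 cores with both GTX1080ti cards while staying within the ~100gb memory headroom noted above.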
  
\\
**[[cluster:0|Back]]**