==== HPC Users Meeting ====
  
  * Brief history
    * 2006 swallowtail (Dell PE1955, Infiniband, imw, emw)
    * 2010 greentail (HP gen6 blade servers, hp12)
    * 2013 sharptail (Microway storage, K20s, Infiniband, mw256/mwgpu)
    * 2014 mw256fd (replacement for the 2006 Dell hardware, with Supermicro nodes)
    * 2015 tinymem (Supermicro bare metal, expansion for serial jobs)
    * 2017 mw128 (first new faculty startup funds)
    * 2018 06/25 today's meeting

  * Since 2006
    * Grown from 256 to roughly 1,200 physical CPU cores
    * Processed 3,165,752 jobs (as of 18 Jun 2018)
    * Compute capacity over 60 teraflops (DPFP; 38 on the CPU side, 25 on the GPU side)
    * Total memory footprint is near 7.5 TB
    * About 500 accounts have been created (including 22 collaborator and 100 class accounts)

  * Funding / charge scheme: is it working for you?
    * Over the last two years, the $15K target was realized each year.

  * Status of our cluster development fund
    * $140K come July 1st, 2018
    * Time for some new hardware? Retirement of hp12 nodes?

  * 2017 benchmarks of some new hardware
    * A donation led to the purchase of a commercial-grade GPU server containing four GTX 1080 Ti GPUs
    * Amber 16: the nucleosome benchmark runs 4.5x faster than on a K20
    * Gromacs 5.1.4: Colin's multidir benchmark runs about 2x faster than on a K20
    * Lammps 11Aug17: the colloid example runs about 11x faster than on a K20
    * FSL 5.0.10: FDT bedpostx tests run 16x faster on the new CPUs, and a whopping 118x faster on GPU vs CPU
    * Price of a 128 GB node in 2017 was $8,250; price of a 256 GB node in 2018 is $10,500
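Those two price points suggest memory keeps getting cheaper per gigabyte; a quick sketch of the arithmetic, using only the figures quoted above (integer division, whole dollars):

```shell
#!/bin/sh
# Cost per GB of node memory, from the quoted prices above
price_2017=8250    # 128 GB node, 2017
price_2018=10500   # 256 GB node, 2018
echo "2017: $(( price_2017 / 128 )) USD/GB"
echo "2018: $(( price_2018 / 256 )) USD/GB"
```

Roughly $64/GB in 2017 versus $41/GB in 2018, for twice the memory per node.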

  * In 2012 IBM bought Platform Computing (developers of LSF; OpenLava is an open-source branch of LSF 4.2)
    * In 2016 IBM accused OpenLava of copyright infringement in v3/v4 (under US DMCA law, no proof needed)
    * Fall-back option: revert to v2.2 (definitely free of infringement, minor disruption)
    * Move-forward option: adopt SLURM (LLNL developers, major disruption)

  * If we adopt SLURM, should we transition to the OpenHPC Warewulf/SLURM recipe?
    * http://openhpc.community/
    * A new login node and a couple of compute nodes to start?
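For a sense of the user-facing disruption: under the OpenHPC recipe, jobs move from LSF/OpenLava ''bsub'' syntax to SLURM ''sbatch'' directives. A minimal sketch of an equivalent job script — the partition name ''mw128'' and the LSF flags shown in comments are illustrative assumptions, not our actual configuration:

```shell
#!/bin/bash
# Sketch of a SLURM job script; queue/partition names are illustrative only
#SBATCH --job-name=myjob          # LSF: bsub -J myjob
#SBATCH --partition=mw128         # LSF: bsub -q mw128 (queue -> partition)
#SBATCH --ntasks=8                # LSF: bsub -n 8
#SBATCH --output=myjob-%j.out     # LSF: bsub -o myjob.%J.out

echo "running on $(hostname) with $SLURM_NTASKS tasks"
```

Submission becomes ''sbatch myjob.sh'' instead of ''bsub < myjob.sh''.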

  * New HPC Advisory Group member

  * Tidbits
    * Bought a deep U42 rack with onboard AC cooling and two PDUs
    * Pushed the Angstrom rack (bss24) out of our area, ready to recycle (done 06/20/2018)
    * Currently we have two U42 racks empty, with power
    * Cooling needs to be provided with any new major purchases (provost, ITS, HPC?)
    * 60 TB raw storage purchased for sharptail (/home2 for users with specific needs)
    * Everything is out of warranty except
      * cottontail (03/2019)
      * ringtail & n78 (10/2020)
      * mw128 nodes & sharptaildr (06/2020)
    * All Infiniband ports are in use

===== Notes =====

  * First make a page comparing CPU vs GPU usage, which may influence future purchases: [[cluster:167|CPU vs GPU]]
  * $100K quote, 3 to 5 vendors, data points mid-2018
  * One node (or all) should have the latest versions of amber, gromacs, lammps, and namd configured on it
  * Latest Nvidia version, optimal cpu:gpu ratio configs
    * Amber 1:1 (may be 1:2 in future releases) - Amber-certified GPU!
    * Gromacs 10:1 (could ramp up to claiming all resources per node)
    * Namd 13:1 (could ramp up to claiming all resources per node)
    * Lammps 2-4:1
  * 128 GB nodes with enough CPU slots to take over ''hp12'': dual ten-core minimum
  * Anticipated target (also to manage heat exchange)
    * 2x10-core Xeon CPUs (~100 GB memory left over) with 2x GTX 1080 Ti GPUs (25 GB memory required)
    * As many as fit the budget, but no more than 15 rack-wise
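The last two bullets can be sanity-checked against the $100K quote figure above. The per-node price below is a hypothetical placeholder (loosely anchored to the 2018 256 GB node price), not a vendor number:

```shell
#!/bin/sh
# How many target nodes fit the budget? (node_price is an assumed placeholder)
budget=100000
node_price=11000
nodes=$(( budget / node_price ))
if [ "$nodes" -gt 15 ]; then
  nodes=15   # cap from the notes: no more than 15 nodes rack-wise
fi
echo "affordable nodes: $nodes"
```

At that placeholder price the budget, not the 15-node rack cap, is the binding constraint.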
  
\\
**[[cluster:0|Back]]**
cluster/166.1529323875.txt.gz · Last modified: 2018/06/18 12:11 by hmeij07