==== HPC Users Meeting ====
  * Brief history
    * 2006 swallowtail (Dell PE1955, ...)
    * 2010 greentail (HP gen6 blade servers, hp12)
    * 2013 sharptail (Microway storage, K20s, Infiniband, mw256/mwgpu)
    * 2014 mw256fd (Dell 2006 replacement)
    * 2015 tinymem (Supermicro bare metal, serial job expansion)
    * 2017 mw128 (first faculty startup funds)
    * 2018 6/25 Today's meeting
  * Since 2006
    * Grown from 256 to roughly 1,200 physical CPU cores
    * Processed 3,165,752 jobs (by 18jun2018)
    * Compute capacity ...
    * Total memory footprint is near 7.5 TB
    * About 500 accounts created (incl 22 collaborators, ...)

  * Funding / charge scheme: is it working for you?
    * Last 2 years, $15K target realized
  * Status of our cluster development fund
  * 2017 Benchmarks of some new hardware
    * Donation led to purchase of a commercial grade GPU server containing four GTX1080ti
    * Amber 16. Nucleosome bench runs 4.5x faster than on a K20
    * Gromacs 5.1.4. Colin's ...
    * Lammps 11Aug17. Colloid example runs about 11x faster than on a K20
    * FSL 5.0.10. BFT bedpostx tests run 16x faster on CPU, a whopping 118x faster on GPU vs CPU
    * Price of a 128gb node in 2017 was $8,...
  * 2016 IBM bought ...
    * IBM promptly ...
    * Fall back option to v2.2 (definitely free of infringement, ...)
    * Move forward option, adopt SLURM (LBL developers, ...); see the job-script sketch below
    * If we adopt SLURM should we transition to OpenHPC Warewulf/Slurm?
      * http://...
      * new login node and a couple of compute nodes to start?
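For a sense of the user-facing change, a minimal sketch of a SLURM submission, assuming the client tools (''sbatch'') are installed and a partition exists that mirrors one of our current queues; the ''mwgpu'' partition name and the resource numbers are placeholders, not a decided configuration.

<code python>
#!/usr/bin/env python3
# Sketch only: compose a SLURM batch script and submit it with sbatch.
# Assumes the SLURM client tools are on PATH; the partition name (mwgpu)
# and resource requests are illustrative placeholders.
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=mwgpu        # hypothetical partition mirroring an OpenLava queue
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --time=01:00:00

echo "running on $(hostname)"
nvidia-smi
"""

# write the script to a temporary file and hand it to sbatch
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

result = subprocess.run(["sbatch", script_path], capture_output=True, text=True)
print(result.stdout or result.stderr)
</code>

The rough OpenLava equivalent is ''bsub -q mwgpu -n 1 ...'', so most of the disruption would be in retraining users and converting existing submit scripts.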
  * New HPC Advisory Group Member

  * Tidbits
    * Bought deep U42 rack with AC cooling onboard and two PDUs
    * Pushed Angstrom rack (bss24) out of our area, ready to recycle that (Done 06/20/2018)
    * Currently we have two U42 racks empty with power
    * Cooling needs to be provided with any new major purchases (provost, ITS, HPC?)
    * 60 TB raw storage purchased for sharptail (/home2 for users with specific needs)
    * Everything is out of warranty but:
      * cottontail (03/...)
      * ringtail & n78 (10/2020)
      * mw128_nodes & sharptaildr (06/2020)
    * All Infiniband ports are in use

===== Notes =====

  * First make a page comparing CPU vs GPU usage which may influence future purchases [[cluster:...]]
  * $100k quote, 3 to 5 vendors, data points mid-2018
  * One node (or all) should have configured on it: amber, gromacs, lammps, namd, latest versions
  * Nvidia latest version, optimal cpu:gpu ratio configs
    * Amber 1:1 (may be 1:2 in future releases) - amber certified GPU!
    * Gromacs 10:1 (could ramp up to claiming all resources per node)
    * Namd 13:1 (could ramp up to claiming all resources per node)
    * Lammps 2-4:1
  * 128g with enough CPU slots to take over ''...''
  * Anticipated target (also to manage heat exchange); see the sizing sketch after this list
    * 2x10 Xeon CPU (~100gb left) with 2x gtx1080ti GPU (25gb memory required)
    * as many as fit the budget, but no more than 15 rack-wise
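A small sketch of what those ratios imply per node, assuming the anticipated layout above (2x10-core Xeon, 2x GTX1080ti, 128gb with ~25gb set aside for the GPU jobs); the variable and function names are mine, and Lammps is taken at the upper end of its 2-4:1 range.

<code python>
# Sketch: per-node core budget implied by the cpu:gpu ratios above, for the
# anticipated target node (2 x 10-core Xeon, 2 x GTX1080ti, 128 GB memory,
# ~25 GB reserved for the GPU jobs). Layout and names are illustrative.

NODE_CORES = 20         # 2 x 10-core Xeon
NODE_GPUS = 2           # 2 x GTX1080ti
NODE_MEM_GB = 128
GPU_JOB_MEM_GB = 25     # memory the GPU jobs are expected to need

# cpu:gpu ratios from the list above (Lammps taken at the upper end of 2-4:1)
CPU_PER_GPU = {"amber": 1, "gromacs": 10, "namd": 13, "lammps": 4}

def cores_left_for_cpu_jobs(app: str) -> int:
    """CPU cores still free for serial work once both GPUs are driven by app."""
    cores_used = min(CPU_PER_GPU[app] * NODE_GPUS, NODE_CORES)
    return NODE_CORES - cores_used

for app, ratio in CPU_PER_GPU.items():
    print(f"{app:8s} ratio {ratio:2d}:1 -> "
          f"{cores_left_for_cpu_jobs(app):2d} cores and "
          f"~{NODE_MEM_GB - GPU_JOB_MEM_GB} GB left for CPU-only jobs")
</code>

At these ratios Gromacs and Namd claim the whole node, which matches the note that they could ramp up to all resources per node; Amber and Lammps leave cores free to backfill serial jobs.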
\\
**[[cluster:...]]**
