cluster:126 [2023/10/23 19:37] (current) hmeij07
===== Description =====
The High Performance Compute Cluster (HPCC) is comprised of several login nodes (all on our internal network, vlan 52):
  * primary login node
  * secondary login node
  * zenoss monitoring and alerting server
  * storage servers
Several types of compute nodes are available via the scheduler:
  * All are x86_64, Intel Xeon chips from 2006 onwards
  * All are on private networks (192.168.x.x and/or 10.10.x.x, no internet access)
  * All mount /zfshomes and /sanscratch
  * All have local disks providing varying amounts of local scratch space
  * Hyperthreading is on but only 50% of logical cores are allocated
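The hyperthreading note above can be checked on any node with standard Linux tools; a minimal sketch (nothing here is cluster-specific):

```shell
# Count logical processors; with hyperthreading on, this is typically
# twice the number of physical cores in the machine.
logical=$(grep -c '^processor' /proc/cpuinfo)
echo "logical processors: $logical"
nproc   # logical processors available to this shell
```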
Compute node categories which usually align with queues:
  * 32 nodes with dual quad core chips (Xeon 5620, 2.4 Ghz) in HP blade 4U enclosures (SL2x170z G6) with memory footprint of 12 GB each (384 GB). This cluster has a compute capacity of 1.5 teraflops (measured using Linpack). Known as the HP cluster, or the nodes n1-n32, queue hp12, 256 job slots.
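The hp12 slot count follows directly from the node layout; a quick arithmetic check using the numbers in the bullet above:

```shell
# 32 nodes x 2 sockets x 4 cores per socket = job slots in queue hp12
echo $(( 32 * 2 * 4 ))   # prints 256
```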
  * 5 nodes with dual eight core chips (Xeon E5-2660, 2.2 Ghz) in ASUS 2U rack servers with a memory footprint of 256 GB each (1,280 GB). Nodes also contain four K20 Tesla GPUs each, 2,500 cores/gpu (10,000 gpu cores per node) with a GPU memory footprint of 5 GB (20 GB). This cluster has a compute capacity of 23.40 teraflops double precision (GPUs). Known as the Microway GPU cluster, or nodes n33-n37, queue mwgpu.
  * 8 nodes with dual eight core chips (Xeon E5-2660, 2.2 Ghz) in Supermicro 1U rack servers with a memory footprint of 256 GB each (2,048 GB). This cluster has a compute capacity of 5.3 teraflops (estimated). Known as the Microway CPU cluster, or nodes n38-n45, queue mw256fd, 192 job slots.
  * 18 nodes with dual twelve core chips (Xeon E5-2650 v4, 2.2 Ghz) in Supermicro 1U rack servers with a memory footprint of 128 GB each (2,304 GB). This cluster has a compute capacity of 14.3 teraflops (estimated).
  * 1 node with dual eight core chips (Xeon E5-2620 v4, 2.10 Ghz) in a Supermicro 1U rack server with a memory footprint of 128 GB. Queue amber128.
  * 12 nodes with dual twelve core chips (Xeon Silver 4214, 2.20 Ghz) in ASUS ESC4000G4 2U rack servers with a memory footprint of 96 GB each (1,152 GB, about 20 teraflops). Queue exx96.
  * 2 nodes with dual twelve core chips (Xeon 4214R "Cascade Lake Refresh", 2.4 GHz) in Supermicro 1U rack servers. Queue test.
  * 6 nodes with dual 28 core chips (Xeon Gold "Ice Lake-SP"). Queue mw128.
All queues are available for job submissions via all login nodes. Some nodes are on Infiniband switches for parallel computational jobs (queues: mw256fd, hp12, mw256).
Home directory file systems are provided (via NFS or IPoIB) by a dedicated storage node.
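To see where a home directory is actually served from, a generic sketch (works on any Linux host; on this cluster the backing export is /zfshomes):

```shell
# Show the filesystem backing the home directory; on HPCC nodes this
# should point at the NFS/IPoIB-exported /zfshomes.
df -h "$HOME"
# List any NFS mounts (prints nothing on a machine without NFS)
mount -t nfs,nfs4 2>/dev/null || true
```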
===== Our Queues =====
Commercial software has its own queue, limited by the number of available licenses.
^Queue^Nr Of Nodes^Total GB Mem Per Node^Total Cores In Queue^Switch^Hosts^Notes^
| stata | | | | | | limited by available licenses |
Note: Matlab and Mathematica now have "unlimited" licenses.
^Queue^Nr Of Nodes^Total GB Mem Per Node^Job Slots In Queue^Switch^Hosts^Notes^
| hp12 | 32 | 12 | 256 | Infiniband | n1-n32 | default queue |
| mwgpu | 5 | 256 | | | n33-n37 | K20 GPUs |
| mw256fd | 8 | 256 | 192 | Infiniband | n38-n45 | |
| tinymem | | | | | | |
| mw128 | 6 | | | | | NFSoRDMA |
| amber128 | 1 | 128 | | | | |
| exx96 | 12 | 96 | | | | RTX2080S GPUs |
| test | 2 | | | | | |
| mw256 | | 256 | | Infiniband | | |
Some guidelines for appropriate queue usage with detailed page links:
  * hp12 is the default queue
    * for processing lots of small to medium memory footprint jobs
  * mwgpu is for GPU (K20) enabled software primarily (Amber, Lammps, NAMD, Gromacs, Matlab, Mathematica)
    * be sure to reserve one or more job slots for each GPU used [[cluster:192|EXX96]]
    * be sure to use the correct wrapper script to set up mpirun from mvapich2, mpich3 or openmpi
  * mw256fd is for jobs requiring large memory access
    * or requiring lots of threads (job slots) confined to a single node (Gaussian, AutoDock)
  * mw128 (bought with faculty startup funds) tailored for Gaussian jobs
    * About 2TB of local scratch space
    * Priority access for Carlos' group
  * amber128
    * Be sure to use mpich3 for Amber
    * Priority access for Amber jobs
  * exx96 contains 4 RTX2080S per node
    * same setup as mwgpu queue
  * test
    * can be used for production runs
    * beware of preemptive events, checkpoint!
  * mw128, NFSoRDMA, bought with faculty startup monies
    * beware of preemptive events, checkpoint!
    * 6 compute nodes
    * Priority access for Sarah's group
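As a concrete illustration of the GPU guidance above (reserve a job slot per GPU, use the correct mpirun wrapper), here is a hedged sketch of an OpenLava/LSF-style submission script; the queue name comes from this page, while the wrapper and input file names are placeholders, not the site's actual paths:

```shell
#!/bin/bash
# Hypothetical OpenLava/LSF-style job script sketch for a GPU queue.
#BSUB -q mwgpu          # GPU queue named on this page
#BSUB -n 1              # reserve one job slot per GPU used
#BSUB -J gpu_test
#BSUB -o %J.out
#BSUB -e %J.err

# Wrapper name below is illustrative only; use the site-provided
# wrapper for mvapich2, mpich3 or openmpi as noted above.
echo "would run: mpich3_wrapper pmemd.cuda -O -i prod.in"
```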
**NOTE**: we are migrating from OpenLava to Slurm during summer 2022. All queues will move to the new scheduler.
  * [[cluster:213|New Head Node]]
  * [[cluster:218|Getting Started with Slurm Guide]]

**There are no wall time limits in our HPCC environment.**
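For the Slurm migration mentioned above, a minimal batch script sketch (job name, resources and output pattern are generic assumptions; consult the Getting Started with Slurm Guide for site specifics):

```shell
#!/bin/bash
# Minimal Slurm batch script sketch; #SBATCH lines are scheduler
# directives, read by sbatch and ignored by the shell itself.
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --output=%j.out

echo "running on $(hostname)"
```

Submit with ''sbatch script.sh'' and check status with ''squeue''.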
===== Other Stuff =====
Home directory policy and Rstore storage options are described on their own page.
Checkpointing is supported in all queues; how it works is described on the [[cluster:190|DMTCP]] page.
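A generic DMTCP usage sketch to go with the checkpointing note above (''myapp'' and the interval are placeholders; this just prints the commands one would use):

```shell
# Sketch of a DMTCP checkpoint/restart workflow. dmtcp_launch's -i flag
# sets the checkpoint interval in seconds; dmtcp_restart resumes from
# the checkpoint image written by the coordinator.
echo "dmtcp_launch -i 3600 ./myapp     # checkpoint every hour"
echo "dmtcp_restart ckpt_myapp_*.dmtcp # resume after interruption"
```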
For a list of software installed consult the [[cluster:73|Software List]] page.

For a list of OpenHPC software consult its dedicated page.
For details on all scratch spaces consult the scratch spaces page.
For HPCC acknowledgements consult the acknowledgements page.
Sample scripts for job submissions (serial, array, parallel, forked and gpu) can be found on the login nodes.
From off-campus you need to VPN in first; download the GlobalProtect client.