cluster:126

Differences

This shows you the differences between two versions of the page.

cluster:126 [2020/02/27 09:26] hmeij07
cluster:126 [2020/02/27 09:40] hmeij07
Line 10: Line 10:
 The High Performance Compute Cluster (HPCC) comprises several login nodes (all are on our domain //wesleyan.edu//, behind VPN for off campus access)
  
-  * primary login node ''cottontail'' (Supermicro 4U), Openlava scheduler and snapshot engine for /home
+  * primary login node ''cottontail'' (Supermicro 4U), primary scheduler and snapshot engine for /home
   * secondary login node ''cottontail2'' (HP ProLiant G380 2U), backup scheduler
   * secondary login node ''swallowtail'' (Dell PowerEdge 2950 2U), backup scheduler, databases
   * sandbox ''petaltail'' (Dell PowerEdge 2950 2U), test box, Warewulf provisioning CentOS6
+  * sandbox ''whitetail'' (HP ProLiant G380 2U), Warewulf OpenHPC provisioning CentOS7
+  * Zenoss monitoring and alerting server ''hpcmon'' (Supermicro 1U, CentOS6)
   * NFS server ''greentail52'' (SuperMicro 36+2, 2U), /sanscratch
   * (only log in when moving content) file server node ''sharptail'' (Supermicro 4U), /home NFS server
Line 21: Line 23:
   * mindstore storage servers ''mstore0/mstore1'' (Supermicro 4U), available on HPC (2x 110T)
  
-Several types of compute nodes are available via the OpenLava scheduler, http://www.openlava.org
+Several types of compute nodes are available via the scheduler:
  
   * All are running CentOS6.10 or CentOS7.7
Line 55: Line 57:
 ===== Our Queues =====
  
-Commercial software has their own queue limited by available licenses. There are no scheduler license resources, just queue jobs up in appropriate queue. Jobs are processed on the nodes of hp12, mwgpu <del>mw256</del>, and mw256fd queues. That can change if we need to.
+Commercial software packages have their own queues, limited by available licenses. There are no scheduler license resources; just queue jobs up in the appropriate queue. Commercial software jobs are processed on the nodes of the mw256fd and mw128 queues.
  
 ^Queue^Nr Of Nodes^Total GB Mem Per Node^Total Cores In Queue^Switch^Hosts^Notes^
-|  stata  |   //na//  |  //na//  |  //na//  |   QDR infiniband | //any host// |  6 licenses |
+|  stata  |   //na//  |  //na//  |  //na//  |   QDR Infiniband | //any host// |  6 licenses |
  
 Note: Matlab and Mathematica now have "unlimited licenses".
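
As an illustration of the point above (queue choice is the only license control), here is a minimal sketch of a batch submission to the license-limited ''stata'' queue. It assumes the Openlava/LSF-style ''bsub'' front end referenced on this page; the job name and do-file are placeholders, not a site-provided script.

<code bash>
#!/bin/bash
# sketch: serial Stata batch job routed to the license-limited stata queue
#BSUB -q stata          # queue choice is the only license throttle
#BSUB -n 1              # one job slot
#BSUB -J stata-run      # job name (placeholder)
#BSUB -o stata.%J.out   # stdout, %J expands to the job id
#BSUB -e stata.%J.err   # stderr

# myanalysis.do is a hypothetical do-file
stata -b do myanalysis.do
</code>

Submit it with ''bsub < stata-run.sh''; if all six licenses are busy, excess jobs simply pend in the queue.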
Line 69: Line 71:
 |  tinymem  |   14  |  32  |  448  | gigabit ethernet  | n39-n59 |  CPU  |
 |  mw128  |   18  |  128  |  648  | gigabit ethernet  | n60-n77 |  CPU  |
-|  amber128  |    |  128  |  24  | gigabit ethernet  | n78 |  GPCU & CPU  |
+|  amber128  |    |  128  |  24  | gigabit ethernet  | n78 |  GPU & CPU  |
+|  exx96  |   12  |  96  |  432  | gigabit ethernet  | n79-n90 |  GPU & CPU  |
  
 Some guidelines for appropriate queue usage with detailed page links:
Line 75: Line 78:
   * hp12 is the default queue
     * for processing lots of small to medium memory footprint jobs
-  * mw256 are for jobs requiring large memory access (up to 24 jobs slots per node) 
-    * for exclusive use of a node reserve all memory 
   * mwgpu is for GPU enabled software primarily (Amber, Lammps, NAMD, Gromacs, Matlab, Mathematica)
-    * be sure to reserve one or more job slots for each GPU used [[cluster:119|Submitting GPU Jobs]]
+    * be sure to reserve one or more job slots for each GPU used [[cluster:192|EXX96]] (see the sample script after this list)
-    * be sure to use the correct wrapper script to set up mpirun from mvapich2
+    * be sure to use the correct wrapper script to set up mpirun from mvapich2, mpich3 or openmpi
   * mw256fd is for jobs requiring large memory access (up to 24 job slots per node)
     * or requiring lots of threads (job slots) confined to a single node (Gaussian, AutoDock)
Line 92: Line 93:
   * mw128 (bought with faculty startup funds) tailored for Gaussian jobs
     * About 2TB /localscratch (RAID 10) on each node
-    * Priority access for Carlos' group till summer 2020
+    * Priority access for Carlos' group till 07/01/2020
   * amber128 (donated hardware) tailored for Amber16 jobs
     * Be sure to use mpich3 for Amber
-    * Priority access for Amber jobs
+    * Priority access for Amber jobs till 10/01/2020
   * test (swallowtail, petaltail, cottontail2)
     * wall time of 8 hours of CPU usage
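
To make the guidelines above concrete (referenced from the mwgpu item), below is a minimal sketch of a GPU job script in the style of the mwgpu/exx96 queues. It assumes the Openlava/LSF-style ''bsub'' front end, a site-defined GPU rusage resource name, and a wrapper script name; none of these are taken verbatim from this page, so check [[cluster:192|EXX96]] and the samples in ''/home/hmeij/jobs/'' for the exact local conventions.

<code bash>
#!/bin/bash
# sketch: one-GPU Amber job; queue, resource and wrapper names are assumptions
#BSUB -q exx96                   # or mwgpu / amber128
#BSUB -n 1                       # reserve one job slot per GPU used
#BSUB -J amber-gpu               # job name (placeholder)
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -R "rusage[gpu=1]"         # site-defined GPU resource; name is assumed

# work in the NFS scratch area, one directory per job id
export MYSANSCRATCH=/sanscratch/$LSB_JOBID
mkdir -p $MYSANSCRATCH
cd $MYSANSCRATCH
cp ~/amber-run/* .               # hypothetical input directory

# the wrapper that sets up mpirun (mvapich2/mpich3/openmpi) is site-specific;
# gpu_wrapper and the Amber inputs below are placeholders
~/bin/gpu_wrapper pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout

# copy results back to /home before the scratch directory is cleaned up
cp mdout restrt ~/amber-run/
</code>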
  
-**There are no wall time limits in our HPCC environment except for queue ''test''.** You are responsible for checkpointing though. Consult these pages, all nodes in all queues are BLCR enabled. Logins nodes and storage nodes are on UPS but all compute nodes are on utility power. Crashes do happen, be prepared to restart your long running jobs.
+**There are no wall time limits in our HPCC environment except for queue ''test''.** You are responsible for checkpointing though. Consult these pages; all nodes in all queues are DMTCP enabled (read [[cluster:190|DMTCP]]). Login nodes and storage nodes are on UPS but all compute nodes are on utility power. Crashes do happen, so be prepared to restart your long-running jobs.
-
-  * [[cluster:147|BLCR Checkpoint in OL3]] Serial Jobs
-  * [[cluster:148|BLCR Checkpoint in OL3]] Parallel Jobs
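
Since there are no wall time limits and compute nodes sit on utility power, long jobs should be checkpointed. A bare-bones sketch of the generic DMTCP workflow follows; the application name and the four-hour interval are placeholders, and the [[cluster:190|DMTCP]] page has the site-specific recipe.

<code bash>
# launch the application under DMTCP, writing checkpoint images every 4 hours
dmtcp_launch --interval 14400 ./my_long_simulation input.dat

# after a crash or reboot, resume from the newest checkpoint images;
# dmtcp_restart_script.sh is generated by DMTCP next to the *.dmtcp files
./dmtcp_restart_script.sh
</code>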
  
 ===== Other Stuff =====
Line 110: Line 108:
 Checkpointing is supported in all queues; how it works is described on the [[cluster:190|DMTCP]] page
  
-For a list of software installed consult [[cluster:73|Software List]] page
+For a list of installed software consult the [[cluster:73|Software List]] page, endless...
  
 For details on all scratch spaces consult the [[cluster:142|Scratch Spaces]] page
Line 118: Line 116:
 Sample scripts for job submissions (serial, array, parallel, forked and gpu) can be found at ''/home/hmeij/jobs/''
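
A typical workflow, assuming the Openlava/LSF-style commands used elsewhere on this page (the sample file name below is a placeholder):

<code bash>
cp /home/hmeij/jobs/<sample>.sh myjob.sh   # pick one of the serial/array/parallel/forked/gpu samples
bsub < myjob.sh                            # submit
bjobs                                      # list your pending and running jobs
bhist -l <jobid>                           # details for a finished job
</code>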
  
-From off-campus you need to VPN in first at [[http://webvpn.wesleyan.edu]]
+From off-campus you need to VPN in first at [[http://vpn.wesleyan.edu]]
  
  
cluster/126.txt · Last modified: 2023/10/23 15:37 by hmeij07