
cluster:126

Differences

This shows you the differences between two versions of the page.

cluster:126 [2017/12/06 10:40]
hmeij07
cluster:126 [2017/12/06 10:57]
hmeij07
Line 40: Line 40:
  * 18 nodes with dual twelve-core chips (Xeon E5-2650 v4, 2.2 GHz) in Supermicro 1U rack servers with a memory footprint of 128 GB each (2,304 GB total). This cluster has an estimated compute capacity of 14.3 teraflops. Known as the Microway "Carlos" CPU cluster, or nodes n60-n77, queue mw128, 648 job slots.
  
-  * 1 node with dual eight-core chips (Xeon E5-2620 v4, 2.10 GHz) in a Supermicro 1U rack server with a memory footprint of 64 GB. This node has four GTX1080Ti GPUs providing
+  * 1 node with dual eight-core chips (Xeon E5-2620 v4, 2.10 GHz) in a Supermicro 1U rack server with a memory footprint of 64 GB (128 GB). This node has four GTX1080Ti GPUs (44 GB total memory footprint) providing 1.42 teraflops. Known as the "donated Amber" node n78, queue amber128, 24 job slots.
  
-All queues are available for job submissions via all login nodes. All nodes are on Infiniband switches for parallel computational jobs (excludes the bss24, tinymem and mw128 queues). Our total job slot count is roughly 1,688 and our physical core count is 1,176. Our total teraflops compute capacity is about 36 cpu side, 23 gpu side. Our total memory footprint is about 100 GB gpu side, 7,280 GB cpu side (excludes queue bss24).
+All queues are available for job submissions via all login nodes. All nodes are on Infiniband switches for parallel computational jobs (excludes the tinymem, mw128 and amber128 queues). Our total job slot count is roughly 1,712 and our physical core count is 1,192. Our total teraflops compute capacity is about 38 cpu side, 25 gpu side (double precision floating point). Our total memory footprint is about 144 GB gpu side, 7,408 GB cpu side.
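The new totals can be sanity-checked by adding node n78 (queue amber128) to the previous figures. A minimal check in Python; the GTX 1080 Ti per-card values (11 GB, roughly 0.355 teraflops double precision) are vendor specifications rather than figures from this page, and the teraflop totals above appear to be rounded:

<code python>
# Sanity check of the updated totals, assuming they are the previous figures
# plus node n78 (queue amber128).
# GTX 1080 Ti per-card values are vendor specs (assumption, not from this page):
cards, mem_per_card_gb, fp64_tflops_per_card = 4, 11, 0.355
print(cards * mem_per_card_gb)                 # 44   -> GB GPU memory on n78
print(round(cards * fp64_tflops_per_card, 2))  # 1.42 -> teraflops (double precision) on n78

old_slots, old_cores = 1688, 1176
old_mem_gpu_gb, old_mem_cpu_gb = 100, 7280
print(old_slots + 24)        # 1712 job slots   (amber128 adds 24 slots)
print(old_cores + 16)        # 1192 cores       (dual eight-core chips on n78)
print(old_mem_gpu_gb + 44)   # 144 GB gpu side
print(old_mem_cpu_gb + 128)  # 7408 GB cpu side (n78 memory per the amber128 row below)
</code>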
  
-Home directory file systems are provided (via NFS or IPoIB) by the node ''sharptail'' (our file server) from a direct attached disk array. In total, 10 TB of /home disk space is accessible to the users. Node ''greentail'' makes available 33 TB of scratch space at /sanscratch via NFS. In addition, all nodes provide local scratch space at /localscratch (excludes queue tinymem). The Openlava scheduler automatically makes directories in both these scratch areas for each job (named after the JOBPID). Backup services for /home are provided via disk-to-disk snapshots from node ''sharptail'' to node ''cottontail'' disk arrays (daily, weekly and monthly snapshots are mounted read-only on ''cottontail'' for self-serve content retrievals).
+Home directory file systems are provided (via NFS or IPoIB) by the node ''sharptail'' (our file server) from a direct attached disk array. In total, 10 TB of /home disk space is accessible to the users. Node ''greentail'' makes available 33 TB of scratch space at /sanscratch via NFS. In addition, all nodes provide local scratch space at /localscratch (excludes queue tinymem). The Openlava scheduler automatically makes directories in both these scratch areas for each job (named after the JOBPID). Backup services for /home are provided via disk-to-disk point-in-time snapshots from node ''sharptail'' to node ''cottontail'' disk arrays (daily, weekly and monthly snapshots are mounted read-only on ''cottontail'' for self-serve content retrievals). Some chemists have their home directories on node ''ringtail'', which provides 33 TB via /home33.
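As an illustration of the per-job scratch directories described above, a minimal sketch; it assumes the scheduler exports the job id as ''LSB_JOBID'' (the usual LSF/Openlava variable) and the file names are hypothetical:

<code python>
import os
import shutil

# The scheduler creates /sanscratch/<JOBPID> and /localscratch/<JOBPID> per job;
# LSB_JOBID is assumed to hold that id (standard LSF/Openlava variable).
jobid = os.environ.get("LSB_JOBID", "interactive")

sanscratch = os.path.join("/sanscratch", jobid)      # shared scratch, NFS from greentail
localscratch = os.path.join("/localscratch", jobid)  # node-local scratch (not on tinymem nodes)

# Typical pattern: stage input from /home, compute in scratch, copy results back.
home = os.path.expanduser("~")
shutil.copy(os.path.join(home, "input.dat"), localscratch)   # hypothetical input file
# ... run the computation inside localscratch ...
shutil.copy(os.path.join(localscratch, "output.dat"), home)  # hypothetical output file
</code>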
  
-A subset of 25 nodes of the Blue Sky Studio cluster listed above also runs our test Hadoop cluster.  The namenode and login node is ''whitetail'' and also contains the scheduler for Hadoop. It is based on Cloudera CD3U6 repository.  +<del>A subset of 25 nodes of the Blue Sky Studio cluster listed above also runs our test Hadoop cluster.  The namenode and login node is ''whitetail'' and also contains the scheduler for Hadoop. It is based on Cloudera CD3U6 repository.</del>  
  
-Two Rstore storage servers each provide about 104 TB of usable backup space which is not mounted on the compute nodes. Each Rstore server's content is replicated to a dedicated passive standby server of the same size.
+Two Rstore storage servers each provide about 104 TB of usable backup space which is not mounted on the compute nodes. Each Rstore server's content is replicated to a dedicated passive standby server of the same size, located in the same data center but in different racks.
  
  
Line 56: Line 56:
  
^Queue^Nr Of Nodes^Total GB Mem Per Node^Total Cores In Queue^Switch^Hosts^Notes^
-|  matlab  |  //na//  |  //na//  |  //na//  |   QDR infiniband | //any host// |  8/16 licenses  | 
|  stata  |   //na//  |  //na//  |  //na//  |   QDR infiniband | //any host// |  6 licenses |
-|  mathematica  |  //na//  |  //na//  |  //na//  |   QDR infiniband |  //any host//  |  unlimited licenses  | 
  
 +Note: Matlab and Mathematica now have "unlimited licenses".
  
-^Queue^Nr Of Nodes^Total GB Mem Per Node^Total Cores In Queue^Switch^Hosts^Notes^
+^Queue^Nr Of Nodes^Total GB Mem Per Node^Job Slots In Queue^Switch^Hosts^Notes^
|  hp12  |   32  |  12  |  256  | QDR infiniband  | n1-n32 |  CPU  |
-|  bss24  |  42  |  24  |   84  | gigabit ethernet  | b1-b49 |  CPU  | 
|  mwgpu  |    |  256  |  120  | QDR infiniband  | n33-n37 |  GPU & CPU  |
-|  mw256fd  |    |  256  |  256  | QDR infiniband  | n38-n45 |  CPU  |
+|  mw256fd  |    |  256  |  192  | QDR infiniband  | n38-n45 |  CPU  |
|  tinymem  |   14  |  32  |  448  | gigabit ethernet  | n39-n59 |  CPU  |
-|  mw128  |   18  |  129  |  648  | gigabit ethernet  | n60-n77 |  CPU  |
+|  mw128  |   18  |  128  |  648  | gigabit ethernet  | n60-n77 |  CPU  |
+|  amber128  |    |  128  |  24  | gigabit ethernet  | n78 |  GPU & CPU  |
  
Some guidelines for appropriate queue usage, with detailed page links (a sample ''bsub'' submission sketch follows this list):
Line 73: Line 73:
  * hp12 is the default queue
    * for processing lots of small to medium memory footprint jobs
-  * bss24, primarily used by bioinformatics group, available to all if needed 
-    * when not in use powered off, email me (hmeij@wes) or PEND jobs (hpcadmin will get notified) 
-    * also our Hadoop cluster [[cluster:115|Use Hadoop Cluster]] 
  * mw256 are for jobs requiring large memory access (up to 24 job slots per node)
    * for exclusive use of a node, reserve all of its memory
Line 94: Line 91:
    * About 2 TB of /localscratch (RAID 10) on each node
    * Priority access for Carlos' group till summer 2020
 +  * amber128 (donated hardware) tailored for Amber16 jobs
 +    * Be sure to use mpich3 for Amber
 +    * Priority access for Amber jobs
  * test (swallowtail, petaltail, cottontail2)
    * wall time of 8 hours of CPU usage
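Jobs are submitted to the Openlava scheduler with ''bsub'' from any login node. A minimal sketch wrapping a submission in Python; the ''submit'' helper, the script name and the job name are hypothetical, and only standard LSF/Openlava flags are used:

<code python>
import subprocess

def submit(script_path, queue="hp12", slots=1, name="myjob"):
    """Submit a job script to an Openlava queue via bsub (hypothetical helper)."""
    cmd = [
        "bsub",
        "-q", queue,             # target queue: hp12 (default), mw128, amber128, ...
        "-n", str(slots),        # number of job slots requested
        "-J", name,              # job name
        "-o", f"{name}.%J.out",  # stdout file; %J expands to the job id
        "-e", f"{name}.%J.err",  # stderr file
    ]
    with open(script_path) as script:
        subprocess.run(cmd, stdin=script, check=True)

# e.g. an Amber16 run on the donated node (queue amber128, all 24 slots)
submit("run_amber.sh", queue="amber128", slots=24, name="amber16")
</code>

Since amber128 consists of the single node n78 with 24 job slots, requesting all 24 slots effectively reserves that node for the job.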