==== NewsBytes for Jan 2020 ====
  
[[cluster:188|2019 Queue Usage]]\\
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:188
  
**2020 upcoming changes and updates**
  
Tuesday's (1/21) power outage removed BLCR's kernel modules from the compute nodes' kernels. If you need to do checkpointing, the new tool is Distributed MultiThreaded Checkpointing (DMTCP). Details on how to use DMTCP can be found here [[cluster:190|DMTCP]]; if you need help, let me know (the "tails" also have DMTCP installed for debugging)\\
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:190
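The DMTCP workflow can be sketched as a pair of guarded shell commands. This is a minimal sketch, not a site recipe: the job name (`./my_long_job`) and the 2-hour checkpoint interval are hypothetical examples, and the commands are guarded so they only run where DMTCP is actually installed.

```shell
# Hypothetical DMTCP sketch; ./my_long_job and the interval are examples.
INTERVAL=7200   # checkpoint every 2 hours (value is in seconds)

# dmtcp_launch runs the program under a coordinator, periodically writing
# ckpt_*.dmtcp images plus a dmtcp_restart_script.sh in the working directory.
if command -v dmtcp_launch >/dev/null 2>&1 && [ -x ./my_long_job ]; then
    dmtcp_launch --interval "$INTERVAL" ./my_long_job
else
    echo "dmtcp_launch (or the job binary) not found on this node"
fi

# After a node failure or requeue, resume from the latest checkpoint
# using the script DMTCP generated alongside the checkpoint images.
if [ -x ./dmtcp_restart_script.sh ]; then
    ./dmtcp_restart_script.sh
fi
```

The interval is a trade-off: shorter intervals lose less work on a crash but spend more time writing checkpoint images.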
  
The HPCC has invested in a new solution for our Home Directories file server. The TrueNAS/ZFS solution selected is described here [[cluster:186|Home Dir Server]]. We will implement it with very large user quotas. The storage is 190 TB usable with inline compression (475 TB effective usable if a compression ratio of 2.5x is achieved; scalable to 1.2 PB raw). Other features include: unlimited snapshots (point-in-time restores), read cache SSD, write cache SSD, self-healing (checksums on reads and writes, and per schedule), RAIDZ2 protection, and high availability (dual controllers). We will not be implementing de-duplication. We may add replication in the future. This will take a long time to deploy. \\
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:186
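As a quick sanity check on the capacity numbers: the effective figure is just the usable capacity times the expected compression ratio.

```shell
# 190 TB usable * 2.5x expected compression ratio = 475 TB effective.
USABLE_TB=190
# shell arithmetic is integer-only, so express the 2.5x ratio as 5/2
EFFECTIVE_TB=$(( USABLE_TB * 5 / 2 ))
echo "${EFFECTIVE_TB} TB effective"   # prints: 475 TB effective
```

The real ratio depends on the data; home directories full of text and source code compress well, while already-compressed archives and media do not.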
  
The HPCC has also invested in more GPU and CPU compute capacity. At the time of this writing, 12 nodes are crossing Iowa from CA, headed our way: a total of 48 gpus (model rtx2080s, with 384 GB of gpu memory) and 24 cpus (228 physical cores, with 1,152 GB of memory). Details of the selection process can be found here [[cluster:184|Turing/Volta/Pascal]]\\
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:184
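Assuming the hardware is spread evenly across the shipment, the totals above break down per node as follows (a back-of-the-envelope check, not a spec sheet):

```shell
# Divide the announced totals evenly over the 12 nodes.
NODES=12
GPUS=48
GPU_MEM_GB=384
CPUS=24
MEM_GB=1152

echo "$(( GPUS / NODES )) gpus per node"           # prints: 4 gpus per node
echo "$(( GPU_MEM_GB / NODES )) GB gpu memory per node"   # 32
echo "$(( CPUS / NODES )) cpus per node"           # 2 cpus per node
echo "$(( MEM_GB / NODES )) GB system memory per node"    # 96
```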
  
With the additional gpu nodes we are also launching and committing to the Nvidia GPU Cloud. We will deploy their cloud Docker containers, albeit on premise. Since I did not know much about this, an overview can be found here, and more details will be provided later on [[cluster:187|NGC Docker Containers]]\\
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:187 \\
Nvidia GPU Cloud (browse the online Catalog)\\
https://www.nvidia.com/en-us/gpu-cloud/containers/
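Pulling an NGC image onto a node follows the standard Docker workflow against Nvidia's `nvcr.io` registry. A minimal sketch, assuming Docker is installed on the node; the repository and tag below are placeholders, so browse the NGC catalog for a real image:tag before pulling.

```shell
# Hypothetical sketch: fetch an NGC container image on a gpu node.
# nvcr.io is NGC's registry; this repository:tag is a placeholder.
IMAGE="nvcr.io/nvidia/tensorflow:latest"

if command -v docker >/dev/null 2>&1; then
    docker pull "$IMAGE" || echo "pull failed; check the NGC catalog for a valid tag"
    # A container can then be started with the node's GPUs exposed, e.g.:
    #   docker run --rm --gpus all "$IMAGE" nvidia-smi
else
    echo "docker not available on this node"
fi
```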
  
Lots of work! Lots to learn!
  
\\
**[[cluster:0|Back]]**
cluster/191.1579893015.txt.gz · Last modified: 2020/01/24 14:10 by hmeij07