
NewsBytes for Jan 2020

2019 Queue Usage
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:188

2019 dedicated monitoring and alerting server Zenoss
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:183

2020 upcoming changes and updates

Tuesday's (1/21) power outage removed BLCR's kernel modules from the compute nodes' kernels. If you need checkpointing, the new tool is Distributed MultiThreaded Checkpointing (DMTCP). Details on how to use it can be found on the DMTCP page linked below. If you need help, let me know (the “tails” also have DMTCP installed for debugging).
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:190

The HPCC has invested in a new solution for our Home Directories file server. The selected TrueNAS/ZFS solution is described on the Home Dir Server page linked below. We will implement it with very large user quotas. The storage is 190 TB usable with inline compression (475 TB effective usable if a 2.5x compression ratio is achieved). Other features include unlimited snapshots (point-in-time restores), read cache SSD, write cache SSD, self-healing (checksums verified on reads and writes and on a schedule), RAIDZ2 protection, and high availability (dual controllers). We will not implement de-duplication. We may add replication in the future. This will take a long time to deploy.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:186
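
The effective capacity figure is simply usable capacity times the compression ratio actually achieved, so it moves with the data stored. A minimal sketch of that arithmetic follows. The function name is hypothetical, and every ratio other than 2.5x is an illustrative assumption, not a measured or vendor-quoted value.

```python
def effective_capacity_tb(usable_tb: float, compression_ratio: float) -> float:
    """Effective capacity in TB, given usable capacity and a compression ratio."""
    if usable_tb < 0 or compression_ratio <= 0:
        raise ValueError("usable_tb must be >= 0 and compression_ratio must be > 0")
    return usable_tb * compression_ratio


USABLE_TB = 190  # usable capacity quoted above

# Only 2.5x comes from the text above; the other ratios are hypothetical.
for ratio in (1.0, 1.5, 2.0, 2.5):
    effective = effective_capacity_tb(USABLE_TB, ratio)
    print(f"{ratio:.1f}x compression -> {effective:.0f} TB effective")
```

Data that compresses poorly (already-compressed archives, for instance) would land closer to the 1.0x line than to the 2.5x one.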

The HPCC has also invested in more GPU and CPU compute capacity. At the time of this writing, 12 nodes are crossing Iowa from CA, headed our way: a total of 48 GPUs (model rtx2080s, 384 GB memory) and 24 CPUs (228 physical cores, 1,152 GB memory). Details of the selection process can be found on the 2019 GPU Models page linked below.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:184
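
For job planning, the totals above can be broken down per node. The sketch below assumes all 12 nodes are configured identically, that the 384 GB figure is total GPU memory, and that the 1,152 GB figure is total host memory. None of this is stated explicitly here, so treat the output as an estimate. The variable and function names are hypothetical.

```python
NODES = 12  # node count quoted above

# Totals as quoted above; the key names are illustrative only.
totals = {
    "gpus": 48,
    "gpu_memory_gb": 384,
    "cpus": 24,
    "physical_cores": 228,
    "host_memory_gb": 1152,
}


def per_node(totals: dict, nodes: int) -> dict:
    """Divide each total evenly across nodes (assumes identical nodes)."""
    return {name: value / nodes for name, value in totals.items()}


breakdown = per_node(totals, NODES)
gpu_memory_per_gpu_gb = totals["gpu_memory_gb"] / totals["gpus"]

for name, value in breakdown.items():
    print(f"{name} per node: {value:g}")
print(f"gpu memory per gpu (GB): {gpu_memory_per_gpu_gb:g}")
```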

With the additional GPU nodes we are also launching and committing to the Nvidia GPU Cloud (NGC). We will deploy their cloud Docker containers, albeit on premise. Since I did not know much about this, an overview can be found on the NGC Docker Containers page linked below; more details will be provided later.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:187
Nvidia GPU Cloud (browse the online Catalog)
https://www.nvidia.com/en-us/gpu-cloud/containers/

Lots of work!



cluster/191.1579894144.txt.gz · Last modified: 2020/01/24 14:29 by hmeij07