NewsBytes Jan 2020

2019 Queue Usage
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:188

2019 dedicated monitoring and alerting server Zenoss
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:183

2020 upcoming changes and updates

Tuesday's (1/21) power outage removed the BLCR kernel modules from the compute nodes' kernels. If you need checkpointing, the new tool is Distributed MultiThreaded Checkpointing (DMTCP). Details on how to use DMTCP can be found on the DMTCP page linked below. If you need help, let me know.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:190

The HPCC has invested in a new solution for our Home Directories file server. The TrueNAS/ZFS solution selected is described on the Home Dir Server page linked below. We will implement it with very large user quotas. The storage is 190 TB usable with inline compression (475 TB effective if the compression ratio achieved is 2.5x). Other features include: unlimited snapshots, read cache SSD, write cache SSD, self-healing (checksums on reads and writes and per schedule), RAIDZ2 protection, and high availability (dual controllers). We will not implement deduplication. We may add replication in the future. This will take a long time to deploy.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:190
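
The effective capacity quoted above is just usable capacity multiplied by the compression ratio. The small sketch below shows how the effective figure moves with the ratio actually achieved. Only the 190 TB and 2.5x values come from the text; the function name and the other ratios are illustrative assumptions, and real ratios depend on the data stored.

```python
def effective_capacity_tb(usable_tb: float, compression_ratio: float) -> float:
    """Return how much logical data fits on usable_tb at a given compression ratio."""
    if usable_tb < 0 or compression_ratio <= 0:
        raise ValueError("usable_tb must be >= 0 and compression_ratio must be > 0")
    return usable_tb * compression_ratio


USABLE_TB = 190  # usable capacity stated in the text

# 2.5x is the ratio quoted in the text; the other ratios are hypothetical.
for ratio in (1.0, 1.5, 2.0, 2.5):
    effective = effective_capacity_tb(USABLE_TB, ratio)
    print(f"{ratio:.1f}x compression -> {effective:.0f} TB effective")
```

At 1.0x (incompressible data) the effective capacity equals the 190 TB usable figure, so the 475 TB number should be read as a best case rather than a guarantee.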

The HPCC has also invested in more GPU and CPU compute capacity. As of this writing, 12 nodes are crossing Iowa headed our way. That is a total of 48 GPUs (model rtx2080s, 384 GB memory) and 24 CPUs (228 physical cores, 1,152 GB memory). Details of the selection process can be found on the 2019 GPU Models page linked below.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:184
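
For a rough sense of what each node looks like, the totals above can be divided across the 12 nodes. This is only a sketch: it assumes the nodes are identical and the hardware is spread evenly, which the text does not state, and the names used are illustrative.

```python
NODES = 12  # node count stated in the text

# Totals stated in the text.
totals = {
    "gpus": 48,
    "gpu_memory_gb": 384,
    "cpus": 24,
    "memory_gb": 1152,
}


def per_node(total: float, nodes: int = NODES) -> float:
    """Average amount per node, ASSUMING an even split across identical nodes."""
    return total / nodes


# Average GPU memory per GPU, derived from the two GPU totals.
gpu_memory_per_gpu_gb = totals["gpu_memory_gb"] / totals["gpus"]

for name, total in totals.items():
    print(f"{name}: {per_node(total):g} per node (assuming an even split)")
print(f"gpu memory per gpu: {gpu_memory_per_gpu_gb:g} GB")
```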

With the additional GPU nodes we are also launching, and committing to, the Nvidia GPU Cloud (NGC), deploying Docker containers, albeit on premise. Since I did not know much about this, an overview can be found on the NGC Docker Containers page linked below, and more details will be provided.
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:187
https://ngc.nvidia.com/catalog/all?orderBy=modifiedDESC&query=label%3A%22High%20Performance%20Computing%22%20&quickFilter=all&filters=

Lots of work!



cluster/191.1579892999.txt.gz · Last modified: 2020/01/24 14:09 by hmeij07