2020 upcoming changes and updates:
Tuesday's power outage removed BLCR's kernel modules. If you need checkpointing, the new tool is Distributed MultiThreaded Checkpointing (DMTCP). Details on how to use DMTCP can be found here: DMTCP
The HPCC has invested in a new solution for our Home Directories file server. The TrueNAS/ZFS solution selected is described here: Home Dir Server. We will implement it with very large user quotas. The storage is 190 TB usable with inline compression (475 TB effective if the compression ratio achieved is 2.5x). Other features include unlimited snapshots, read cache SSD, write cache SSD, self-healing (checksums on reads and writes, plus scheduled checks), RAIDZ2 protection, and high availability (dual controllers). We will not implement deduplication. We may add replication in the future. This will take a long time to deploy.
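The effective-capacity figure above is simple arithmetic: usable capacity multiplied by the compression ratio. Here is a minimal sketch of that calculation. The function name `effective_capacity_tb` is hypothetical, and 2.5x is the ratio assumed above, not a measured value; the real ratio depends on the data stored.

```python
def effective_capacity_tb(usable_tb: float, compression_ratio: float) -> float:
    """Return effective capacity in TB: usable capacity times compression ratio."""
    if usable_tb < 0 or compression_ratio <= 0:
        raise ValueError("usable_tb must be >= 0 and compression_ratio must be > 0")
    return usable_tb * compression_ratio


# Figures from the text: 190 TB usable, 2.5x assumed compression ratio.
print(effective_capacity_tb(190, 2.5))  # 475.0

# A ratio of 1.0 (incompressible data) leaves only the usable capacity.
print(effective_capacity_tb(190, 1.0))  # 190.0
```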
The HPCC has also invested in more GPU and CPU compute capacity. As of this writing, 12 nodes are crossing Iowa headed our way: a total of 48 GPUs (model rtx2080s and 384 GB memory) and 24 CPUs (228 physical cores and 1,152 GB memory). Details of the selection process can be found here: 2019 GPU Models
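For a rough sense of what each node provides, the totals above can be divided across the 12 nodes. This is only a sketch. It assumes the nodes are identical and that the 384 GB figure is the combined memory of all 48 GPUs; neither is stated explicitly above. The function name `per_node_breakdown` is hypothetical.

```python
def per_node_breakdown(nodes: int, gpus: int, gpu_mem_gb: float,
                       cpus: int, cpu_mem_gb: float) -> dict:
    """Split order totals evenly across nodes (assumes identical nodes)."""
    if nodes <= 0 or gpus <= 0:
        raise ValueError("nodes and gpus must be positive")
    return {
        "gpus_per_node": gpus / nodes,
        "gpu_mem_gb_per_gpu": gpu_mem_gb / gpus,
        "cpus_per_node": cpus / nodes,
        "cpu_mem_gb_per_node": cpu_mem_gb / nodes,
    }


# Totals quoted in the text; even distribution is an assumption.
breakdown = per_node_breakdown(nodes=12, gpus=48, gpu_mem_gb=384,
                               cpus=24, cpu_mem_gb=1152)
for name, value in breakdown.items():
    print(f"{name}: {value:g}")
```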
With the additional GPU nodes we are also launching and committing to the Nvidia GPU Cloud (NGC), deploying Docker containers, albeit on premises. Since I did not know much about this, an overview can be found here, and more details will be provided at NGC Docker Containers