\\
**[[cluster:0|Back]]**

Update
 --- //[[hmeij@wesleyan.edu|Henk]] 2021/02/12 14:27//


----

For CUDA_ARCH (or ''nvcc -arch'') versions, check the [[http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/|Matching CUDA arch and CUDA gencode for various NVIDIA architectures]] web page: "When you compile CUDA code, you should always compile only one ''-arch'' flag that matches your most used GPU cards. This will enable faster runtime, because code generation will occur during compilation." //All Turing gpu models (RTX2080, RTX5000 and RTX6000) use CUDA_ARCH sm_75.// The former model is consumer grade, the latter two models are enterprise grade; see the performance differences below. The consumer grade RTX3060Ti is CUDA_ARCH sm_86 (Ampere).

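A minimal compile sketch, assuming a throwaway ''saxpy.cu'' source file (the kernel and file name are made up for illustration); only the ''nvcc'' architecture flags reflect the sm_75 / sm_86 values above.

<code>
// saxpy.cu -- hypothetical example kernel; the nvcc flags in the comments
// at the bottom are the part that matters for CUDA_ARCH selection
#include <cstdio>

// simple y = a*x + y kernel
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    // ... cudaMalloc/cudaMemcpy and a saxpy<<<blocks,threads>>>(...) launch would go here ...
    return 0;
}

// Turing cards (RTX2080, RTX5000, RTX6000) all use sm_75:
//   nvcc -arch=sm_75 -o saxpy saxpy.cu
// The Ampere RTX3060Ti uses sm_86 (requires CUDA toolkit 11.1 or newer):
//   nvcc -arch=sm_86 -o saxpy saxpy.cu
// A fat binary covering both generations (longer compile, larger binary):
//   nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=sm_86 -o saxpy saxpy.cu
</code>
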
----

A detailed review comparing the Quadro RTX 5000 with GeForce gpus, including the RTX 2080 (Ti and S), can be found at this [[https://www.servethehome.com/nvidia-quadro-rtx-5000-review-gpu/|NVIDIA Quadro RTX 5000 Review The Balanced Quadro GPU]] website. Deep learning oriented performance results covering the applicable precision modes (INT8, FP16, FP32) are on **page 6**.

  * Noteworthy re RTX2080S
    * VendorB: "RTX 2080 Super are EOL"
    * VendorA: "nigh impossible to obtain any of them"
  * Noteworthy re RTX3060Ti
    * VendorB: "The 3060 Ti does not have the proper cooling for data center use and are not built for that environment"
    * VendorA: "Lead times on the new GPUs are generally 2 months or more."

^  VendorB1  ^^  Notes  ^  VendorA1  ^^^  VendorA2  ^^^
|  |||  |||  |||
^  Head Node   ^^  incl switches  ^  Head Node   ^^^  Head Node   ^^^
| Rack | 1U  |  |  1U |||  same |||
| Power | 1+1  |  208V  |  1+1 |||  same |||
| Nic | 2x1G+4x10G  |  +PCI  |  4x10G |||  same |||
| Rails | 25  |    |  25-33 |||  same |||
| CPU | 2x6226R  |  Gold  |  2x5222 |||  same |||
| cores | 2x16  |  Physical  |  2x4 |||  same |||
| ghz | 2.9  |    |  3.8 |||  same |||
| ddr4 | 192  |  gb  |  96 |||  same |||
| hdd | 2x480G  |  ssd (raid1)  |  2x960 |||  same |||
| centos | 8  |  yes  |  8 |||  same |||
| OpenHPC | yes  |  "best effort"  |  no |||  same |||
^  GPU Compute Node    ^^    ^  GPU Compute Node   ^^^  GPU Compute Node   ^^^
| Rack | 2U  |    |  4U |||  same |||
| Power | 1  |  208V  |  1+1 |||  same |||
| Nic | 2x1G+2x10G  |  +PCI  |  2x10G |||  same |||
| Rails | ?  |    |  26-36 |||  same |||
| CPU | 2x4214R  |  Silver  |  2x4214R |||  same |||
| cores | 2x12  |  Physical  |  2x12 |||  same |||
| ghz | 2.4  |    |  2.4 |||  same |||
| ddr4 | 192  |  gb  |  192 |||  same |||
| hdd | 480G  |  <ssd,sata>  |  2T |||  same |||
| centos | 8  |  with gpu drivers, toolkit  |  8 |||  same |||
| GPU | 4x(RTX 5000)  |  active cooling  |  4x(RTX 5000) |||  4x(RTX 6000) |||
| gddr6 | 16  |  gb  |  16 |||  24 |||
^   ^^^   ^^^  ^^^
| Switch | 1x(8+1)  |  <-- add self spare!  |  2x(16+2) |||  same |||
| S&H | tbd  |  |  tbd |||  tbd |||
| Δ | -5  |  target budget $k  |  -2.8 |||  +1.5 |||


  * RTX 5000 gpu teraflop compute capacity depends on compute mode
    * 0.35 TFLOPS (FP64), 11.2 TFLOPS (FP32), 22.3 TFLOPS (FP16), 178.4 TFLOPS (INT8)
  * RTX 6000 gpu teraflop compute capacity depends on compute mode
    * 0.51 TFLOPS (FP64), 16.3 TFLOPS (FP32), 32.6 TFLOPS (FP16), 261.2 TFLOPS (INT8)

From NVIDIA's GeForce forums website:

<code>

Quadro RTX 5000 vs RTX 2080

both have effective 14000Mhz GDDR6
both have 64 ROPS.

5000 has 16GB vs 2080's 8GB
5000 has 192 TMU's vs the 2080's 184
5000 has 3072 shaders vs the 2080's 2944

the 5000 has a base clock of 1350 and average boost to 1730
the 2080 has a base clock of 1515 and average boost to 1710
the 5000 has 384 tensor cores vs the 2080's 368.
the 5000 has 48 RT cores vs the 2080's 46.

5000
Pixel Rate    110.7 GPixel/s
Texture Rate    332.2 GTexel/s
FP16 (half) performance    166.1 GFLOPS (1:64)
FP32 (float) performance    10,629 GFLOPS
FP64 (double) performance    332.2 GFLOPS (1:32)

2080
Pixel Rate    109.4 GPixel/s
Texture Rate    314.6 GTexel/s
FP16 (half) performance    157.3 GFLOPS (1:64)
FP32 (float) performance    10,068 GFLOPS
FP64 (double) performance    314.6 GFLOPS (1:32)

</code>
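
As a rough sanity check on those FP32 numbers: peak single precision throughput is about 2 (one FMA counts as two flops) x shader count x clock. The sketch below queries gpu 0 and does that arithmetic; the 64 FP32 cores per SM value is hard coded for Turing (sm_75), and the output is a theoretical peak, not a measured rate.

<code>
// peak_fp32.cu -- rough theoretical peak FP32 estimate for gpu 0
// build with, e.g.:  nvcc -arch=sm_75 -o peak_fp32 peak_fp32.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);
    const int cores_per_sm = 64;                     // FP32 cores per SM on Turing
    int shaders   = p.multiProcessorCount * cores_per_sm;
    double ghz    = p.clockRate / 1.0e6;             // clockRate is reported in kHz
    double tflops = 2.0 * shaders * ghz / 1000.0;    // 2 ops per FMA per clock
    printf("%s: %d shaders @ %.2f GHz ~ %.1f TFLOPS FP32 peak\n",
           p.name, shaders, ghz, tflops);
    return 0;
}
</code>

With the forum numbers above (3072 shaders, ~1.73 GHz boost) the same arithmetic gives 2 x 3072 x 1.73 ≈ 10.6 TFLOPS, in line with the 10,629 GFLOPS FP32 figure quoted for the RTX 5000.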
  
==== Cottontail2 ====
  
Switching to RJ45 10GBase-T network in this migration. And adopting CentOS 8 (possibly the Stream version as events unfold ... [[https://www.hpcwire.com/off-the-wire/centos-project-shifts-focus-to-centos-stream/|CentOS Stream]] or [[http://rockylinux.org|Rocky Linux]]).


**Whoooo! Check this out** https://almalinux.org/
  * 1:1 feature compatible with RHEL, thus a CentOS replacement
  * simply switch repos to migrate
  * out Q1/2021
  
Also sticking to a single private network for scheduler and home directory traffic, at 10G, for each node in the new environment. The second 10G interface (onboot=no) could be brought up for future use in some scenario, maybe with a second switch for network redundancy. Keeping private network 192.168.x.x for openlava/warewulf6 traffic and private network 10.10.x.x for slurm/warewulf8 traffic avoids conflicts.
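
A minimal sketch of what that second, normally-down interface could look like in a CentOS 8 ifcfg file. The device name, address, netmask, and file path are assumptions for illustration only; the 10.10.x.x network and the onboot=no setting come from the paragraph above.

<code>
# /etc/sysconfig/network-scripts/ifcfg-eth1
# hypothetical second 10G interface, kept down until needed
DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=static
# example address on the slurm/warewulf8 private network
IPADDR=10.10.100.250
NETMASK=255.255.0.0
# stays down at boot; bring up by hand (ifup eth1) for the redundancy scenario
ONBOOT=no
</code>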