\\
**[[cluster:0|Back]]**

Update
 --- //[[hmeij@wesleyan.edu|Henk]] 2021/02/12 14:27//


----

For CUDA_ARCH (or ''nvcc -arch'') versions, check the [[http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/|Matching CUDA arch and CUDA gencode for various NVIDIA architectures]] web page: "When you compile CUDA code, you should always compile only one ''-arch'' flag that matches your most used GPU cards. This will enable faster runtime, because code generation will occur during compilation." //All Turing gpu models (RTX2080, RTX5000 and RTX6000) use CUDA_ARCH sm_75.// The former model is consumer grade, the latter two models are enterprise grade; see the performance differences below. The consumer grade RTX3060Ti is CUDA_ARCH sm_86 (Ampere).

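A minimal compile sketch, assuming a throwaway ''saxpy.cu'' source file (the kernel and file name are made up for illustration); only the ''nvcc'' architecture flags reflect the sm_75 / sm_86 values above.

<code>
// saxpy.cu -- hypothetical example kernel; the nvcc flags in the comments
// at the bottom are the part that matters for CUDA_ARCH selection
#include <cstdio>

// simple y = a*x + y kernel
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    // ... cudaMalloc/cudaMemcpy and a saxpy<<<blocks,threads>>>(...) launch would go here ...
    return 0;
}

// Turing cards (RTX2080, RTX5000, RTX6000) all use sm_75:
//   nvcc -arch=sm_75 -o saxpy saxpy.cu
// The Ampere RTX3060Ti uses sm_86 (requires CUDA toolkit 11.1 or newer):
//   nvcc -arch=sm_86 -o saxpy saxpy.cu
// A fat binary covering both generations (longer compile, larger binary):
//   nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=sm_86 -o saxpy saxpy.cu
</code>
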
----

A detailed review comparing the Quadro RTX 5000 with GeForce gpus, including the RTX 2080 (Ti and S), can be found at this [[https://www.servethehome.com/nvidia-quadro-rtx-5000-review-gpu/|NVIDIA Quadro RTX 5000 Review The Balanced Quadro GPU]] website. Deep learning oriented performance results covering the applicable precision modes (INT8, FP16, FP32) are on **page 6**.

  * Noteworthy re RTX2080S
    * VendorB: "RTX 2080 Super are EOL"
    * VendorA: "nigh impossible to obtain any of them"
  * Noteworthy re RTX3060Ti
    * VendorB: "The 3060 Ti does not have the proper cooling for data center use and are not built for that environment"
    * VendorA: "Lead times on the new GPUs are generally 2 months or more."

^  VendorB1  ^^  Notes  ^  VendorA1  ^^^  VendorA2  ^^^
|  |||  |||  |||
^  Head Node   ^^  incl switches  ^  Head Node   ^^^  Head Node   ^^^
| Rack | 1U  |  |  1U |||  same |||
| Power | 1+1  |  208V  |  1+1 |||  same |||
| Nic | 2x1G+4x10G  |  +PCI  |  4x10G |||  same |||
| Rails | 25  |    |  25-33 |||  same |||
| CPU | 2x6226R  |  Gold  |  2x5222 |||  same |||
| cores | 2x16  |  Physical  |  2x4 |||  same |||
| ghz | 2.9  |    |  3.8 |||  same |||
| ddr4 | 192  |  gb  |  96 |||  same |||
| hdd | 2x480G  |  ssd (raid1)  |  2x960 |||  same |||
| centos | 8  |  yes  |  8 |||  same |||
| OpenHPC | yes  |  "best effort"  |  no |||  same |||
^  GPU Compute Node    ^^    ^  GPU Compute Node   ^^^  GPU Compute Node   ^^^
| Rack | 2U  |    |  4U |||  same |||
| Power | 1  |  208V  |  1+1 |||  same |||
| Nic | 2x1G+2x10G  |  +PCI  |  2x10G |||  same |||
| Rails | ?  |    |  26-36 |||  same |||
| CPU | 2x4214R  |  Silver  |  2x4214R |||  same |||
| cores | 2x12  |  Physical  |  2x12 |||  same |||
| ghz | 2.4  |    |  2.4 |||  same |||
| ddr4 | 192  |  gb  |  192 |||  same |||
| hdd | 480G  |  <ssd,sata>  |  2T |||  same |||
| centos | 8  |  with gpu drivers, toolkit  |  8 |||  same |||
| GPU | 4x(RTX 5000)  |  active cooling  |  4x(RTX 5000) |||  4x(RTX 6000) |||
| gddr6 | 16  |  gb  |  16 |||  24 |||
^   ^^^   ^^^  ^^^
| Switch | 1x(8+1)  |  <-- add self spare!  |  2x(16+2) |||  same |||
| S&H | tbd  |  |  tbd |||  tbd |||
| Δ | -5  |  target budget $k  |  -2.8 |||  +1.5 |||


  * RTX 5000 gpu teraflop compute capacity depends on compute mode
    * 0.35 TFLOPS (FP64), 11.2 TFLOPS (FP32), 22.3 TFLOPS (FP16), 178.4 TFLOPS (INT8)
  * RTX 6000 gpu teraflop compute capacity depends on compute mode
    * 0.51 TFLOPS (FP64), 16.3 TFLOPS (FP32), 32.6 TFLOPS (FP16), 261.2 TFLOPS (INT8)

From NVIDIA's GeForce forums website:

<code>

Quadro RTX 5000 vs RTX 2080

both have effective 14000Mhz GDDR6
both have 64 ROPS.

5000 has 16GB vs 2080's 8GB
5000 has 192 TMU's vs the 2080's 184
5000 has 3072 shaders vs the 2080's 2944

the 5000 has a base clock of 1350 and average boost to 1730
the 2080 has a base clock of 1515 and average boost to 1710
the 5000 has 384 tensor cores vs the 2080's 368.
the 5000 has 48 RT cores vs the 2080's 46.

5000
Pixel Rate    110.7 GPixel/s
Texture Rate    332.2 GTexel/s
FP16 (half) performance    166.1 GFLOPS (1:64)
FP32 (float) performance    10,629 GFLOPS
FP64 (double) performance    332.2 GFLOPS (1:32)

2080
Pixel Rate    109.4 GPixel/s
Texture Rate    314.6 GTexel/s
FP16 (half) performance    157.3 GFLOPS (1:64)
FP32 (float) performance    10,068 GFLOPS
FP64 (double) performance    314.6 GFLOPS (1:32)

</code>
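
As a rough sanity check on those FP32 numbers: peak single precision throughput is about 2 (one FMA counts as two flops) x shader count x clock. The sketch below queries gpu 0 and does that arithmetic; the 64 FP32 cores per SM value is hard coded for Turing (sm_75), and the output is a theoretical peak, not a measured rate.

<code>
// peak_fp32.cu -- rough theoretical peak FP32 estimate for gpu 0
// build with, e.g.:  nvcc -arch=sm_75 -o peak_fp32 peak_fp32.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);
    const int cores_per_sm = 64;                     // FP32 cores per SM on Turing
    int shaders   = p.multiProcessorCount * cores_per_sm;
    double ghz    = p.clockRate / 1.0e6;             // clockRate is reported in kHz
    double tflops = 2.0 * shaders * ghz / 1000.0;    // 2 ops per FMA per clock
    printf("%s: %d shaders @ %.2f GHz ~ %.1f TFLOPS FP32 peak\n",
           p.name, shaders, ghz, tflops);
    return 0;
}
</code>

With the forum numbers above (3072 shaders, ~1.73 GHz boost) the same arithmetic gives 2 x 3072 x 1.73 ≈ 10.6 TFLOPS, in line with the 10,629 GFLOPS FP32 figure quoted for the RTX 5000.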
  
==== Cottontail2 ====
  
Switching to RJ45 10GBase-T network in this migration. And adopting CentOS 8 (possibly the Stream version as events unfold ... [[https://www.hpcwire.com/off-the-wire/centos-project-shifts-focus-to-centos-stream/|CentOS Stream]] or [[http://rockylinux.org|Rocky Linux]]).


**Whoooo! Check this out** https://almalinux.org/
  * 1:1 feature compatible with RHEL, thus a CentOS replacement
  * simply switch repos to migrate
  * out Q1/2021
  
Also sticking to a single private network for scheduler and home directory traffic, at 10G, for each node in the new environment. The second 10G interface (onboot=no) could be brought up for future use in some scenario, maybe with a second switch for network redundancy. Keeping private network 192.168.x.x for openlava/warewulf6 traffic and private network 10.10.x.x for slurm/warewulf8 traffic avoids conflicts.
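
A minimal sketch of what that second, normally-down interface could look like in a CentOS 8 ifcfg file. The device name, address, netmask, and file path are assumptions for illustration only; the 10.10.x.x network and the onboot=no setting come from the paragraph above.

<code>
# /etc/sysconfig/network-scripts/ifcfg-eth1
# hypothetical second 10G interface, kept down until needed
DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=static
# example address on the slurm/warewulf8 private network
IPADDR=10.10.100.250
NETMASK=255.255.0.0
# stays down at boot; bring up by hand (ifup eth1) for the redundancy scenario
ONBOOT=no
</code>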