cluster:200

Differences

This shows you the differences between two versions of the page.

cluster:200 [2021/01/14 15:20]
hmeij07
cluster:200 [2021/02/18 13:33] (current)
hmeij07
Line 1: Line 1:
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
 +
 +Update 
 + --- //[[hmeij@wesleyan.edu|Henk]] 2021/02/12 14:27//
 +
 +
 +----
 +
 +For CUDA_ARCH (or ''nvcc -arch'') values, check the web page [[http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/|Matching CUDA arch and CUDA gencode for various NVIDIA architectures]]. It advises: "When you compile CUDA code, you should always compile only one ‘-arch‘ flag that matches your most used GPU cards. This will enable faster runtime, because code generation will occur during compilation." //All Turing gpu models (RTX2080, RTX5000 and RTX6000) use CUDA_ARCH sm_75.// The first model is consumer grade; the latter two models are enterprise grade. See the performance differences below. The consumer grade RTX3060Ti is CUDA_ARCH sm_86 (Ampere).
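
The model-to-architecture pairs named above can be captured in a small lookup. This is a minimal sketch, not part of any build system on the cluster: the dictionary name and the helper ''nvcc_arch_flag'' are hypothetical, and only the four models mentioned on this page are covered.

```python
# Hypothetical lookup: GPU models named on this page -> CUDA_ARCH value.
CUDA_ARCH_BY_MODEL = {
    "RTX2080": "sm_75",    # Turing, consumer grade
    "RTX5000": "sm_75",    # Turing, enterprise grade
    "RTX6000": "sm_75",    # Turing, enterprise grade
    "RTX3060Ti": "sm_86",  # Ampere, consumer grade
}


def nvcc_arch_flag(model: str) -> str:
    """Return the single -arch flag to pass to nvcc for the given model."""
    try:
        return f"-arch={CUDA_ARCH_BY_MODEL[model]}"
    except KeyError:
        # Unknown models are rejected rather than guessed at.
        raise ValueError(f"no CUDA_ARCH known for model {model!r}") from None


if __name__ == "__main__":
    for model in CUDA_ARCH_BY_MODEL:
        print(model, nvcc_arch_flag(model))
```

Because all three Turing cards share sm_75, a binary built with that one flag would target any of the RTX 5000 or RTX 6000 configurations quoted below.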
 +
 +----
 +
 +A detailed review and comparison of GeForce gpus, including the Quadro RTX 5000 and RTX 2080 (Ti and S), can be found at this [[https://www.servethehome.com/nvidia-quadro-rtx-5000-review-gpu/|NVIDIA Quadro RTX 5000 Review The Balanced Quadro GPU]] website. Deep Learning oriented performance results covering most of the applicable precision modes (INT8, FP16, FP32) are on **page 6**.
 +
 +  * Noteworthy re RTX2080S
 +    * VendorB "RTX 2080 Super are EOL"
 +    * VendorA "nigh impossible to obtain any of them"
 +  * Noteworthy re RTX3060Ti
 +    * VendorB "The 3060 Ti does not have the proper cooling for data center use and are not built for that environment"
 +    * VendorA "Lead times on the new GPUs are generally 2 months or more."
 +
 +^  VendorB1  ^^  Notes  ^  VendorA1  ^^^  VendorA2  ^^^
 +|  |||  |||  |||
 +^  Head Node   ^^  incl switches  ^  Head Node   ^^^  Head Node   ^^^
 +| Rack | 1U  |  |  1U |||  same |||
 +| Power | 1+1  |  208V  |  1+1 |||  same |||
 +| Nic | 2x1G+4x10G  |  +PCI  |  4x10G |||  same |||
 +| Rails | 25  |    |  25-33 |||  same |||
 +| CPU | 2x6226R  |  Gold  |  2x5222 |||  same |||
 +| cores | 2x16  |  Physical  |  2x4 |||  same |||
 +| ghz | 2.9  |    |  3.8 |||  same |||
 +| ddr4 | 192  |  gb  |  96 |||  same |||
 +| hdd | 2x480G  |  ssd (raid1)  |  2x960 |||  same |||
 +| centos | 8  |  yes  |  8 |||  same |||
 +| OpenHPC | yes  |  "best effort"  |  no |||  same |||
 +^  GPU Compute Node    ^^    ^  GPU Compute Node   ^^^  GPU Compute Node   ^^^
 +| Rack | 2U  |    |  4U |||  same |||
 +| Power | 1  |  208V  |  1+1 |||  same |||
 +| Nic | 2x1G+2x10G  |  +PCI  |  2x10G |||  same |||
 +| Rails | ?  |    |  26-36 |||  same |||
 +| CPU | 2x4214R  |  Silver  |  2x4214R |||  same |||
 +| cores | 2x12  |  Physical  |  2x12 |||  same |||
 +| ghz | 2.4  |    |  2.4 |||  same |||
 +| ddr4 | 192  |  gb  |  192 |||  same |||
 +| hdd | 480G  |  <ssd,sata>  |  2T |||  same |||
 +| centos | 8  |  with gpu drivers, toolkit  |  8 |||  same |||
 +| GPU | 4x(RTX 5000)  |  active cooling  |  4x(RTX 5000) |||  4x(RTX 6000) |||
 +| gddr6 | 16  |  gb  |  16 |||  24 |||
 +^   ^^^   ^^^  ^^^
 +| Switch | 1x(8+1)  |  <-- add self spare!  |  2x(16+2) |||  same |||
 +| S&H | tbd  |  |  tbd |||  tbd |||
 +| Δ | -5  |  target budget $k  |  -2.8 |||  +1.5 |||
 +
 +
 +  * RTX 5000 gpu teraflop compute capacity depends on compute mode
 +    * 0.35 TFLOPS (FP64), 11.2 TFLOPS (FP32), 22.3 TFLOPS (FP16), 178.4 TFLOPS (INT8)
 +  * RTX 6000 gpu teraflop compute capacity depends on compute mode
 +    * 0.51 TFLOPS (FP64), 16.3 TFLOPS (FP32), 32.6 TFLOPS (FP16), 261.2 TFLOPS (INT8)
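
To compare the two cards across compute modes, the figures above can be turned into ratios. This is a minimal sketch: the numbers are copied from the list above as quoted, while the names ''PEAK_TFLOPS'', ''speedup'' and ''relative_to_fp32'' are hypothetical.

```python
# Peak throughput per compute mode, copied from the list above (as quoted).
PEAK_TFLOPS = {
    "RTX 5000": {"FP64": 0.35, "FP32": 11.2, "FP16": 22.3, "INT8": 178.4},
    "RTX 6000": {"FP64": 0.51, "FP32": 16.3, "FP16": 32.6, "INT8": 261.2},
}


def speedup(mode: str, faster: str = "RTX 6000", slower: str = "RTX 5000") -> float:
    """Ratio of peak throughput between two cards in one compute mode."""
    return PEAK_TFLOPS[faster][mode] / PEAK_TFLOPS[slower][mode]


def relative_to_fp32(model: str, mode: str) -> float:
    """Peak throughput of one mode relative to FP32 on the same card."""
    return PEAK_TFLOPS[model][mode] / PEAK_TFLOPS[model]["FP32"]


if __name__ == "__main__":
    for mode in ("FP64", "FP32", "FP16", "INT8"):
        print(f"{mode}: RTX 6000 / RTX 5000 = {speedup(mode):.2f}")
    for model in PEAK_TFLOPS:
        print(f"{model}: FP64 / FP32 = {relative_to_fp32(model, 'FP64'):.4f}")
```

With these figures the RTX 6000 comes out at roughly 1.46 times the RTX 5000 in every mode, so the choice of compute mode matters far more than the choice between these two cards.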
 +
 +From NVIDIA's GeForce forums web site:
 +
 +<code>
 +
 +Quadro RTX 5000 vs RTX 2080 
 +
 +both have effective 14000Mhz GDDR6
 +both have 64 ROPS.
 +
 +5000 has 16GB vs 2080's 8GB
 +5000 has 192 TMU's vs the 2080's 184
 +5000 has 3072 shaders vs the 2080's 2944
 +
 +the 5000 has a base clock of 1350 and average boost to 1730
 +the 2080 has a base clock of 1515 and average boost to 1710
 +the 5000 has 384 tensor cores vs the 2080's 368.
 +the 5000 has 48 RT cores vs the 2080's 46.
 +
 +5000
 +Pixel Rate    110.7 GPixel/s
 +Texture Rate    332.2 GTexel/s
 +FP16 (half) performance    166.1 GFLOPS (1:64) 
 +FP32 (float) performance    10,629 GFLOPS 
 +FP64 (double) performance    332.2 GFLOPS (1:32)
 +
 +2080
 +Pixel Rate    109.4 GPixel/s
 +Texture Rate    314.6 GTexel/s
 +FP16 (half) performance    157.3 GFLOPS (1:64) 
 +FP32 (float) performance    10,068 GFLOPS 
 +FP64 (double) performance    314.6 GFLOPS (1:32) 
 +
 +</code>
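
The pixel, texture and FP32 rates in the forum post above can be reproduced from the unit counts and boost clocks it lists. This is a minimal sketch under stated assumptions: the formulas (one pixel per ROP per clock, one texel per TMU per clock, two FP32 operations per shader per clock for a fused multiply-add) are the usual peak-rate conventions and are not spelled out in the post; the ''GpuSpec'' class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Unit counts and average boost clock as quoted in the forum post."""
    name: str
    shaders: int
    tmus: int
    rops: int
    boost_mhz: int

    def fp32_gflops(self) -> float:
        # Assumption: 2 FLOPs per shader per clock (one fused multiply-add).
        return 2 * self.shaders * self.boost_mhz / 1000

    def pixel_rate_gpixel_s(self) -> float:
        # Assumption: one pixel per ROP per clock.
        return self.rops * self.boost_mhz / 1000

    def texture_rate_gtexel_s(self) -> float:
        # Assumption: one texel per TMU per clock.
        return self.tmus * self.boost_mhz / 1000


QUADRO_RTX_5000 = GpuSpec("Quadro RTX 5000", shaders=3072, tmus=192, rops=64, boost_mhz=1730)
RTX_2080 = GpuSpec("RTX 2080", shaders=2944, tmus=184, rops=64, boost_mhz=1710)

if __name__ == "__main__":
    for gpu in (QUADRO_RTX_5000, RTX_2080):
        print(
            f"{gpu.name}: {gpu.pixel_rate_gpixel_s():.1f} GPixel/s, "
            f"{gpu.texture_rate_gtexel_s():.1f} GTexel/s, "
            f"{gpu.fp32_gflops():,.0f} GFLOPS FP32"
        )
```

These are theoretical peaks at the average boost clock; sustained throughput depends on the workload and on cooling.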
 +
 +
 +
  
 ==== Cottontail2 ==== ==== Cottontail2 ====
cluster/200.1610655617.txt.gz · Last modified: 2021/01/14 15:20 by hmeij07