===== 2019 GPU Models =====
  
We do not do AI (yet). Our GPU usage pattern is mostly one job per GPU for exclusive access, so there are no NVLink requirements; PCIe connections are sufficient. The application list is Amber, Gromacs, Lammps, and some Python biosequencing packages. Our current per-GPU memory footprint is 8 GB, which seems sufficient.
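
The one-job-per-GPU pattern is typically enforced by hiding all other devices from each job via ''CUDA_VISIBLE_DEVICES''. A minimal Python sketch, assuming a wrapper script receives a GPU index from the scheduler through a hypothetical ''JOB_GPU_ID'' variable (the Amber command line is just illustrative):

<code python>
import os
import subprocess

# Hypothetical: the scheduler or a wrapper script hands each job a GPU index.
gpu_id = os.environ.get("JOB_GPU_ID", "0")

# Restrict the job to exactly one device; the application then sees a single
# GPU (renumbered 0), giving it exclusive access with no NVLink/peer-to-peer needs.
env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)

# Launch the CUDA application (Amber's pmemd.cuda shown as an example).
subprocess.run(["pmemd.cuda", "-O", "-i", "mdin", "-o", "mdout"], env=env)
</code>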
  
^          Quadro  ^^^^^  Tesla  ^^  Turing  ^    ^
|  FP64?  |  -  |  some  |  -  |  some  |  -  |  yes  |  yes  |  -  |double precision|
  
A lot of information comes from this web site: [[https://blog.exxactcorp.com/whats-the-best-gpu-for-deep-learning-rtx-2080-ti-vs-titan-rtx-vs-rtx-8000-vs-rtx-6000/|Best GPU for deep learning]]. Deep learning (training and inference) is driving GPU models towards single precision (FP32) or even half precision (FP16) to speed up training. Double precision models (the P100 and V100) are still available, but there is a scientific drive towards mixed precision applications (FP64/FP32, FP32/FP16, or even integer mixes).
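
To see why mixed precision needs care, here is a small self-contained NumPy sketch (nothing cluster-specific assumed) of the precision and range limits that separate FP16 from FP32:

<code python>
import numpy as np

# FP16 has ~3 decimal digits of precision; small updates vanish, which is
# why mixed precision training keeps an FP32 "master" copy of the weights.
print(np.float32(1.0) + np.float32(1e-4))  # 1.0001  (update survives in FP32)
print(np.float16(1.0) + np.float16(1e-4))  # 1.0     (update lost in FP16)

# FP16 also has a small range (max ~65504); large values overflow to inf.
print(np.float16(65504) * np.float16(2))   # inf (with an overflow warning)
</code>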
  
Benchmark statistics (the Nvidia GTX 1070 is the ~100% baseline) come from this web site: [[https://gpu.userbenchmark.com/Faq/What-is-the-effective-GPU-speed-index/82|External Link]]
  
Most GPU models come in multiple memory configurations; the table shows the most common footprints.
This is a handy tool: [[https://www.nvidia.com/en-us/data-center/tesla/tesla-qualified-servers-catalog/|GPU Server Catalog]]
  
Learn more about the T4: it can run in mixed mode (FP32/FP16) and deliver 65 Tflops. Other modes are INT8 at 130 Tops and INT4 at 260 Tops. At 65 Tflops mixed precision, the cost dives to $34/Tflop (see the cost sketch below). Amazing. And the wattage is amazing too. See the next page for the FP64/FP32 mixed precision mode quandary: [[cluster:182|P100 vs RTX 6000 & T4]]
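
The $34/Tflop figure follows from the card price implied by the page's own numbers (65 Tflops x $34 is about $2,210; not an official list price). A quick Python check of the cost per unit of throughput in each mode:

<code python>
# Implied T4 price from the figures above: $34/Tflop * 65 Tflops ~= $2,210.
price_usd = 34 * 65  # inferred card cost, not an official list price

# Peak throughput per mode (Tflops for mixed precision, Tops for integer).
modes = {"FP32/FP16 mixed": 65, "INT8": 130, "INT4": 260}

for mode, peak in modes.items():
    print(f"{mode}: ${price_usd / peak:.0f} per Tflop/Top")
# FP32/FP16 mixed: $34, INT8: $17, INT4: $8 (rounded)
</code>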
  
  * [[https://www.nvidia.com/en-us/data-center/tesla-t4/|T4]]
  * [[https://www.nvidia.com/en-us/data-center/products/enterprise-server/|External Link]]
  * [[https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664|FP32, FP16, INT8, INT4, Mixed Mode]]
    * very interesting peak performance FP32 GPU chart (RTX TITAN and RTX 6000 on top)
    * [[https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#framework|Training Guide for Mixed Precision]]
  
  - Does Amber run on the T4? The Amber web site states "Turing (SM_75) based cards require CUDA 9.2 or later." but does not list the T4 (too new?).
  - Gaussian g16c01 AVX-enabled Linux binaries, no Linda: "... include GPU support for NVIDIA K40, K80, //P100, and V100// boards with 12 GB of memory or higher. A version of NVIDIA drivers compatible with CUDA 8.0 or higher." We run CUDA 9.2, so OK, but is OS platform 6.10 or 7.6 required? We're at 6.5 (n38-n45) or 7.5.10 (n33-n37, n78). Do not expect this to be a problem (see the driver check sketch below).
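
A quick way to audit whether a node's driver is new enough for CUDA 9.2 is to query ''nvidia-smi''. A minimal sketch; the ''--query-gpu'' fields are standard nvidia-smi options, and it checks one node at a time:

<code python>
import subprocess

# Query GPU model and installed driver version on this node.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
    universal_newlines=True,
)

for line in out.strip().splitlines():
    name, driver = [field.strip() for field in line.split(",")]
    # CUDA 9.2 requires a 396.26 or newer driver on Linux.
    ok = int(driver.split(".")[0]) >= 396
    print(f"{name}: driver {driver} -> {'ok for CUDA 9.2' if ok else 'too old'}")
</code>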
  
  
\\
**[[cluster:0|Back]]**