====== 2019 GPU Models ======
A lot of information comes from this web site: [[https://blog.exxactcorp.com/whats-the-best-gpu-for-deep-learning-rtx-2080-ti-vs-titan-rtx-vs-rtx-8000-vs-rtx-6000/|Best GPU for deep learning]]

Bench statistics (the Nvidia GTX 1070 is about the 100% baseline) come from this web site: [[https://gpu.userbenchmark.com/Faq/What-is-the-effective-GPU-speed-index/82|External Link]]

Most GPU models come in multiple memory configurations; the most common footprints are shown.
This is a handy tool: [[https://www.nvidia.com/en-us/data-center/tesla/tesla-qualified-servers-catalog/|GPU Server Catalog]]

Learn more about the T4 ... the T4 can run in mixed precision mode (FP32/FP16) and deliver 65 Tflops. Other modes are INT8 at 130 Tops and INT4 at 260 Tops. At 65 Tflops mixed precision the cost dives to about $34/Tflop. Amazing. And the wattage is amazing too. See the next page for the FP64/FP32 mixed precision mode quandary: [[cluster:182|P100 vs RTX 6000 & T4]]
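To show where that $34/Tflop number comes from, here is a minimal sketch of the arithmetic. The card price is an assumption (roughly what a T4 sold for in 2019, not a quote); the Tflops figures are the ones quoted above and on NVIDIA's T4 page.

<code python>
# Back-of-the-envelope $/Tflop sketch -- the price is an ASSUMPTION, not a quote.
t4_price_usd     = 2200.0   # assumed rough 2019 street price
t4_mixed_tflops  = 65.0     # FP32/FP16 mixed precision (NVIDIA T4 spec)
t4_fp32_tflops   = 8.1      # single precision (NVIDIA T4 spec)

print("mixed precision: $%.0f/Tflop" % (t4_price_usd / t4_mixed_tflops))  # ~ $34/Tflop
print("FP32 only:       $%.0f/Tflop" % (t4_price_usd / t4_fp32_tflops))   # ~ $272/Tflop
</code>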
  * [[https://www.nvidia.com/en-us/data-center/tesla-t4/|T4]]
  * [[https://www.nvidia.com/en-us/data-center/products/enterprise-server/|External Link]]
  * [[https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664|FP32, FP16, INT8, INT4, Mixed Mode]]
  * very interesting peak performance FP32 GPU chart (RTX TITAN and RTX 6000 on top)
  * [[https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#framework|Training Guide for Mixed Precision]] -- a minimal mixed precision sketch follows below this list
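The training guide above is framework specific; purely as an illustration (the model, sizes and data below are made up, and a reasonably recent PyTorch is assumed), this is roughly what mixed precision training (FP16 compute with FP32 master weights and loss scaling) looks like with PyTorch's automatic mixed precision.

<code python>
# Hedged sketch only: a made-up model and data showing the autocast / loss-scaling
# pattern from NVIDIA's mixed precision guide, via PyTorch's torch.cuda.amp.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda"                                  # assumes a Volta/Turing-class card (V100, T4, RTX)
model = nn.Linear(1024, 10).to(device)           # hypothetical tiny model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()             # scales the loss so FP16 gradients do not underflow

for step in range(10):                           # hypothetical training loop
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    opt.zero_grad()
    with torch.cuda.amp.autocast():              # eligible ops run in FP16, the rest stay FP32
        loss = F.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                # backward pass on the scaled loss
    scaler.step(opt)                             # unscales gradients, skips the step on inf/nan
    scaler.update()
</code>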
From a Lammps developer: "Computing forces in all single precision is a significant approximation and mostly works ok in homogeneous system, where there is a lot of error cancellation. Using half precision in any form for force computations is not advisable."

From the Gromacs web site: "GROMACS simulations are normally run in “mixed” floating-point precision, which is suited for the use of single precision in FFTW. The default FFTW package is normally in double precision."
**Keep track of these**
  - Does Amber run on the T4? The web site lists "Turing (SM_75) based cards require CUDA 9.2 or later." but does not list the T4 (too new?).
  - Gaussian g16c01 AVX-enabled Linux binaries, no Linda: "Platforms marked with † include GPU support for NVIDIA K40, K80, //P100, and V100// boards with 12 GB of memory or higher. A version of NVIDIA drivers compatible with CUDA 8.0 or higher." We run CUDA 9.2, so that is ok, but is OS platform 6.10 or 7.6 required? We are at 6.5 (n38-n45) or 7.5.10 (n33-n37, n78). A small node-check sketch follows below.
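For the second item, a quick per-node check of the driver and OS release helps answer the CUDA/OS questions. This is only a sketch, assuming ''nvidia-smi'' is on the PATH and a RHEL/CentOS-style ''/etc/redhat-release''; run it on the nodes in question (n33-n45, n78).

<code python>
# Sketch of a per-node sanity check for the Gaussian GPU requirements above.
# ASSUMES nvidia-smi is on the PATH and a RHEL/CentOS style /etc/redhat-release.
import subprocess

def nvidia_driver_version():
    # nvidia-smi can report just the driver version in CSV form
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])
    return out.decode().strip().splitlines()[0]

def os_release():
    with open("/etc/redhat-release") as f:
        return f.read().strip()

print("Driver :", nvidia_driver_version())   # must support CUDA 8.0 or later
print("OS     :", os_release())              # Gaussian lists 6.10 / 7.6
</code>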
\\
**[[cluster:0|Back]]**