We do not do AI (yet). Our GPU usage pattern is mostly one job per GPU with exclusive access, so there is no NVLink requirement; PCIe connections are sufficient. The application list is Amber, GROMACS, LAMMPS and some Python biosequencing packages. Our current per-GPU memory footprint is 8 GB, which seems sufficient.
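As a sanity check on that 8 GB figure, here is a minimal sketch (assuming `nvidia-smi` is on the path of the GPU nodes) that samples per-GPU memory use while jobs are running; the query fields are standard `nvidia-smi` options.

```python
# Sketch: report per-GPU memory use via nvidia-smi to verify the ~8 GB footprint.
import subprocess

def gpu_memory_usage():
    """Return a list of (gpu_index, used_MiB, total_MiB) tuples."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.strip().splitlines():
        idx, used, total = (int(x) for x in line.split(","))
        rows.append((idx, used, total))
    return rows

if __name__ == "__main__":
    for idx, used, total in gpu_memory_usage():
        print(f"GPU {idx}: {used} / {total} MiB in use")
```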
| Model | RTX 2080 Ti | Titan RTX | Quadro RTX 4000 | Quadro RTX 6000 | Quadro RTX 8000 | Tesla P100 | Tesla V100 | Tesla T4 | Notes |
|---|---|---|---|---|---|---|---|---|---|
| CUDA cores | 4352 | 4608 | 2304 | 4608 | 4608 | 3584 | 5120 | 2560 | parallel CUDA cores |
| Memory (GB) | 11 | 24 | 8 | 24 | 48 | 12 | 32 | 16 | GDDR6; HBM2 on P100/V100 |
| Power (W) | 250 | 280 | 250 | 295 | 295 | 250 | 250 | 70 ! | |
| TFLOPS (FP64) | - | 0.5 | - | 0.5 | - | 4.7 | 7 | - | double precision |
| TFLOPS (FP32) | 13.5 | 16 | 7 | 16 | 16 | 9.3 | 14 | 8.1 | single precision |
| Avg bench | 197% | 215% | 120% | 207% | 219% | 120% | 150% | ?? | user bench, GTX 1070 = 100% |
| Price | $1,199 | $2,499 | $900 | $4,000 | $5,500 | $4,250 | $9,538 | $2,200 | list price |
| $/FP32 TFLOP | $89 | $156 | $129 | $250 | $344 | $457 | $681 | $272 | price / FP32 TFLOPS |
| Notes | small scale | medium scale | small scale | medium scale | large scale | versatile but EOL | most advanced | supercharge | |
| FP64? | - | some | - | some | - | yes | yes | - | double precision |
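The $/FP32 TFLOP row is simply list price divided by peak single-precision throughput. A small sketch with the table's numbers, handy for re-running when prices change:

```python
# Reproduce the $/FP32 TFLOP row: list price divided by peak FP32 TFLOPS,
# using the figures from the table above.
cards = {
    "RTX 2080 Ti": (1199, 13.5),
    "Titan RTX":   (2499, 16.0),
    "RTX 4000":    (900,   7.0),
    "RTX 6000":    (4000, 16.0),
    "RTX 8000":    (5500, 16.0),
    "P100":        (4250,  9.3),
    "V100":        (9538, 14.0),
    "T4":          (2200,  8.1),
}

for model, (price, fp32_tflops) in cards.items():
    print(f"{model:12s} ${price / fp32_tflops:.0f} per FP32 TFLOP")
```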
A lot of this information comes from this web site: Best GPU for deep learning. Deep learning (training and inference) is driving GPU models towards single precision (FP32) or even half precision (FP16) to speed up training. Double-precision models (the P100 and V100) are still available, but there is a scientific drive towards mixed-precision applications (FP64/FP32, FP32/FP16, or even integer mixes).
Bench statistics (the Nvidia GTX 1070 is the ~100% baseline) are from this web site: External Link
Most GPU models come in multiple memory configurations; the table shows the most common footprints.
This is a handy tool: GPU Server Catalog
Learn more about the T4… the T4 can run in mixed mode (FP32/FP16) and deliver 65 TFLOPS. Other modes are INT8 at 130 TOPS and INT4 at 260 TOPS. At 65 TFLOPS mixed precision the cost dives to about $34/TFLOP, which is amazing, and so is the 70 W power draw. See the next page for the FP64/FP32 mixed-precision quandary… P100 vs RTX 6000 & T4
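The $34/TFLOP figure is the same price-per-throughput arithmetic, applied to the T4's quoted mixed-precision number:

```python
# Cost per TFLOP for the T4 at plain FP32 vs the quoted mixed-precision rate.
t4_price = 2200          # list price, USD
t4_fp32 = 8.1            # peak FP32 TFLOPS
t4_mixed = 65.0          # quoted mixed-precision (FP32/FP16) TFLOPS

print(f"FP32:  ${t4_price / t4_fp32:.0f} per TFLOP")   # ~ $272
print(f"mixed: ${t4_price / t4_mixed:.0f} per TFLOP")  # ~ $34
```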
From a LAMMPS developer: “Computing forces in all single precision is a significant approximation and mostly works ok in homogeneous system, where there is a lot of error cancellation. Using half precision in any form for force computations is not advisable.”
From the GROMACS web site: “GROMACS simulations are normally run in “mixed” floating-point precision, which is suited for the use of single precision in FFTW. The default FFTW package is normally in double precision.”
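As a toy illustration of those warnings (not an MD kernel, just a plain accumulation), the sketch below sums many small force-like contributions in half, single and double precision; in half precision the running total quickly becomes too coarse to register the increments.

```python
# Toy accumulation: once the running total grows, half precision rounds the
# small contributions away, while single precision stays close to double.
import numpy as np

rng = np.random.default_rng(0)
increments = rng.uniform(0.0, 2e-3, size=200_000)  # all positive: no cancellation

def running_sum(values, dtype):
    total = dtype(0.0)
    for v in values:
        total = dtype(total + dtype(v))
    return float(total)

print("float64:", running_sum(increments, np.float64))
print("float32:", running_sum(increments, np.float32))
print("float16:", running_sum(increments, np.float16))  # stalls far below the true sum
```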
Keep track of these