==== P100 vs GTX & K20 ====

^ ^ P100 ^ GTX ^ K20 ^ ^
| mem | 12/16 | 11 | 5 | gb |
| ghz | 2.6 | 1.6 | 0.7 | speed |
| flops | 4.7/5.3 | 0.355 | 1.15 | dpfp |
Comparing these GPUs yields the following results presented below. These are not "benchmarks" in any formal sense.
+ | |||
+ | Look at these gpu temperatures, | ||
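
A minimal sketch for sampling those per-gpu temperatures, assuming nvidia-smi is available on the node (the exact query behind the original readings is not shown, so this is an assumption):

<code>
# index, name and current temperature of every gpu in the node
nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv

# or sample every 5 seconds while a job is running
nvidia-smi --query-gpu=index,temperature.gpu --format=csv -l 5
</code>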
==== Lammps ====
+ | |||
+ | ==== Lammps (PMMA) ==== | ||
+ | |||
Using a material called PMMA (https://...), another Lammps benchmark was run; the per-gpu results are below.

^ gpu ^ cpus ^ ns/day ^ gpus/node ^ ns/day/node ^
| 1 P100 | 4 | 89 | x4 | 356 |
| 1 GTX | 6 | 90 | x4 | 360 |
| 1 K20 | 6 | 47 | x4 | 188 |
+ | |||
+ | That means the P100 works as well as the GTX. The K20 works at 50% the performance level of the others which is impressive for this old gpu. | ||
+ | |||
+ | |||
+ | |||
==== Gromacs ====
Gromacs has shown vastly improved performance between versions. v5 delivered about 20 ns/day per K20 server and 350 ns/day on the GTX server. v2018 delivered 75 ns/day per K20 server and 900 ns/day on the GTX server, roughly a 3x improvement.
On the P100 test node, I could not invoke the multidir option of gromacs (I have run it on the GTX node, which is weird). The utilization of each gpu drops as more and more gpus are deployed.
<code>
localhost,...
localhost,...
gmx_mpi mdrun -gpu_id 0123 -ntmpi 0 \
 -s topol.tpr -ntomp 4 -npme 1 -nsteps 20000 -pin on -nb gpu
</code>
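
To watch the utilization drop mentioned above as more gpus are brought into a run, per-gpu utilization can be sampled while the job is active; a sketch, assuming nvidia-smi is present on the node:

<code>
# one utilization sample per gpu every 5 seconds (sm and memory columns)
nvidia-smi dmon -s u -d 5
</code>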
+ | |||
+ | ==== Gromacs 2018.3 ==== | ||
+ | |||
+ | The multidir not running in Gromacs 2018 is a bug in the code clashing with the call MPI_Barrier (communication timing error). | ||
+ | |||
<code>

# multidir -gpu_id 0123 with four simultaneous gromacs processes

-np 8 -ntomp ...
01/ ...
02/ ...
03/ ...
04/ ...

-np 16 -ntomp ...
01/ ...
02/ ...
03/ ...
04/ ...

# multidir -gpu_id 00112233 with eight simultaneous gromacs processes
# sharing the gpus, 2 processes per gpu

-np 8 -ntomp ...
Error in user input:
The string of available GPU device IDs '00112233' may not contain duplicate device IDs

</code>
+ | |||
+ | That last error when loading multiple processes per gpu is *not* according to their documentation. So the multidir performance is similar to previous single dir performance but still lags GTX performance by quite a bit. Albeit, there is room in utilization rate of the gpus. | ||
+ | |||
==== What to Buy ====