cluster:223 [2023/09/07 20:03] hmeij07
cluster:223 [2023/09/18 20:56] (current) hmeij07
Line 116:
- ==== Test ====
-
- Script ~hmeij/
-
-   * #SBATCH -N 1
-   * #SBATCH -n 1
-   * #SBATCH -B 1:1:1
-   * #SBATCH --mem-per-gpu=7168
-
- For some reason this yields cpus=8, which is different behavior (expected cpus=1): Slurm is overriding the settings above with the partition setting DefCpuPerGPU=8. Slurm has not changed but the cuda version has. Odd.
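-
- A minimal sketch of a workaround, assuming the partition default can be overridden per job with --cpus-per-gpu; the gres request and the job step are assumptions, since the original script path is truncated above:
-
- <code>
- #!/bin/bash
- #SBATCH -N 1
- #SBATCH -n 1
- #SBATCH -B 1:1:1
- #SBATCH --mem-per-gpu=7168
- #SBATCH --gres=gpu:1
- # ask for 1 cpu per gpu explicitly so the partition
- # setting DefCpuPerGPU=8 does not kick in
- #SBATCH --cpus-per-gpu=1
-
- # amber run as seen in the gpu-process output below (input names hypothetical)
- srun pmemd.cuda -O -i mdin -o mdout
- </code>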
- | |||
- <code>
-
- # from slurmd.log
- [2023-09-05T14:
-
- JOBID
- 1053052 mwgpu
-
- [hmeij@cottontail2 slurm]$ ssh n33 gpu-info
- id,
- 0, Tesla K20m, 36, 95 MiB, 4648 MiB, 100 %, 25 %
- 1, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %
- 2, Tesla K20m, 25, 0 MiB, 4743 MiB, 0 %, 0 %
- 3, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %
-
- [hmeij@cottontail2 slurm]$ ssh n33 gpu-process
- gpu_name, gpu_id, pid, process_name
- Tesla K20m, 0, 28394, pmemd.cuda
-
- </code>
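-
- To check what was actually granted, the squeue %C field and the job's TRES line show the applied cpu count (job id taken from the output above):
-
- <code>
- # %C = number of cpus allocated to the job
- squeue -j 1053052 -o "%.10i %.9P %.4C"
- # TRES line lists cpu=... as scheduled
- scontrol show job 1053052 | grep -i tres
- </code>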
==== Testing ====
Line 267: | Line 236:
List of command line options supported by this LAMMPS executable:
+ <code>
+
+ # hmm, using -suffix gpu it does not jump on the gpus, generic non-gpu libthread error
+ # same version rocky8/
+ # try "
+ # libpace tarball download fails on file hash and
+ # yields a status: [1;"
+
+ # without ML-PACE the hash fails for the opencl-loader third party, bad url
+ # https://
+ # then extract in the _deps/ dir
+ # and added -D GPU_LIBRARY=../
+ # that works, the cmake-compiled binary jumps on multiple gpus
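+
+ # a sketch of that workaround, assuming the stock lammps cmake options;
+ # the tarball url and library path are truncated above, so placeholders here:
+ #   wget <opencl-loader tarball url> -O opencl-loader.tar.gz
+ #   tar -xzf opencl-loader.tar.gz -C build/_deps/
+ #   cmake ../cmake -D PKG_GPU=on -D GPU_API=opencl \
+ #         -D USE_STATIC_OPENCL_LOADER=no \
+ #         -D GPU_LIBRARY=<path to prebuilt gpu library>
+ #   make -j 8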
+ | |||
+ | |||
+ [hmeij@n35 sharptail]$ mpirun -n 2 \
+     / \
+     -suffix gpu -in in.colloid
+
+ [root@greentail52 ~]# ssh n35 gpu-process
+ gpu_name, gpu_id, pid, process_name
+ Tesla K20m, 0, 9911, /
+ Tesla K20m, 1, 9912, /
+ | |||
+ | # some stats, colloid example | ||
+ | |||
+ | 1 cpu, 1 gpu | ||
+ | Total wall time: 0:05:49 | ||
+ | 2 cpus, 2 gpus | ||
+ | Total wall time: 0:03:58 | ||
+ | 4 cpus, 4 gpus | ||
+ | Total wall time: 0:02:23 | ||
+ | 8 cpus, 4 gpus | ||
+ | Total wall time: 0:02:23 | ||
+ | |||
+ # but the ML-PACE hash error is different, so no go there
</code>
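
A variant that states the gpu count explicitly, assuming the stock LAMMPS command line (-sf is short for -suffix, -pk for -package; the binary name lmp stands in for the truncated path above):

<code>
# run the colloid example on 2 gpus with 2 mpi ranks
mpirun -n 2 lmp -sf gpu -pk gpu 2 -in in.colloid
</code>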