cluster:223 revisions: [2023/09/07 20:03] hmeij07 → [2023/09/18 20:56] (current) hmeij07
  
  
==== Test ====

Script ~hmeij/slurm/run.centos, cuda 11.2, pmemd.cuda from a local install of amber20, with

  * #SBATCH -N 1
  * #SBATCH -n 1
  * #SBATCH -B 1:1:1
  * #SBATCH --mem-per-gpu=7168

For some reason this yields cpus=8 rather than the expected cpu=1: Slurm overrides the settings above with the partition setting DefCpuPerGPU=8. Slurm has not changed, but the cuda version has. Odd.
- 
<code>

# from slurmd.log
[2023-09-05T14:51:00.691] Gres Name=gpu Type=tesla_k20m Count=4

JOBID   PARTITION         NAME          USER  ST          TIME NODES  CPUS    MIN_MEMORY NODELIST(REASON)
1053052 mwgpu             test         hmeij            0:09                                  n33

[hmeij@cottontail2 slurm]$ ssh n33 gpu-info
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Tesla K20m, 36, 95 MiB, 4648 MiB, 100 %, 25 %
1, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %
2, Tesla K20m, 25, 0 MiB, 4743 MiB, 0 %, 0 %
3, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %

[hmeij@cottontail2 slurm]$ ssh n33 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 28394, pmemd.cuda

</code>
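One possible workaround, sketched here as an assumption (not tested on this cluster): per the Slurm documentation, an explicit ''--cpus-per-gpu'' in the job script should take precedence over the partition's DefCpuPerGPU default. Only the partition name, gres type, and the four settings above come from this page; the rest is illustrative.

<code>
#!/bin/bash
# hypothetical variant of run.centos
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --mem-per-gpu=7168
#SBATCH --partition=mwgpu
#SBATCH --gres=gpu:tesla_k20m:1
#SBATCH --cpus-per-gpu=1     # explicit value should override DefCpuPerGPU=8 (assumption)

pmemd.cuda -O    # illustrative amber20 invocation, actual inputs omitted
</code>

The partition default itself can be checked with ''scontrol show partition mwgpu''.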
  
==== Testing ====
  
List of command line options supported by this LAMMPS executable:
<snip>

# hmmm, using -suffix gpu it does not jump on the gpus, generic non-gpu libthread error
# same version works on rocky8/cuda-11.6 and centos7/cuda-10.2, all "make" compiles
# try "cmake" compile on n33-n36
# the ML-PACE library tarball download fails on its file hash and
# yields a status: [1;"Unsupported protocol" error for ML-PACE

# without ML-PACE the hash fails for the opencl-loader third party, bad url
# https://download.lammps.org/thirdparty/opencl-loader-opencl-loadewer-version...tgz
# then extract in _deps/ dir
# and added -D GPU_LIBRARY=../lib/gpu/libgpu.a ala QUIP_LIBRARY
# that works, the cmake-compiled binary jumps on multiple gpus

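# A hedged sketch of the cmake configure step described above (flag values are
# assumptions, not the exact command used on n33-n36): PKG_GPU, GPU_API and
# GPU_ARCH are standard LAMMPS cmake options, and sm_35 matches the Tesla K20m;
# GPU_LIBRARY is the workaround noted above.
#
# cmake ../cmake -D BUILD_MPI=yes -D PKG_GPU=on -D GPU_API=cuda \
#   -D GPU_ARCH=sm_35 -D GPU_LIBRARY=../lib/gpu/libgpu.a
# make -j 4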
[hmeij@n35 sharptail]$ mpirun -n 2 \
/share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp \
-suffix gpu -in in.colloid

[root@greentail52 ~]# ssh n35 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 9911, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp
Tesla K20m, 1, 9912, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp

# some stats, colloid example

1 cpu, 1 gpu
Total wall time: 0:05:49
2 cpus, 2 gpus
Total wall time: 0:03:58
4 cpus, 4 gpus
Total wall time: 0:02:23
8 cpus, 4 gpus
Total wall time: 0:02:23

# but the ML-PACE hash error is different, so no go there
  
</code>
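For reference, the colloid wall times above convert to speedups as follows; a small sketch, with the run labels and times copied verbatim from the log:

```python
# Convert the "Total wall time" H:MM:SS strings from the colloid runs
# above into seconds and compute speedup relative to the 1 cpu / 1 gpu run.
def to_seconds(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

runs = {
    "1 cpu, 1 gpu": "0:05:49",
    "2 cpus, 2 gpus": "0:03:58",
    "4 cpus, 4 gpus": "0:02:23",
    "8 cpus, 4 gpus": "0:02:23",
}

base = to_seconds(runs["1 cpu, 1 gpu"])  # 349 s
for label, hms in runs.items():
    secs = to_seconds(hms)
    print(f"{label}: {secs} s, speedup {base / secs:.2f}x")
```

Scaling flattens at 4 gpus (about 2.4x over the single-gpu run), and adding cpus beyond the gpu count gains nothing here.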
cluster/223.1694116998.txt.gz · Last modified: 2023/09/07 20:03 by hmeij07