==== cuda toolkit ====
  
Upgrading CUDA to the latest driver and toolkit that still supports our oldest GPU model, the K20m GPUs found in nodes n33-n37 (queue mwgpu). Consult the page on the previous K20m upgrade, [[cluster:172|K20 Redo]].
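Before and after the upgrade it is worth confirming what a node actually runs; a minimal check, assuming node n33 and the /usr/local/cuda link used elsewhere on this page:

<code>
# driver version and gpu model as the node reports them
ssh n33 nvidia-smi --query-gpu=name,driver_version --format=csv

# toolkit version behind the /usr/local/cuda link
ssh n33 /usr/local/cuda/bin/nvcc --version
</code>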
  
For legacy hardware, find the latest legacy driver here
  
# update link to this version: yes
# no -silent -driver ...
  
===========
  
  
# no nvidia_modprobe? ls -l /dev/nvidia?
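# (sketch, not from the original notes) sanity check after the driver install:
#   lsmod | grep ^nvidia     # kernel module loaded?
#   ls -l /dev/nvidia*       # device nodes created?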
  
reboot
  
  
==== Testing ====
  
Script ~hmeij/slurm/run.centos, CUDA 11.2, pmemd.cuda from the local install of Amber20, with
  * #SBATCH --mem-per-gpu=7168
  
For some reason this yields cpus=8, which is different behavior (expected cpus=1). Slurm is overriding the above settings with the partition setting DefCpuPerGPU=8. Slurm has not changed but the CUDA version has. Odd. The good news is that Amber runs fine, no need to recompile.
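Where that default comes from can be confirmed on the partition itself (a sketch; the exact field layout of the scontrol output varies by Slurm version):

<code>
scontrol show partition mwgpu | grep -i cpupergpu
# slurm.conf equivalent (illustrative): PartitionName=mwgpu ... DefCpuPerGPU=8
</code>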
  
<code>

# from slurmd.log
[2023-09-05T14:51:00.691] Gres Name=gpu Type=tesla_k20m Count=4

JOBID   PARTITION         NAME          USER  ST          TIME NODES  CPUS    MIN_MEMORY NODELIST(REASON)
1053052 mwgpu             test         hmeij            0:09                                  n33

[hmeij@cottontail2 slurm]$ ssh n33 gpu-info
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Tesla K20m, 36, 95 MiB, 4648 MiB, 100 %, 25 %
1, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %
2, Tesla K20m, 25, 0 MiB, 4743 MiB, 0 %, 0 %
3, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %

[hmeij@cottontail2 slurm]$ ssh n33 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 28394, pmemd.cuda

</code>

  * #SBATCH --cpus-per-gpu=1

Adding this does force Slurm to allocate just a single CPU. Now try four GPU jobs per node. There is no need to set CUDA_VISIBLE_DEVICES.

<code>

JOBID   PARTITION         NAME          USER  ST          TIME NODES  CPUS    MIN_MEMORY NODELIST(REASON)
1053992 mwgpu             test         hmeij            0:04                                  n33

[hmeij@cottontail2 slurm]$ for i in `seq 1 6`; do sbatch run.centos; sleep 30; squeue | grep hmeij; done

# output
Submitted batch job 1054000
1054000 mwgpu             test         hmeij            0:30                                  n33
Submitted batch job 1054001
1054001 mwgpu             test         hmeij            0:30                                  n33
1054000 mwgpu             test         hmeij            1:00                                  n33
Submitted batch job 1054002
1054002 mwgpu             test         hmeij            0:30                                  n33
1054001 mwgpu             test         hmeij            1:00                                  n33
1054000 mwgpu             test         hmeij            1:30                                  n33
Submitted batch job 1054003
1054003 mwgpu             test         hmeij            0:30                                  n33
1054002 mwgpu             test         hmeij            1:00                                  n33
1054001 mwgpu             test         hmeij            1:30                                  n33
1054000 mwgpu             test         hmeij            2:00                                  n33
Submitted batch job 1054004
1054004 mwgpu             test         hmeij  PD          0:00                          (Resources)
1054003 mwgpu             test         hmeij            1:00                                  n33
1054002 mwgpu             test         hmeij            1:30                                  n33
1054001 mwgpu             test         hmeij            2:00                                  n33
1054000 mwgpu             test         hmeij            2:30                                  n33
Submitted batch job 1054005
1054005 mwgpu             test         hmeij  PD          0:00                     0(Nodes required f
1054004 mwgpu             test         hmeij  PD          0:00                          (Resources)
1054003 mwgpu             test         hmeij            1:30                                  n33
1054002 mwgpu             test         hmeij            2:00                                  n33
1054001 mwgpu             test         hmeij            2:30                                  n33
1054000 mwgpu             test         hmeij            3:00                                  n33

[hmeij@cottontail2 slurm]$ ssh n33 gpu-info
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Tesla K20m, 40, 95 MiB, 4648 MiB, 100 %, 25 %
1, Tesla K20m, 40, 95 MiB, 4648 MiB, 94 %, 23 %
2, Tesla K20m, 35, 95 MiB, 4648 MiB, 93 %, 21 %
3, Tesla K20m, 28, 95 MiB, 4648 MiB, 97 %, 25 %

</code>
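For reference, a submit script along these lines reproduces the behavior above (a minimal sketch, not the actual ~hmeij/slurm/run.centos; the gres type tesla_k20m is taken from the slurmd.log line above, while the Amber environment file location and input file names are assumptions):

<code>
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=mwgpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:tesla_k20m:1
#SBATCH --mem-per-gpu=7168
#SBATCH --cpus-per-gpu=1

# slurm exports CUDA_VISIBLE_DEVICES for the allocated gpu, no manual setting needed
source /usr/local/amber20/amber.sh   # assumed path to the local amber20 install
pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd
</code>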

Other software does need to be recompiled because it links against specific library versions rather than the generic libName.so (LAMMPS, for example).
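A quick way to see which CUDA sonames a binary is pinned to (a sketch; the full ldd listing for this LAMMPS binary follows in the block below):

<code>
ldd /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/lmp_mpi-cuda-single-single | grep -E 'cudart|cufft'
</code>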

Script ~hmeij/slurm/run.centos.lammps: set up the environment, then get the help page.

<code>

/share/apps/intel/parallel_studio_xe_2016_update3/compilers_and_libraries_2016.3.210/linux/bin/intel64/ifort
/usr/local/cuda/bin/nvcc
/share/apps/CENTOS7/openmpi/4.0.4/bin/mpirun
/share/apps/CENTOS7/python/3.8.3/bin/python
/share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/lmp_mpi-cuda-single-single
        linux-vdso.so.1 =>  (0x00007ffd714ec000)
        libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007fe443b9a000)
        libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fe44390b000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00007fe442223000)
        libcufft.so.10 => /usr/local/cuda/lib64/libcufft.so.10 (0x00007fe436a74000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fe436870000)
        libmpi.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libmpi.so.40 (0x00007fe43655b000)
        libstdc++.so.6 => /share/apps/CENTOS7/gcc/6.5.0/lib64/libstdc++.so.6 (0x00007fe4361d9000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe435ed7000)
        libgcc_s.so.1 => /share/apps/CENTOS7/gcc/6.5.0/lib64/libgcc_s.so.1 (0x00007fe435cc0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe435aa4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe4356d6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe443def000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fe4354ce000)
        libopen-rte.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libopen-rte.so.40 (0x00007fe435218000)
        libopen-pal.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libopen-pal.so.40 (0x00007fe434f09000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fe434d06000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fe434af0000)

Large-scale Atomic/Molecular Massively Parallel Simulator - 28 Mar 2023 - Development

Usage example: lmp_mpi-cuda-single-single -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:
<snip>

# hmmm, using -suffix gpu it does not jump on the gpus, generic non-gpu libthread error
# the same version on rocky8/cuda-11.6 works, centos7/cuda-10.2 works, all "make" compiles
# try a "cmake" compile on n33-n36
# the ML-PACE tarball download fails on its file hash and
# yields a status: [1;"Unsupported protocol" error for ML-PACE

# without ML-PACE the hash check fails for the opencl-loader third party, bad url
# https://download.lammps.org/thirdparty/opencl-loader-opencl-loadewer-version...tgz
# then extract it in the _deps/ dir
# and added -D GPU_LIBRARY=../lib/gpu/libgpu.a a la QUIP_LIBRARY
# that works, the cmake-compiled binary jumps on multiple gpus


[hmeij@n35 sharptail]$ mpirun -n 2 \
/share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp \
-suffix gpu -in in.colloid

[root@greentail52 ~]# ssh n35 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 9911, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp
Tesla K20m, 1, 9912, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp

# some stats, colloid example

1 cpu, 1 gpu
Total wall time: 0:05:49
2 cpus, 2 gpus
Total wall time: 0:03:58
4 cpus, 4 gpus
Total wall time: 0:02:23
8 cpus, 4 gpus
Total wall time: 0:02:23

# but the ML-PACE hash error is different, so no go there

</code>
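For the record, the working cmake configuration amounts to something like this (a sketch reconstructed from the notes above, not the exact command line used; the BUILD_MPI, GPU_API and GPU_ARCH choices are assumptions for the K20m, while -D GPU_LIBRARY=../lib/gpu/libgpu.a is the workaround mentioned in the comments):

<code>
# from the lammps 25Apr2023 source tree, cuda 11.2, K20m nodes (sm_35)
mkdir build; cd build
cmake ../cmake \
  -D BUILD_MPI=yes \
  -D PKG_GPU=on \
  -D GPU_API=cuda \
  -D GPU_ARCH=sm_35 \
  -D GPU_LIBRARY=../lib/gpu/libgpu.a
make -j 8
</code>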
  
\\
**[[cluster:0|Back]]**