\\ **[[cluster:0|Back]]**

==== cuda toolkit ====

Upgrading CUDA to the latest driver and toolkit that still support our oldest GPU model, the K20m GPUs found in nodes n33-n37 (queue mwgpu). Consult the page on the previous K20m upgrade: [[cluster:172|K20 Redo]]

For legacy hardware, find the latest legacy driver series here

  * https://www.nvidia.com/en-us/drivers/unix/legacy-gpu/

Then download the selected driver series here

  * https://www.nvidia.com/en-us/drivers/unix/

Then download the latest toolkit supported for the K20m on CentOS 7 (11.2)

  * https://developer.nvidia.com/cuda-11.2.1-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=runfilelocal

<code>

# install drivers (uninstalls existing drivers, accept defaults)
cd /usr/local/src
DRIVER_VERSION=470.199.02
# set BASE_URL to the driver download location obtained from the pages above
curl -fsSL -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
sh ./NVIDIA-Linux-x86_64-470.199.02.run

# install toolkit
wget https://developer.download.nvidia.com/compute/cuda/11.2.1/local_installers/cuda_11.2.1_460.32.03_linux.run
sh cuda_11.2.1_460.32.03_linux.run

# prompts
│ CUDA Installer
│ - [X] Driver
│      [X] 460.32.03
│ + [X] CUDA Toolkit 11.2
│   [ ] CUDA Samples 11.2
│   [ ] CUDA Demo Suite 11.2
│   [ ] CUDA Documentation 11.2
│   Options
│   Install

# update link (/usr/local/cuda) to this version: yes
# no -silent -driver ... (interactive install)

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.2/
Samples:  Not Selected

Please make sure that
 -   PATH includes /usr/local/cuda-11.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or,
     add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Logfile is /var/log/cuda-installer.log

# no nvidia_modprobe? check device files
ls -l /dev/nvidia?
reboot

# then
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

</code>
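To make those settings persistent across logins and reboots, something along the following lines can be dropped on each node as root. This is a sketch only; the file names /etc/profile.d/cuda.sh and /etc/ld.so.conf.d/cuda.conf are illustrative and not part of the installer output, which only suggests /etc/ld.so.conf plus ldconfig.

<code>
# sketch: persist PATH/LD_LIBRARY_PATH for all users (file names are illustrative)
cat > /etc/profile.d/cuda.sh <<'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF

# or register the library path with the dynamic linker instead,
# as the installer summary suggests
echo /usr/local/cuda-11.2/lib64 > /etc/ld.so.conf.d/cuda.conf
ldconfig
</code>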
<code>

[hmeij@n33 ~]$ nvidia-smi
Tue Sep  5 14:43:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K20m          On   | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P8    25W / 225W |      0MiB /  4743MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          On   | 00000000:03:00.0 Off |                    0 |
| N/A   27C    P8    26W / 225W |      0MiB /  4743MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          On   | 00000000:83:00.0 Off |                    0 |
| N/A   25C    P8    24W / 225W |      0MiB /  4743MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m          On   | 00000000:84:00.0 Off |                    0 |
| N/A   26C    P8    25W / 225W |      0MiB /  4743MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# startup slurm, finds gpus? yes
# old compiled code compatible? test

</code>

==== Testing ====

Script ~hmeij/slurm/run.centos, cuda 11.2, pmemd.cuda of the local amber20 install, submitted with

  * #SBATCH -N 1
  * #SBATCH -n 1
  * #SBATCH -B 1:1:1
  * #SBATCH --mem-per-gpu=7168

For some reason this yields cpus=8, which is different behavior (expected cpus=1): Slurm is overriding the settings above with the partition setting DefCpuPerGPU=8. Slurm has not changed but the cuda version has. Odd. The good news is that Amber runs fine, no need to recompile.

<code>

# from slurmd.log
[2023-09-05T14:51:00.691] Gres Name=gpu Type=tesla_k20m Count=4

  JOBID PARTITION     NAME     USER ST       TIME  NODES CPUS MIN_MEMORY NODELIST(REASON)
1053052     mwgpu     test    hmeij  R       0:09      1    8          0 n33

[hmeij@cottontail2 slurm]$ ssh n33 gpu-info
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Tesla K20m, 36, 95 MiB, 4648 MiB, 100 %, 25 %
1, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %
2, Tesla K20m, 25, 0 MiB, 4743 MiB, 0 %, 0 %
3, Tesla K20m, 26, 0 MiB, 4743 MiB, 0 %, 0 %

[hmeij@cottontail2 slurm]$ ssh n33 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 28394, pmemd.cuda

</code>

  * #SBATCH --cpus-per-gpu=1

Adding this option does force Slurm to allocate just a single cpu; a sketch of the resulting batch header is shown below. Now try 4 gpu jobs per node. No need to set CUDA_VISIBLE_DEVICES.
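A minimal sketch of what such a batch file might look like with the options above combined. The actual ~hmeij/slurm/run.centos script may differ; the partition, gres request, environment lines and the pmemd.cuda command line here are assumptions. The partition default responsible for the cpus=8 behavior can be inspected with "scontrol show partition mwgpu".

<code>
#!/bin/bash
# sketch only -- the real ~hmeij/slurm/run.centos may differ
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --mem-per-gpu=7168          # MB
#SBATCH --cpus-per-gpu=1            # without this, DefCpuPerGPU=8 kicks in
#SBATCH --partition=mwgpu           # assumption: the K20m nodes n33-n37
#SBATCH --gres=gpu:tesla_k20m:1     # assumption: request one K20m per job

# cuda 11.2 environment
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# illustrative amber20 single-gpu run; input/output names are placeholders
pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout
</code>

Submitting such a script repeatedly, as in the loop below, should land up to four jobs on one node, one per K20m.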
<code>

  JOBID PARTITION     NAME     USER ST       TIME  NODES CPUS MIN_MEMORY NODELIST(REASON)
1053992     mwgpu     test    hmeij  R       0:04      1    1          0 n33

[hmeij@cottontail2 slurm]$ for i in `seq 1 6`; do sbatch run.centos; sleep 30; squeue | grep hmeij; done

# output
Submitted batch job 1054000
1054000     mwgpu     test    hmeij  R       0:30      1    1          0 n33
Submitted batch job 1054001
1054001     mwgpu     test    hmeij  R       0:30      1    1          0 n33
1054000     mwgpu     test    hmeij  R       1:00      1    1          0 n33
Submitted batch job 1054002
1054002     mwgpu     test    hmeij  R       0:30      1    1          0 n33
1054001     mwgpu     test    hmeij  R       1:00      1    1          0 n33
1054000     mwgpu     test    hmeij  R       1:30      1    1          0 n33
Submitted batch job 1054003
1054003     mwgpu     test    hmeij  R       0:30      1    1          0 n33
1054002     mwgpu     test    hmeij  R       1:00      1    1          0 n33
1054001     mwgpu     test    hmeij  R       1:30      1    1          0 n33
1054000     mwgpu     test    hmeij  R       2:00      1    1          0 n33
Submitted batch job 1054004
1054004     mwgpu     test    hmeij PD       0:00      1    1          0 (Resources)
1054003     mwgpu     test    hmeij  R       1:00      1    1          0 n33
1054002     mwgpu     test    hmeij  R       1:30      1    1          0 n33
1054001     mwgpu     test    hmeij  R       2:00      1    1          0 n33
1054000     mwgpu     test    hmeij  R       2:30      1    1          0 n33
Submitted batch job 1054005
1054005     mwgpu     test    hmeij PD       0:00      1    1          0 (Nodes required f
1054004     mwgpu     test    hmeij PD       0:00      1    1          0 (Resources)
1054003     mwgpu     test    hmeij  R       1:30      1    1          0 n33
1054002     mwgpu     test    hmeij  R       2:00      1    1          0 n33
1054001     mwgpu     test    hmeij  R       2:30      1    1          0 n33
1054000     mwgpu     test    hmeij  R       3:00      1    1          0 n33

[hmeij@cottontail2 slurm]$ ssh n33 gpu-info
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Tesla K20m, 40, 95 MiB, 4648 MiB, 100 %, 25 %
1, Tesla K20m, 40, 95 MiB, 4648 MiB, 94 %, 23 %
2, Tesla K20m, 35, 95 MiB, 4648 MiB, 93 %, 21 %
3, Tesla K20m, 28, 95 MiB, 4648 MiB, 97 %, 25 %

</code>

Other software does need to be recompiled, because it links against specific library versions rather than the generic libName.so (lammps, for example). Script ~hmeij/slurm/run.centos.lammps sets up the environment and prints the help page; a sketch of that environment setup and the resulting output follow.
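A minimal sketch of the kind of environment setup that script performs, derived from the tool and library paths listed below; the exact contents of run.centos.lammps may differ and the ordering of the PATH entries is an assumption.

<code>
# sketch only -- environment implied by the paths and ldd output below
export PATH=/share/apps/intel/parallel_studio_xe_2016_update3/compilers_and_libraries_2016.3.210/linux/bin/intel64:$PATH
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:/share/apps/CENTOS7/python/3.8.3/bin:/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:/share/apps/CENTOS7/gcc/6.5.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# confirm which tools are picked up, then ask the binary for its help page
which ifort nvcc mpirun python
ldd /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/lmp_mpi-cuda-single-single
mpirun -n 1 /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/lmp_mpi-cuda-single-single -h
</code>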
<code>

# tools picked up by the environment
/share/apps/intel/parallel_studio_xe_2016_update3/compilers_and_libraries_2016.3.210/linux/bin/intel64/ifort
/usr/local/cuda/bin/nvcc
/share/apps/CENTOS7/openmpi/4.0.4/bin/mpirun
/share/apps/CENTOS7/python/3.8.3/bin/python

# the binary and the libraries it links against
/share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/lmp_mpi-cuda-single-single

        linux-vdso.so.1 =>  (0x00007ffd714ec000)
        libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007fe443b9a000)
        libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fe44390b000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00007fe442223000)
        libcufft.so.10 => /usr/local/cuda/lib64/libcufft.so.10 (0x00007fe436a74000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fe436870000)
        libmpi.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libmpi.so.40 (0x00007fe43655b000)
        libstdc++.so.6 => /share/apps/CENTOS7/gcc/6.5.0/lib64/libstdc++.so.6 (0x00007fe4361d9000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe435ed7000)
        libgcc_s.so.1 => /share/apps/CENTOS7/gcc/6.5.0/lib64/libgcc_s.so.1 (0x00007fe435cc0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe435aa4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe4356d6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe443def000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fe4354ce000)
        libopen-rte.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libopen-rte.so.40 (0x00007fe435218000)
        libopen-pal.so.40 => /share/apps/CENTOS7/openmpi/4.0.4/lib/libopen-pal.so.40 (0x00007fe434f09000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fe434d06000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fe434af0000)

# help page
Large-scale Atomic/Molecular Massively Parallel Simulator - 28 Mar 2023 - Development

Usage example: lmp_mpi-cuda-single-single -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

# hmmm, using -suffix gpu this binary does not jump on the gpus, just a generic non-gpu libpthread error
# the same lammps version works on rocky8/cuda-11.6 and on centos7/cuda-10.2, all "make" compiles
# so try a "cmake" compile on n33-n36

# the ML-PACE library tarball download fails on its file hash and
# yields a status: [1;"Unsupported protocol"] error for ML-PACE
# without ML-PACE the hash check also fails for the opencl-loader third party package, bad url
# https://download.lammps.org/thirdparty/opencl-loader-opencl-loadewer-version...tgz
# then extract the tarball in the _deps/ dir
# and added -D GPU_LIBRARY=../lib/gpu/libgpu.a ala QUIP_LIBRARY
# that works, the cmake-compiled binary jumps on multiple gpus

[hmeij@n35 sharptail]$ mpirun -n 2 \
  /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp \
  -suffix gpu -in in.colloid

[root@greentail52 ~]# ssh n35 gpu-process
gpu_name, gpu_id, pid, process_name
Tesla K20m, 0, 9911, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp
Tesla K20m, 1, 9912, /share/apps/CENTOS7/lammps/25Apr2023/cuda-11.2/cmake/single-single/lmp

# some stats, colloid example
1 cpu,  1 gpu    Total wall time: 0:05:49
2 cpus, 2 gpus   Total wall time: 0:03:58
4 cpus, 4 gpus   Total wall time: 0:02:23
8 cpus, 4 gpus   Total wall time: 0:02:23

# but the ML-PACE hash error is different, so no go there

</code>

\\ **[[cluster:0|Back]]**