Upgrading to the latest driver and toolkit that still support our oldest GPU model, the K20m GPUs in nodes n33-n37 (queue mwgpu). See the page on the previous K20m upgrade, K20 Redo.
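To record what a node reports before and after the upgrade, the CSV form of nvidia-smi can be reduced to one line per distinct GPU model and driver. This is a minimal sketch, assuming the output of `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` (one `name, driver_version` line per GPU); the function name `summarize_gpus` is hypothetical.

```shell
#!/usr/bin/env bash
# summarize_gpus: read nvidia-smi CSV lines such as "Tesla K20m, 460.32.03"
# on stdin and print "<count> x <name> driver <version>" for each distinct
# name/driver pair, so a node with mixed models or drivers stands out.
summarize_gpus() {
    awk -F', ' 'NF == 2 { n[$1 " driver " $2]++ }
        END { for (k in n) print n[k] " x " k }'
}
```

On a node: `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | summarize_gpus`. n33, for example, should print a single line with a count of 4, matching its four Tesla K20m cards. Running the same pipeline over ssh for n33 through n37 (assuming passwordless ssh) gives a quick inventory of the mwgpu queue.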
For legacy hardware, find the latest legacy driver here.
Then download the driver series selected here.
Then download the latest toolkit supported on the K20m under CentOS 7 (11.2).
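The toolkit installer bundles its own driver, and its version is part of the installer file name (460.32.03 in cuda_11.2.1_460.32.03_linux.run). A quick sanity check on a node is a version comparison against that number. This is a sketch: treating the bundled driver as the minimum is an assumption (NVIDIA's CUDA release notes give the authoritative minimum), and `version_ge`, `check_driver` and `MIN_DRIVER` are hypothetical names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version string A >= version string B.
# Uses GNU sort -V (version sort) from coreutils.
version_ge() {
    # After a version sort the smaller string comes first; if that is B, A >= B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum: the driver bundled in cuda_11.2.1_460.32.03_linux.run.
MIN_DRIVER=460.32.03

# check_driver VERSION: report whether VERSION meets MIN_DRIVER; non-zero exit if not.
check_driver() {
    if version_ge "$1" "$MIN_DRIVER"; then
        echo "driver $1 ok (>= $MIN_DRIVER)"
    else
        echo "driver $1 is below $MIN_DRIVER"
        return 1
    fi
}
```

Usage on a node: `check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`. `sort -V` compares dotted fields numerically, so a plain string comparison's pitfalls (for example 460.9 sorting after 460.32) do not apply.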
    # install drivers (uninstalls existing drivers, accept defaults)
    cd /usr/local/src
    # BASE_URL is not set in these notes; point it at NVIDIA's driver download location first
    DRIVER_VERSION=470.199.02
    curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
    sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run

    # install toolkit
    wget https://developer.download.nvidia.com/compute/cuda/11.2.1/local_installers/cuda_11.2.1_460.32.03_linux.run
    sh cuda_11.2.1_460.32.03_linux.run

    # prompts
    │ CUDA Installer                        │
    │ - [X] Driver                          │
    │      [X] 460.32.03                    │
    │ + [X] CUDA Toolkit 11.2               │
    │   [ ] CUDA Samples 11.2               │
    │   [ ] CUDA Demo Suite 11.2            │
    │   [ ] CUDA Documentation 11.2         │
    │   Options                             │
    │   Install                             │
    # update link to this version: yes
    # Driver was left checked, so the bundled 460.32.03 driver replaced 470.199.02
    # (nvidia-smi below reports 460.32.03)

    ===========
    = Summary =
    ===========

    Driver:   Installed
    Toolkit:  Installed in /usr/local/cuda-11.2/
    Samples:  Not Selected

    Please make sure that
     -   PATH includes /usr/local/cuda-11.2/bin
     -   LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root

    To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
    To uninstall the NVIDIA Driver, run nvidia-uninstall
    Logfile is /var/log/cuda-installer.log

    # nvidia-modprobe not found, and no /dev/nvidia* device files yet (unclear where the drivers are), so reboot
    reboot

    # then
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

    [hmeij@n33 ~]$ nvidia-smi
    Tue Sep 5 14:43:15 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla K20m          On   | 00000000:02:00.0 Off |                    0 |
    | N/A   26C    P8    25W / 225W |      0MiB /  4743MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla K20m          On   | 00000000:03:00.0 Off |                    0 |
    | N/A   27C    P8    26W / 225W |      0MiB /  4743MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K20m          On   | 00000000:83:00.0 Off |                    0 |
    | N/A   25C    P8    24W / 225W |      0MiB /  4743MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla K20m          On   | 00000000:84:00.0 Off |                    0 |
    | N/A   26C    P8    25W / 225W |      0MiB /  4743MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

    # start up slurm, finds gpus? yes
    # old compiled code compatible? test
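The two export lines after the reboot prepend unconditionally: sourcing them twice duplicates entries, and if LD_LIBRARY_PATH started out empty the result ends in a trailing colon, an empty entry that the dynamic loader treats as the current directory. Below is a minimal sketch of an idempotent alternative, using a hypothetical helper `path_prepend` (bash-specific: it relies on indirect expansion and `printf -v`).

```shell
#!/usr/bin/env bash
# path_prepend VAR DIR: put DIR at the front of the colon-separated variable
# VAR unless it is already present; never leaves an empty (":") entry.
path_prepend() {
    local var=$1 dir=$2
    local cur=${!var:-}          # indirect expansion: current value of $var ("" if unset)
    case ":$cur:" in
        *":$dir:"*) ;;           # already present, leave unchanged
        *) if [ -z "$cur" ]; then
               printf -v "$var" '%s' "$dir"
           else
               printf -v "$var" '%s:%s' "$dir" "$cur"
           fi ;;
    esac
    export "$var"
}

# Same directories as the export lines (the /usr/local/cuda link)
path_prepend PATH /usr/local/cuda/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda/lib64
```

Sourcing this snippet repeatedly leaves PATH and LD_LIBRARY_PATH unchanged after the first time.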
Script ~hmeij/slurm/run.centos, cuda 11.2, pmemd.cuda of amber20 with