cluster:172

Differences

This shows you the differences between two versions of the page.

cluster:172 [2018/09/20 12:32]
hmeij07 [Lammps]
cluster:172 [2018/09/25 15:17]
hmeij07 [Finish]
Line 37:
 # download runfiles from https://developer.nvidia.com/cuda-downloads
 # files in /usr/local/src
-sh cuda_name_of_runfile
-sh cuda_name_of_runfile_patch
+sh cuda_9.2.148_396.37_linux.run
  
 Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26?
Line 55:
 (y)es/(n)o/(q)uit: n
 
-# /etc/modprobe.d/blacklist-nouveau.conf
+# /etc/modprobe.d/blacklist-nouveau.conf (new file by nvidia)
 # reboot before driver installation # CHROOT done
 blacklist nouveau
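If the installer does not create the blacklist file, it can be written by hand. A minimal sketch; the `options nouveau modeset=0` line is a common addition beyond the `blacklist nouveau` line shown above, and `TARGET` is a stand-in path so the sketch runs without root:

```shell
# Sketch: create the nouveau blacklist by hand. TARGET is a stand-in;
# on a node it would be /etc/modprobe.d/blacklist-nouveau.conf.
TARGET=${TARGET:-./blacklist-nouveau.conf}
cat > "$TARGET" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
# rebuild the initramfs and reboot before the driver install (needs root):
#   dracut --force && reboot
grep -q '^blacklist nouveau' "$TARGET" && echo "nouveau blacklisted in $TARGET"
```

As the page notes, the node must be rebooted before the driver install so the nouveau module is no longer loaded.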
Line 62:
 
 # nvidia driver
-./cuda_name_of_runfile -silent -driver
+./cuda_name_of_runfile \-\-silent \-\-accept-eula driver
  
 # backup
Line 319:
   * consulting the ARCH web page I chose -arch=sm_35 (on n37 for K20)
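After a silent driver install, a quick sanity check that the driver actually came up is useful. A sketch; `nvidia-smi` ships with the driver, and the fallback message is for hosts without a GPU:

```shell
# Sketch: verify the nvidia driver after a silent runfile install.
# Falls back to a message when no GPU/driver is present (e.g. a head node).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
    echo "nvidia-smi not found: driver not installed on this host"
fi
```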
  
-Good thing we're doing this now; future versions of CUDA will no longer support the K20s. In fact, that web site does not mention them, only the K40/K80 gpus. So we'll see what testing reveals. Please double-check results against previous runs. Compile as a regular user and stage lmp_mpi in /usr/local/lammps-22Aug10/
+Good thing we're doing this now; future versions of CUDA will no longer support the K20s. In fact, that web site does not mention them, only the K40/K80 gpus. So we'll see what testing reveals. Please double-check results against previous runs. Compile as a regular user and stage lmp_mpi in /usr/local/lammps-22Aug18/
  
 <code>
Line 325:
 [hmeij@n37 src]$ ll /usr/local/lammps-22Aug18/
 total 104356
--rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-cuda
--rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-cuda
--rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-cuda
+-rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-gpu
+-rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-gpu
+-rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-gpu
  
 </code>
Line 367:
   * Make the final tar file for /usr/local and post with CHROOT # done
   * Install all the packages of this page in CHROOT # marked done
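The three staged binaries differ only in GPU precision mode, so a job script could select one by name. A minimal sketch; the `PRECISION` variable and the `mpirun ... -sf gpu -in in.lj` invocation are illustrative, not from this page:

```shell
# Sketch: pick one of the three staged LAMMPS binaries by precision mode.
# PREFIX and the binary names follow the listing above; PRECISION is a
# hypothetical knob (double-double, single-double, or single-single).
PREFIX=${PREFIX:-/usr/local/lammps-22Aug18}
PRECISION=${PRECISION:-single-double}
case "$PRECISION" in
  double-double|single-double|single-single)
      BIN="$PREFIX/lmp_mpi-$PRECISION-with-gpu" ;;
  *)  echo "unknown precision: $PRECISION" >&2; exit 1 ;;
esac
echo "would run: mpirun $BIN -sf gpu -in in.lj"
```

Single-single is the fastest but least accurate mode, which is one reason the page asks for results to be double-checked against previous runs.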
-  Switch eth1 back to 10.10 and do NFS mounts
+
+To do another node, the steps are NOT WORKING!
+Trying n36 with cuda rpm (local)
+
+  * add node in deploy.txt of n37.chroot/
+  * ./deploy.txt `grep node_name deploy.txt`
+  * scp in place passwd, shadow, group, hosts, fstab from global archive
+  * umount -a
+  * ONBOOT=no, ib0 ??? connectX mlx4_0 IB interface breaks in CentOS 7.3+
+  * bootlocal=EXIT then reboot then check polkit user … screws up systemd-logind
+
+  * hostnamectl set-hostname node_name (logout/login)
+  * eth1 on 129.133
+  * rpm -i kernel-devel
+  * rpm -i /usr/local/src/cuda-repo-rhel10-0-local-10.0.130-410.48-1.0-1.x86_64.rpm
+  * Nvidia install: files in /usr/local/src
+    * sh cuda_name_of_runfile
+    * nvidia-modprobe.sh
 \\
 **[[cluster:0|Back]]**
cluster/172.txt · Last modified: 2020/07/15 17:52 by hmeij07