This shows you the differences between two versions of the page.
cluster:172 [2018/08/23 18:49] hmeij07 |
cluster:172 [2018/09/25 14:34] hmeij07 [Finish] |
||
---|---|---|---|
Line 37: | Line 37: | ||
# download runfiles from https:// | # download runfiles from https:// | ||
# files in / | # files in / | ||
- | sh cuda_name_of_runfile | + | sh cuda_9.2.148_396.37_linux.run |
- | sh cuda_name_of_runfile_patch | + | |
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26? | Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26? | ||
Line 55: | Line 55: | ||
(y)es/ | (y)es/ | ||
- | # / | + | # / |
# reboot before driver installation # CHROOT done | # reboot before driver installation # CHROOT done | ||
blacklist nouveau | blacklist nouveau | ||
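Before running the driver installer it can help to confirm that the blacklist entry is in place and that nouveau is no longer loaded after the reboot. The following is a minimal sketch, not part of the original procedure. The helper names `is_blacklisted` and `module_in_listing` are hypothetical, and the config file path is left as an argument because the page does not show it in full.

```shell
# Return 0 if the given modprobe config file blacklists the given module.
# Both arguments come from the caller; no file path is assumed here.
is_blacklisted() {
    module="$1"
    conf="$2"
    # Match a line of the form "blacklist <module>", allowing extra whitespace.
    grep -Eq "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$conf"
}

# Return 0 if the module appears in lsmod-style text read from stdin.
# The first line is skipped because lsmod prints a header there.
module_in_listing() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

After the reboot, `lsmod | module_in_listing nouveau` should return non-zero if the blacklist took effect.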
Line 62: | Line 62: | ||
# nvidia driver | # nvidia driver | ||
- | ./ | + | ./ |
# backup | # backup | ||
Line 81: | Line 81: | ||
# new kernel initramfs, load | # new kernel initramfs, load | ||
dracut --force | dracut --force | ||
+ | |||
+ | # MapD needs graphics support enabled (GPU operation mode 0 = All On) | ||
+ | nvidia-smi --gom=0 | ||
+ | # persistence mode and compute (exclusivity) mode left at their defaults for now | ||
+ | |||
reboot | reboot | ||
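A GPU operation mode change only takes effect after a reboot, which is why the `nvidia-smi --gom=0` call sits before the `reboot` above. The sketch below is one way to check whether a change is still pending. It assumes the `gom.current` and `gom.pending` fields of `nvidia-smi --query-gpu` and their CSV output format, which should be verified against the installed driver version. The helper name `gom_reboot_needed` is hypothetical.

```shell
# Read "current, pending" operation-mode pairs on stdin, one GPU per line
# (CSV without header). Return 0 if any GPU has a pending mode that
# differs from its current mode, i.e. a reboot is still needed.
gom_reboot_needed() {
    awk -F', *' '$1 != $2 { pending = 1 } END { exit !pending }'
}
```

On a node with the driver installed, the helper could be fed with `nvidia-smi --query-gpu=gom.current,gom.pending --format=csv,noheader | gom_reboot_needed`.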
Line 312: | Line 317: | ||
* to stay with openmpi-1.8.4 (not mpich3...) | * to stay with openmpi-1.8.4 (not mpich3...) | ||
- | * after consulting the ARCH web page, I chose -arch=sm_35 | + | * after consulting the ARCH web page, I chose -arch=sm_35 |
- | Good thing we're doing this now: future versions of CUDA will no longer support the K20s. In fact, that web site does not mention them, only the K40/K80 GPUs. So we'll see what testing reveals. | + | Good thing we're doing this now: future versions of CUDA will no longer support the K20s. In fact, that web site does not mention them, only the K40/K80 GPUs. So we'll see what testing reveals. |
< | < | ||
Line 320: | Line 325: | ||
[hmeij@n37 src]$ ll / | [hmeij@n37 src]$ ll / | ||
total 104356 | total 104356 | ||
- | -rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-cuda | + | -rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-gpu |
- | -rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-cuda | + | -rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-gpu |
- | -rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-cuda | + | -rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-gpu |
</ | </ | ||
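The three binaries in the listing differ only in the precision pair in their names. As a rough sketch of how those names could line up with build settings, the helpers below map each precision label to the `CUDA_PRECISION` define used in the LAMMPS `lib/gpu` makefiles and pair it with the `-arch=sm_35` choice mentioned above. That this page's builds were driven this way is an assumption, and the function names `precision_flag` and `binary_name` are hypothetical.

```shell
# Map a precision label (as used in the binary names) to the matching
# preprocessor define. Assumption: the GPU library was built via the
# CUDA_PRECISION setting of the LAMMPS lib/gpu makefile.
precision_flag() {
    case "$1" in
        single-single) echo "-D_SINGLE_SINGLE" ;;
        single-double) echo "-D_SINGLE_DOUBLE" ;;
        double-double) echo "-D_DOUBLE_DOUBLE" ;;
        *) echo "unknown precision: $1" >&2; return 1 ;;
    esac
}

# Build the binary name following the naming pattern in the listing.
binary_name() {
    echo "lmp_mpi-$1-with-gpu"
}

# Print one line per build variant; nothing is compiled here.
for p in double-double single-double single-single; do
    echo "$(binary_name "$p") CUDA_PRECISION=$(precision_flag "$p") CUDA_ARCH=-arch=sm_35"
done
```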
Line 362: | Line 367: | ||
* Make the final tar file for /usr/local and post with CHROOT # done | * Make the final tar file for /usr/local and post with CHROOT # done | ||
* Install all the packages of this page in CHROOT # marked done | * Install all the packages of this page in CHROOT # marked done | ||
+ | |||
+ | |||
+ | To do another node, the steps are NOT WORKING! | ||
+ | Trying n36 with cuda rpm (local) | ||
+ | |||
+ | * add node in deploy.txt of n37.chroot/ | ||
+ | * ./ | ||
+ | * scp in place passwd, shadow, group, hosts, fstab from global archive | ||
+ | * umount -a | ||
+ | * ONBOOT=no, ib0 ??? ConnectX mlx4_0 IB interface breaks in CentOS 7.3+ | ||
+ | * bootlocal=EXIT then reboot then check polkit user … screws up systemd-logind | ||
+ | |||
+ | * hostnamectl set-hostname node_name (logout/ | ||
+ | * tar in place n37.chroot.ul.tar.gz in / FIRST | ||
+ | * REMOVE / | ||
+ | * Nvidia install: files in / | ||
+ | * sh cuda_name_of_runfile | ||
+ | * nvidia-modprobe.sh | ||
+ | |||
+ | |||
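Since the steps for a second node are still being worked out, a dry run can make the file-copy step easier to review before anything is overwritten. The sketch below only prints the `scp` commands for the five files named in the list above. The helper name `plan_etc_copy` and the archive location are hypothetical, because the page does not say where the global archive lives.

```shell
# Print (do not run) the copy commands for the five account/host files.
# $1: archive location, e.g. host:/path (hypothetical, supplied by caller)
# $2: target root directory, defaults to "/"
plan_etc_copy() {
    archive="$1"
    root="${2:-/}"
    for f in passwd shadow group hosts fstab; do
        # ${root%/} strips a trailing slash so "/" yields "/etc/<file>".
        echo "scp -p ${archive}/${f} ${root%/}/etc/${f}"
    done
}
```

Once the printed commands look right, they could be piped to `sh` on the new node.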
\\ | \\ | ||
**[[cluster: | **[[cluster: |