This shows you the differences between two versions of the page.
cluster:172 [2018/08/22 11:58] hmeij07 [Nvidia] |
cluster:172 [2018/09/26 15:21] hmeij07 [mapd] |
Line 17: | Line 17: | ||
* copy passwd, shadow, group, hosts, fstab from global archive | * copy passwd, shadow, group, hosts, fstab from global archive | ||
* check polkit user ... screws up systemd-logind | * check polkit user ... screws up systemd-logind | ||
- | * connextX | + | * connectX |
+ | * unmount NFS mounts while installing nvidia as root | ||
+ | * install other software as regular user | ||
==== Nvidia ==== | ==== Nvidia ==== | ||
+ | |||
+ | ** Installation ** | ||
< | < | ||
Line 29: | Line 33: | ||
yum update kernel kernel-tools kernel-tools-libs | yum update kernel kernel-tools kernel-tools-libs | ||
yum install kernel-devel kernel-headers (remove old headers after reboot) | yum install kernel-devel kernel-headers (remove old headers after reboot) | ||
- | yum install gcc gcc-devel | + | yum install gcc gcc-gfortran gcc-c++ # CHROOT done |
+ | |||
+ | # / | ||
+ | # reboot before driver installation # CHROOT done | ||
+ | blacklist nouveau | ||
+ | options nouveau modeset=0 | ||
+ | |||
+ | # new kernel initramfs, load | ||
+ | dracut --force | ||
+ | |||
+ | reboot | ||
# download runfiles from https:// | # download runfiles from https:// | ||
- | sh cuda_name_of_runfile | + | # files in / |
- | sh cuda_name_of_runfile_patch | + | sh cuda_9.2.148_396.37_linux.run |
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26? | Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26? | ||
Line 49: | Line 65: | ||
Install the CUDA 9.2 Samples? | Install the CUDA 9.2 Samples? | ||
(y)es/ | (y)es/ | ||
- | |||
- | #/ | ||
- | blacklist nouveau | ||
- | options nouveau modeset=0 | ||
- | reboot | ||
# nvidia driver | # nvidia driver | ||
./ | ./ | ||
+ | |||
+ | # Device files/ | ||
+ | # They were not | ||
+ | / | ||
# backup | # backup | ||
[root@n37 src]# rpm -qf / | [root@n37 src]# rpm -qf / | ||
file / | file / | ||
- | cp / | + | cp / |
+ | cp / | ||
[root@n37 src]# ls / | [root@n37 src]# ls / | ||
Line 67: | Line 83: | ||
[root@n37 src]# find / | [root@n37 src]# find / | ||
[root@n37 src]# | [root@n37 src]# | ||
- | [root@n37 src]# scp n78:/ | + | [root@n37 src]# scp n78:/ |
- | # Device files/ | + | # for mapd, graphics support needs to be enabled | ||
- | # They were not | + | nvidia-smi --gom=0   # GPU Operation Mode 0 = All On (compute and graphics) | ||
- | / | + | # have left persistence and exclusivity at defaults for now |
- | # new kernel initramfs, load | ||
- | dracut --force | ||
reboot | reboot | ||
Line 82: | Line 96: | ||
* export PATH=/ | * export PATH=/ | ||
* export LD_LIBRARY_PATH=/ | * export LD_LIBRARY_PATH=/ | ||
+ | * export CUDA_HOME=/ | ||
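The three exports above can be wrapped in one shell function so the toolkit location is set in a single place. This is a sketch only: the CUDA install path is truncated on this page, so the `/usr/local/cuda-9.2` default and the `lib64` library subdirectory are assumptions, and `cuda_env` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put a CUDA toolkit first on PATH and LD_LIBRARY_PATH.
# ASSUMPTION: default root /usr/local/cuda-9.2 (the real path is truncated on this page).
cuda_env() {
  local root="${1:-/usr/local/cuda-9.2}"
  export CUDA_HOME="$root"
  # Prepend, keeping existing entries; no trailing ':' when the variable was empty.
  export PATH="$root/bin${PATH:+:$PATH}"
  export LD_LIBRARY_PATH="$root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Usage, e.g. from ~/.bashrc: `cuda_env` for the default, or `cuda_env /some/other/cuda` to try a different toolkit.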
+ | |||
+ | **Verification** | ||
+ | |||
+ | < | ||
+ | |||
+ | [root@n37 cuda-9.2]# / | ||
+ | / | ||
+ | |||
+ | CUDA Device Query (Runtime API) version (CUDART static linking) | ||
+ | |||
+ | Detected 4 CUDA Capable device(s) | ||
+ | |||
+ | Device 0: "Tesla K20m" | ||
+ | CUDA Driver Version / Runtime Version | ||
+ | CUDA Capability Major/Minor version number: | ||
+ | ... | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU1) : Yes | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU2) : No | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU3) : No | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU0) : Yes | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU2) : No | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU3) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU0) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU1) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU3) : Yes | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU0) : No | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU1) : No | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU2) : Yes | ||
+ | |||
+ | deviceQuery, | ||
+ | CUDA Runtime Version = 9.2, NumDevs = 4, | ||
+ | Device0 = Tesla K20m, Device1 = Tesla K20m, | ||
+ | Device2 = Tesla K20m, Device3 = Tesla K20m | ||
+ | Result = PASS | ||
+ | |||
+ | </ | ||
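The deviceQuery check can be scripted so each node's output is tested without reading it by eye. `check_devicequery` is a hypothetical helper; it relies only on the `NumDevs = N` and `Result = PASS` strings shown in the output above.

```shell
#!/usr/bin/env bash
# Sketch: summarise deviceQuery output read on stdin.
# Prints the device count; returns 0 only if the output contains "Result = PASS".
check_devicequery() {
  local out ndev
  out=$(cat)
  ndev=$(printf '%s\n' "$out" | sed -n 's/.*NumDevs = \([0-9]*\).*/\1/p' | head -n 1)
  echo "devices: ${ndev:-unknown}"
  printf '%s\n' "$out" | grep -q 'Result = PASS'
}
```

Pipe the deviceQuery sample binary (its path is truncated above) into it, e.g. `./deviceQuery | check_devicequery`; on n37 this should report 4 devices.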
+ | |||
+ | ** BandwidthTest ** | ||
+ | |||
+ | < | ||
+ | |||
+ | [root@n37 cuda-9.2]# / | ||
+ | [CUDA Bandwidth Test] - Starting... | ||
+ | Running on... | ||
+ | |||
+ | | ||
+ | Quick Mode | ||
+ | |||
+ | Host to Device Bandwidth, 1 Device(s) | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | Result = PASS | ||
+ | |||
+ | </ | ||
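The run above exercised one device, while deviceQuery found four. A sketch that runs the sample once per GPU, assuming the bandwidthTest sample's `--device=N` option and the same `Result = PASS` line; `bw_all_devices` is a hypothetical name, and the binary path (truncated above) is passed in as the first argument.

```shell
#!/usr/bin/env bash
# Sketch: run bandwidthTest once per GPU and flag any device that does not PASS.
# $1 = bandwidthTest binary (or any command accepting --device=N); $2 = GPU count, default 4.
bw_all_devices() {
  local bw="$1" ndev="${2:-4}" i rc=0
  for ((i = 0; i < ndev; i++)); do
    if "$bw" --device="$i" | grep -q 'Result = PASS'; then
      echo "GPU$i PASS"
    else
      echo "GPU$i FAIL"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `bw_all_devices ./bandwidthTest 4`; a nonzero return means at least one GPU failed.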
+ | |||
+ | ** Finish ** | ||
+ | |||
+ | * yum install freeglut-devel libX11-devel libXi-devel libXmu-devel make mesa-libGLU-devel # CHROOT done | ||
+ | * yum install blas blas-devel lapack lapack-devel # CHROOT done | ||
+ | * check for / | ||
+ | |||
+ | * [root@n37 /]# tar -cvf / | ||
+ | * [root@n37 /]# scp / | ||
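A sketch of the tar step with a readability check before the archive is copied to other nodes. `pack_tree` is hypothetical, and the archive name in the usage line is a placeholder because the real one is truncated above.

```shell
#!/usr/bin/env bash
# Sketch: archive a directory tree (stored relative to its parent) and verify the result.
pack_tree() {
  local src="$1" archive="$2"
  tar -C "$(dirname "$src")" -cf "$archive" "$(basename "$src")" || return 1
  # Listing the archive catches truncated or corrupt output before it is scp'd off-node.
  tar -tf "$archive" >/dev/null
}
```

Usage (placeholder name): `pack_tree /usr/local /tmp/usr-local-n37.tar`, then unpack on the target node with `tar -C /usr -xf /tmp/usr-local-n37.tar`.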
+ | |||
+ | ==== Amber ==== | ||
+ | |||
+ | ** Requirements ** | ||
+ | |||
+ | < | ||
+ | # As root check requirements # CHROOT done | ||
+ | rpm -qa | grep ^flex | ||
+ | rpm -qa | grep ^tcsh | ||
+ | rpm -qa | grep ^zlib | ||
+ | rpm -qa | grep ^zlib-devel | ||
+ | rpm -qa | grep ^bzip2 | ||
+ | rpm -qa | grep ^bzip2-devel | ||
+ | rpm -qa | grep ^bzip | ||
+ | rpm -qa | grep ^bzip-devel | ||
+ | rpm -qa | grep ^libXt | ||
+ | rpm -qa | grep ^libXext | ||
+ | rpm -qa | grep ^libXdmcp | ||
+ | rpm -qa | grep ^tkinter # odd one, needs python 2.6.6_something | ||
+ | rpm -qa | grep ^openmpi | ||
+ | rpm -qa | grep ^perl | egrep " | ||
+ | rpm -qa | grep ^patch | ||
+ | rpm -qa | grep ^bison | ||
+ | |||
+ | # As root install missing # CHROOT done | ||
+ | # CHROOT done | ||
+ | |||
+ | </ | ||
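The `rpm -qa | grep ^name` checks above match by prefix, so `grep ^bzip2` also succeeds when only `bzip2-devel` is installed. A sketch that checks exact package names with `rpm -q` and prints only what is missing; `missing_pkgs` is hypothetical, and the `QUERY` override exists only so the function can be tested without rpm.

```shell
#!/usr/bin/env bash
# Sketch: print each named package that is NOT installed, one per line.
# rpm -q matches the exact package name, unlike a prefix grep over rpm -qa.
missing_pkgs() {
  local query="${QUERY:-rpm -q}" p
  for p in "$@"; do
    # $query is left unquoted on purpose so "rpm -q" splits into command + flag.
    $query "$p" >/dev/null 2>&1 || echo "$p"
  done
}
```

Usage, as root: `pkgs=$(missing_pkgs flex tcsh zlib zlib-devel bzip2 bzip2-devel libXt libXext libXdmcp tkinter openmpi patch bison); [ -n "$pkgs" ] && yum install $pkgs`. The empty check matters because `yum install` with no package names is an error.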
+ | |||
+ | ** Compilations ** | ||
+ | |||
+ | < | ||
+ | |||
+ | # as regular user | ||
+ | # amber16 dir will be created | ||
+ | cd /usr/local | ||
+ | tar xvfj / | ||
+ | tar xvfj / | ||
+ | export AMBERHOME=/ | ||
+ | cd $AMBERHOME | ||
+ | |||
+ | # to preserve existing work flows | ||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | export PATH=/ | ||
+ | |||
+ | # use gnu, Y to patches, Y to miniconda | ||
+ | # bundled netcdf, fftw | ||
+ | ./configure gnu 2>&1 | tee -a amber16-install.log | ||
+ | source / | ||
+ | make install 2>&1 | tee -a amber16-install.log | ||
+ | Installation of Amber16 (serial) is complete at Wed Aug 22 10:12:55 EDT 2018. | ||
+ | |||
+ | ./configure -mpi gnu 2>&1 | tee -a amber16-install.log | ||
+ | source / | ||
+ | make install 2>&1 | tee -a amber16-install.log | ||
+ | Installation of Amber16 (parallel) is complete at Wed Aug 22 10:36:45 EDT 2018. | ||
+ | |||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | # $AMBERHOME/ | ||
+ | # edit and bypass cuda test for 9.0 -> 9.2 version | ||
+ | # please be sure to verify any results against known outcomes | ||
+ | export CUDA_HOME=/ | ||
+ | |||
+ | ./configure -cuda gnu 2>&1 | tee -a amber16-install.log | ||
+ | source / | ||
+ | make install 2>&1 | tee -a amber16-install.log | ||
+ | Installation of pmemd.cuda complete | ||
+ | |||
+ | | ||
+ | source / | ||
+ | make install 2>&1 | tee -a amber16-install.log | ||
+ | Installation of pmemd.cuda.MPI complete | ||
+ | |||
+ | [hmeij@n37 amber16]$ ls -l bin/pmemd* | ||
+ | -rwxr-xr-x 1 hmeij its 3097968 Aug 22 10:12 bin/pmemd | ||
+ | lrwxrwxrwx 1 hmeij its 15 Aug 22 15:19 bin/ | ||
+ | pmemd.cuda_SPFP | ||
+ | -rwxr-xr-x 1 hmeij its 38851928 Aug 22 15:25 bin/ | ||
+ | -rwxr-xr-x 1 hmeij its 39436704 Aug 22 16:04 bin/ | ||
+ | lrwxrwxrwx 1 hmeij its 19 Aug 22 15:57 bin/ | ||
+ | pmemd.cuda_SPFP.MPI | ||
+ | -rwxr-xr-x 1 hmeij its 32950848 Aug 22 15:19 bin/ | ||
+ | -rwxr-xr-x 1 hmeij its 33531456 Aug 22 15:57 bin/ | ||
+ | -rwxr-xr-x 1 hmeij its 33405504 Aug 22 15:31 bin/ | ||
+ | -rwxr-xr-x 1 hmeij its 33990208 Aug 22 16:10 bin/ | ||
+ | -rwxr-xr-x 1 hmeij its 3647784 Aug 22 10:36 bin/ | ||
+ | |||
+ | </ | ||
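Writing `cmd | tee -a log 2>&1` only redirects tee's own stderr; to capture the build's error messages the `2>&1` has to come before the pipe. Even then, the pipeline's exit status is tee's, so a failed `make install` still looks successful. A sketch of a wrapper that handles both, using bash's `pipefail`; `run_logged` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: run a build step, append stdout+stderr to a log, and keep the step's exit status.
run_logged() {
  local log="$1"
  shift
  (
    # pipefail makes the pipeline return the command's failure instead of tee's success.
    set -o pipefail
    "$@" 2>&1 | tee -a "$log"
  )
}
```

Usage: `run_logged amber16-install.log ./configure -cuda gnu && run_logged amber16-install.log make install`, so make only runs if configure succeeded.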
+ | |||
+ | **Tests** | ||
+ | |||
+ | Although the CUDA 9.2 build of Amber passed all tests, please double-check your results. | ||
+ | |||
+ | < | ||
+ | export DO_PARALLEL=" | ||
+ | make test >> amber16-test.log 2>&1 | ||
+ | </ | ||
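A sketch for scanning the test log after `make test`. ASSUMPTION: Amber's test harness marks each comparison with a line containing `PASSED` or `possible FAILURE`; check your own amber16-test.log and adjust the patterns if it differs. `summarize_amber_tests` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count passes and possible failures in an Amber test log.
# ASSUMPTION: the log uses the strings "PASSED" and "possible FAILURE".
summarize_amber_tests() {
  local log="$1" pass fail
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps the count.
  pass=$(grep -c 'PASSED' "$log" || true)
  fail=$(grep -c 'possible FAILURE' "$log" || true)
  echo "passed=$pass possible_failures=$fail"
  [ "$fail" -eq 0 ]
}
```

Usage: `summarize_amber_tests amber16-test.log`; any possible failures still need the corresponding .dif files inspected by hand.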
+ | |||
+ | **Finish** | ||
+ | * [root@n37 /]# tar -cvf / | ||
+ | * [root@n37 /]# scp / | ||
+ | |||
+ | |||
+ | ==== Gromacs ==== | ||
+ | |||
+ | As root install | ||
+ | |||
+ | * cmake, latest version (GROMACS needs a much newer cmake than the distro ships) | ||
+ | |||
+ | Download and extract the source. Use the same environment as for the Amber compilation. | ||
+ | |||
+ | < | ||
+ | |||
+ | cd gromacs-2018/ | ||
+ | mkdir build | ||
+ | cd build | ||
+ | |||
+ | which mpicc mpicxx | ||
+ | / | ||
+ | / | ||
+ | |||
+ | | ||
+ | / | ||
+ | -DCMAKE_INSTALL_PREFIX=/ | ||
+ | -DGMX_BUILD_OWN_FFTW=ON -DGMX_MPI=ON -DGMX_GPU=ON | ||
+ | | ||
+ | | ||
+ | |||
+ | </ | ||
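The cmake lines above are truncated on this page. A sketch of the configure step, run from the build directory, using the flags that survived (`-DGMX_BUILD_OWN_FFTW=ON -DGMX_MPI=ON -DGMX_GPU=ON`) plus the MPI wrappers located by `which`. ASSUMPTIONS added here, not from the original: passing the wrappers as `CMAKE_C_COMPILER`/`CMAKE_CXX_COMPILER`, pointing GROMACS at CUDA 9.2 with its `-DCUDA_TOOLKIT_ROOT_DIR` option, and the `PREFIX`/`CUDA_ROOT`/`DRY_RUN` variables. `gromacs_configure` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the GROMACS 2018 cmake configure step (run inside gromacs-2018/build).
# PREFIX (required): install prefix, a placeholder since the real one is truncated.
# CUDA_ROOT (optional, falls back to CUDA_HOME): CUDA toolkit location.
# DRY_RUN=1 prints the command, one argument per line, instead of running it.
gromacs_configure() {
  if [ -z "${PREFIX:-}" ]; then
    echo "PREFIX is not set" >&2
    return 1
  fi
  local cuda_root="${CUDA_ROOT:-${CUDA_HOME:-}}"
  local -a args=(
    ..
    -DCMAKE_C_COMPILER=mpicc
    -DCMAKE_CXX_COMPILER=mpicxx
    -DCMAKE_INSTALL_PREFIX="$PREFIX"
    -DGMX_BUILD_OWN_FFTW=ON -DGMX_MPI=ON -DGMX_GPU=ON
  )
  if [ -n "$cuda_root" ]; then
    args+=(-DCUDA_TOOLKIT_ROOT_DIR="$cuda_root")
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' cmake "${args[@]}"
  else
    cmake "${args[@]}"
  fi
}
```

Usage: check the arguments with `PREFIX=/placeholder/gromacs-2018 DRY_RUN=1 gromacs_configure`, then run it without `DRY_RUN` and follow with `make && make install`.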
+ | |||
+ | ==== Lammps ==== | ||
+ | |||
+ | As root install | ||
+ | |||
+ | * yum install libjpeg libjpeg-devel libjpeg-turbo libjpeg-turbo-devel # CHROOT done | ||
+ | * yum install blas blas-devel lapack lapack-devel boost boost-devel # CHROOT done | ||
+ | |||
+ | For Lammps-22Aug18 I followed the top installation instructions on this page: | ||
+ | |||
+ | * [[cluster: | ||
+ | |||
+ | The only difference in my approach was | ||
+ | |||
+ | * to stay with openmpi-1.8.4 (not mpich3...) | ||
+ | * after consulting the ARCH web page, I chose -arch=sm_35 (on n37, for the K20s) | ||
+ | |||
+ | Good thing we're doing this now: future versions of CUDA will no longer support the K20s. In fact, that web site does not mention them at all, only the K40/K80 GPUs, so we'll see what testing reveals. | ||
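The `-arch` value follows from each card's compute capability: the K20 and K40 are compute capability 3.5 and the K80 is 3.7. A sketch that maps the Tesla models mentioned here to an nvcc arch string; `gpu_arch` is hypothetical and the table deliberately covers only these three cards.

```shell
#!/usr/bin/env bash
# Sketch: map a Tesla model name to the nvcc -arch value for its compute capability.
gpu_arch() {
  case "$1" in
    *K20*|*K40*) echo sm_35 ;;  # Kepler GK110/GK110B, compute capability 3.5
    *K80*)       echo sm_37 ;;  # Kepler GK210, compute capability 3.7
    *) echo "unknown GPU model: $1" >&2; return 1 ;;
  esac
}
```

Usage on a node: `gpu_arch "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"`, which prints `sm_35` on n37.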
+ | |||
+ | < | ||
+ | |||
+ | [hmeij@n37 src]$ ll / | ||
+ | total 104356 | ||
+ | -rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-gpu | ||
+ | -rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-gpu | ||
+ | -rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-gpu | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== mapd ==== | ||
+ | |||
+ | * https:// | ||
+ | |||
+ | < | ||
+ | |||
+ | useradd -U mapd | ||
+ | |||
+ | # mapd.repo | ||
+ | [mapd-ce-cuda] | ||
+ | name=mapd ce - cuda | ||
+ | baseurl=https:// | ||
+ | gpgcheck=1 | ||
+ | gpgkey=https:// | ||
+ | |||
+ | yum install \ | ||
+ | copy-jdk-configs java-1.8.0-openjdk-headless \ | ||
+ | javapackages-tools libxslt \ | ||
+ | lksctp-tools python-javapackages \ | ||
+ | python-lxml tzdata-java | ||
+ | # CHROOT done | ||
+ | |||
+ | yum install mapd # n37:/ | ||
+ | |||
+ | # User specific aliases and functions | ||
+ | export MAPD_USER=mapd | ||
+ | export MAPD_GROUP=mapd | ||
+ | export MAPD_STORAGE=/ | ||
+ | export MAPD_PATH=/ | ||
+ | # The $MAPD_STORAGE directory must be dedicated to MapD | ||
+ | |||
+ | </ | ||
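Because `$MAPD_STORAGE` must be dedicated to MapD, a sketch that creates it and refuses to reuse a directory that already contains files; run it once, before MapD initializes the directory. `prepare_mapd_storage` is hypothetical, and chowning to `mapd:mapd` assumes `useradd -U mapd` created the matching group.

```shell
#!/usr/bin/env bash
# Sketch: create the MapD storage directory, refusing a non-empty one.
# $1 defaults to $MAPD_STORAGE, $2 to $MAPD_USER (pass "" to skip chown, e.g. when testing).
prepare_mapd_storage() {
  local dir="${1:-${MAPD_STORAGE:-}}" owner="${2-${MAPD_USER:-}}"
  if [ -z "$dir" ]; then
    echo "MAPD_STORAGE is not set" >&2
    return 1
  fi
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "$dir is not empty; MapD needs a dedicated directory" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  if [ -n "$owner" ]; then
    chown "$owner:$owner" "$dir"
  fi
}
```

Usage, as root after sourcing the exports above: `prepare_mapd_storage`.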
+ | |||
+ | ==== Finish ==== | ||
+ | |||
+ | * Make the final tar file for /usr/local and post with CHROOT # done | ||
+ | * Install all the packages of this page in CHROOT # marked done | ||
+ | |||
+ | |||
+ | To set up another node, the steps are: | ||
+ | |||
+ | * add node in deploy.txt of n36.chroot/ | ||
+ | * ./ | ||
+ | * scp in place passwd, shadow, group, hosts, fstab from global archive | ||
+ | * umount -a | ||
+ | * ONBOOT=no for ib0 (unresolved: the ConnectX mlx4_0 IB interface breaks in CentOS 7.3+) | ||
+ | * set bootlocal=EXIT, reboot, then check the polkit user (a bad one screws up systemd-logind) | ||
+ | |||
+ | * hostnamectl set-hostname node_name (logout/ | ||
+ | * eth1 on 129.133 | ||
+ | * yum update | ||
+ | * yum install kernel-headers kernel-devel | ||
+ | * put n37 tarball in /, unpack | ||
+ | * remove cuda-9.2 | ||
+ | |||
+ | * Nvidia install: files in / | ||
+ | * remove nouveau | ||
+ | * sh runfile | ||
+ | * ./runfile -silent -driver | ||
+ | * install all CHROOT done packages | ||
+ | * reboot | ||
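A sketch of two checks for the new-node steps above: that the files copied from the global archive landed in `/etc`, and that nouveau is no longer loaded before the driver runfile is run. Both function names are hypothetical; the root and modules-file arguments exist only so the checks can be pointed at a chroot or a test file.

```shell
#!/usr/bin/env bash
# Sketch: verify passwd, shadow, group, hosts and fstab exist and are non-empty under ROOT/etc.
check_node_files() {
  local root="${1:-/}" f missing=0
  for f in passwd shadow group hosts fstab; do
    if [ ! -s "$root/etc/$f" ]; then
      echo "missing or empty: $root/etc/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Sketch: succeed if the nouveau module is loaded (lines in /proc/modules start with the name).
nouveau_loaded() {
  grep -q '^nouveau ' "${1:-/proc/modules}"
}
```

Usage on the new node: `check_node_files && ! nouveau_loaded && echo "ready for the nvidia runfile"`.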
+ | |||
+ | |||
+ | |||
\\ | \\ | ||
**[[cluster: | **[[cluster: |