cluster:172

Differences

This shows you the differences between two versions of the page.

cluster:172 [2018/08/23 18:17]
hmeij07 [mapd]
cluster:172 [2020/07/15 17:52]
hmeij07
Line 20: Line 20:
   * unmount NFS mounts while installing nvidia as root   * unmount NFS mounts while installing nvidia as root
   * install other software as regular user    * install other software as regular user 
-  *  
 ==== Nvidia ==== ==== Nvidia ====
  
Line 34: Line 33:
 yum update kernel kernel-tools kernel-tools-libs yum update kernel kernel-tools kernel-tools-libs
 yum install kernel-devel kernel-headers (remove old headers after reboot) yum install kernel-devel kernel-headers (remove old headers after reboot)
-yum install gcc gcc-devel gcc-gfortran gcc-c++
+yum install gcc gcc-gfortran gcc-c++  # CHROOT done
 +yum install tcl tcl-devel # CHROOT done 
 + 
 +# /etc/modprobe.d/blacklist-nouveau.conf (new file by nvidia) 
 +# reboot before driver installation # CHROOT done 
 +blacklist nouveau 
 +options nouveau modeset=0 
 + 
 +# new kernel initramfs, load 
 +dracut --force 
 + 
 +reboot 
  
 # download runfiles from https://developer.nvidia.com/cuda-downloads # download runfiles from https://developer.nvidia.com/cuda-downloads
-sh cuda_name_of_runfile
-sh cuda_name_of_runfile_patch
+# files in /usr/local/src
+sh cuda_9.2.148_396.37_linux.run
  
 Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26? Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.26?
Line 54: Line 66:
 Install the CUDA 9.2 Samples? Install the CUDA 9.2 Samples?
 (y)es/(n)o/(q)uit: n (y)es/(n)o/(q)uit: n
- 
-#/etc/modprobe.d/blacklist-nouveau.conf, reboot before driver instllation 
-blacklist nouveau 
-options nouveau modeset=0 
-reboot 
  
 # nvidia driver # nvidia driver
 ./cuda_name_of_runfile -silent -driver ./cuda_name_of_runfile -silent -driver
 +
 # Device files /dev/nvidia* exist with 0666 permissions?
 +# They were not 
 +/usr/local/src/nvidia-modprobe.sh
  
 # backup # backup
 [root@n37 src]# rpm -qf /usr/lib/libGL.so [root@n37 src]# rpm -qf /usr/lib/libGL.so
 file /usr/lib/libGL.so is not owned by any package file /usr/lib/libGL.so is not owned by any package
-cp /usr/lib/libGL.so /usr/lib/libGL.so-nvidia
+cp /usr/lib/libGL.so.1.7.0   /usr/lib/libGL.so.1.7.0-nvidia
+cp /usr/lib64/libGL.so.1.7.0 /usr/lib64/libGL.so.1.7.0-nvidia
  
 [root@n37 src]# ls /etc/X11/xorg.conf [root@n37 src]# ls /etc/X11/xorg.conf
Line 72: Line 84:
 [root@n37 src]# find /usr/local/cuda-9.2 -name nvidia-xconfig* [root@n37 src]# find /usr/local/cuda-9.2 -name nvidia-xconfig*
 [root@n37 src]# [root@n37 src]#
-[root@n37 src]# scp n78:/etc/X11/xorg.conf /etc/X11/
+[root@n37 src]# scp n78:/etc/X11/xorg.conf /etc/X11/  # CHROOT done
  
-Device files/dev/nvidia* exist with 0666 permissions?
-# They were not
-/usr/local/src/nvidia-modprobe.sh
+for mapd graphics support needs to be enabled
+nvidia-smi --gom=0
+# have left persistence and exclusivity at defaults for now
  
-# new kernel initramfs, load 
-dracut --force 
 reboot reboot
  
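The device-file check in this hunk ("/dev/nvidia* exist with 0666 permissions?") can be scripted. The following is only a sketch: `check_device_perms` is a hypothetical helper name, not part of the NVIDIA tooling, and it assumes GNU `stat` as shipped with CentOS.

```shell
#!/bin/sh
# Sketch: report nvidia device nodes whose mode is not 0666.
# check_device_perms is a hypothetical helper (an assumption of this example).
check_device_perms() {
    dir=${1:-/dev}                  # directory to scan; /dev on a real node
    status=0
    found=0
    for f in "$dir"/nvidia*; do
        [ -e "$f" ] || continue     # the glob matched nothing
        found=1
        mode=$(stat -c '%a' "$f")   # GNU stat: octal permission bits
        if [ "$mode" != "666" ]; then
            echo "BAD $f $mode"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "NONE $dir"            # no device files at all
        status=1
    fi
    return $status
}
```

On a node one would call `check_device_perms /dev` and run /usr/local/src/nvidia-modprobe.sh only if it returns non-zero.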
Line 157: Line 167:
 ** Finish ** ** Finish **
  
-  * yum install freeglut-devel libX11-devel libXi-devel libXmu-devel \ make mesa-libGLU-devel
+  * yum install freeglut-devel libX11-devel libXi-devel libXmu-devel \ make mesa-libGLU-devel # CHROOT done
 +  * yum install blas blas-devel lapack lapack-devel #CHROOT done
   * check for /usr/lib64/libvdpau_nvidia.so   * check for /usr/lib64/libvdpau_nvidia.so
 +
   * [root@n37 /]# tar -cvf /tmp/n37.chroot.ul.tar usr/local   * [root@n37 /]# tar -cvf /tmp/n37.chroot.ul.tar usr/local
   * [root@n37 /]# scp /tmp/n37.chroot.ul.tar sms_server:/var/chroots/goldimages/   * [root@n37 /]# scp /tmp/n37.chroot.ul.tar sms_server:/var/chroots/goldimages/
Line 167: Line 179:
  
 <code> <code>
-As root check requirements
-rpm -qa | grep ^gcc
-rpm -qa | grep ^g++
+centos7
+yum -y install tcsh make \
+               gcc gcc-gfortran gcc-c++ \
 +               which flex bison patch bc \ 
 +               libXt-devel libXext-devel \ 
 +               perl perl-ExtUtils-MakeMaker util-linux wget \ 
 +               bzip2 bzip2-devel zlib-devel tar  
 +</code>                
 + 
 +<code> 
 +# As root check requirements # CHROOT done
 rpm -qa | grep ^flex rpm -qa | grep ^flex
 rpm -qa | grep ^tcsh rpm -qa | grep ^tcsh
Line 187: Line 207:
 rpm -qa | grep ^bison rpm -qa | grep ^bison
  
-# As root install missing
-yum install flex bzip2-devel libXdmcp zlib zlib-devel
-yum install tkinter openmpi perl-ExtUtils-MakeMaker patch bison
+# As root install missing # CHROOT done
+# CHROOT done
  
 </code> </code>
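The "check requirements, then install missing" steps above can be collapsed into one helper. This is a sketch under assumptions: `missing_packages` and the `PKG_QUERY` override are hypothetical names, and it uses `rpm -q` (exit status per package) instead of the page's `rpm -qa | grep ^name`.

```shell
#!/bin/sh
# Sketch: print the subset of the given packages that is not installed.
# PKG_QUERY is an assumption of this example; it defaults to "rpm -q" and
# can be overridden so the function can be exercised without rpm.
missing_packages() {
    query=${PKG_QUERY:-rpm -q}
    missing=""
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        if ! $query "$pkg" >/dev/null 2>&1; then
            missing="$missing $pkg"
        fi
    done
    echo "${missing# }"             # strip the leading space
}
```

As root, something like `yum install $(missing_packages flex tcsh bison patch)` would then install only what is absent; with nothing missing the list is empty and the yum call should be skipped.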
Line 296: Line 315:
  
 </code> </code>
 +
 ==== Lammps ==== ==== Lammps ====
  
 As root install As root install
  
-  * yum install libjpeg libjpeg-devel libjpeg-turbo libjpeg-turbo-devel
-  * yum install blas blas-devel lapack lapack-devel boost boost-devel
+  * yum install libjpeg libjpeg-devel libjpeg-turbo libjpeg-turbo-devel # CHROOT done
+  * yum install blas blas-devel lapack lapack-devel boost boost-devel # CHROOT done
  
 For Lammps-22Aug18 I followed the top installation instructions at this page For Lammps-22Aug18 I followed the top installation instructions at this page
Line 310: Line 330:
  
   * to stay with openmpi-1.8.4 (not mpich3...)   * to stay with openmpi-1.8.4 (not mpich3...)
-  * consulting the ARCH web page I choose -arch=sm_35
+  * consulting the ARCH web page I choose -arch=sm_35 (on n37 for K20)
  
-Good thing we're doing this now, future versions of CUDA will not support the K20s anymore. In fact on that web site they are not mentioned, only the K40/K80 gpus. So we'll see what testing reveals.  Please double check results against previous runs. Compile as regular user and stage lmp_mpi in /usr/local/lammps-22Aug10/
+Good thing we're doing this now, future versions of CUDA will not support the K20s anymore. In fact on that web site they are not mentioned, only the K40/K80 gpus. So we'll see what testing reveals.  Please double check results against previous runs. Compile as regular user and stage lmp_mpi in /usr/local/lammps-22Aug18/
  
 <code> <code>
Line 318: Line 338:
 [hmeij@n37 src]$ ll /usr/local/lammps-22Aug18/ [hmeij@n37 src]$ ll /usr/local/lammps-22Aug18/
 total 104356 total 104356
--rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-cuda
--rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-cuda
--rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-cuda
+-rwxr-xr-x 1 hmeij its 35739800 Aug 23 08:49 lmp_mpi-double-double-with-gpu
+-rwxr-xr-x 1 hmeij its 35555672 Aug 23 09:11 lmp_mpi-single-double-with-gpu
+-rwxr-xr-x 1 hmeij its 35559552 Aug 23 09:53 lmp_mpi-single-single-with-gpu
  
 </code> </code>
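The -arch=sm_35 choice for the K20 mentioned above can be written down as a lookup. This is a sketch only: `gpu_arch_flag` is a hypothetical helper, and the table covers just the three Tesla models named on this page, using NVIDIA's published compute capabilities (K20 and K40 are 3.5, K80 is 3.7).

```shell
#!/bin/sh
# Sketch: map a GPU model name to an nvcc -arch flag.
# gpu_arch_flag is a hypothetical helper; only K20/K40/K80 are covered.
gpu_arch_flag() {
    case "$1" in
        *K20*|*K40*) echo "-arch=sm_35" ;;   # compute capability 3.5
        *K80*)       echo "-arch=sm_37" ;;   # compute capability 3.7
        *)           echo "unknown GPU model: $1" >&2
                     return 1 ;;
    esac
}
```

The model name could come from `nvidia-smi --query-gpu=name --format=csv,noheader`; for any other card, consult the ARCH web page rather than extending this table by guesswork.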
Line 343: Line 363:
   javapackages-tools libxslt \   javapackages-tools libxslt \
   lksctp-tools python-javapackages \   lksctp-tools python-javapackages \
-  python-lxml tzdata-java
-  mapd
-  # n37:/usr/local/src
+  python-lxml tzdata-java  nfs-utils psmisc lm_sensors
+  # CHROOT done
+
+yum install mapd   # n37:/usr/local/src
  
 # User specific aliases and functions # User specific aliases and functions
Line 358: Line 379:
 ==== Finish ==== ==== Finish ====
  
-  * Make the final tar file for /usr/local and post with CHROOT
-  * Install all the packages of this page in CHROOT
+  * Make the final tar file for /usr/local and post with CHROOT # done
+  * Install all the packages of this page in CHROOT # marked done
 + 
 + 
 +To do another node, the steps are 
 + 
 +  * add node in deploy.txt of n36.chroot/  (centos 7.2) 
 +  * ./deploy.txt `grep node_name deploy.txt` 
 +  * umount -a 
 +  * ONBOOT=no, ib0 ??? connectX mlx4_0 IB interface breaks in CentOS 7.3+ 
 +  * bootlocal=EXIT then reboot then check polkit user … screws up systemd-logind 
 + 
 +  * hostnamectl set-hostname node_name (logout/login) 
 +  * eth1 on 129.133 
 +  * yum update 
 +  * yum install kernel-headers kernel-devel epel-release 
 +  * put n37 tarball in /, unpack 
 +  * remove cuda-9.2 
 + 
 +  * Nvidia install: files in /usr/local/src 
 +    * remove nouveau 
 +    * disable selinux, NetworkManager, firewalld 
 +    * reboot 
 +    * sh runfile 
 +    * ./runfile -silent -driver 
 +    * install all CHROOT done packages 
 +    * yum clean all 
 +    * reboot 
 + 
 +  * custom fstab 
 +  * mount on 10.10 
 +  * authorized_keys 
 +  * scp in place from global archive...make backups 
 +  * passwd, shadow, group, hosts  
 +  * reboot for polkit, check /etc/ssh/ssh_host* perms/owners 
 + 
 +  * /share/apps/src/openlava3 install in CentOS 7
 +  * systemctl enable 
 +  * eth1 on 10.10, mounts ok? 
 +  * /etc/default/grub add "nomodeset" and GRUB_RECORDFAIL_TIMEOUT (grub2-mkconfig -o /boot/grub2/grub.cfg) 
 +    * did not help the count down 
 +    * did fix the text console 
 +  * rc.local, crontab 
 +  * reboot 
 + 
 +Finished rebuilding n33-n37 based on n37 example. 
 + --- //[[hmeij@wesleyan.edu|Henk]] 2018/10/11 10:04// 
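When repeating the "remove nouveau" step on several nodes, the blacklist file from the Nvidia section can be generated and verified by script. A minimal sketch follows; both function names are hypothetical, and the target directory is a parameter so the functions can be tried outside /etc/modprobe.d.

```shell
#!/bin/sh
# Sketch: write and verify the nouveau blacklist file described on this page.
# write_nouveau_blacklist and nouveau_is_blacklisted are hypothetical helpers.
write_nouveau_blacklist() {
    dir=${1:-/etc/modprobe.d}       # default is the real location (needs root)
    conf="$dir/blacklist-nouveau.conf"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
    echo "$conf"                    # print the path that was written
}

nouveau_is_blacklisted() {
    # both lines must be present as exact whole-line matches
    grep -qx 'blacklist nouveau' "$1" &&
        grep -qx 'options nouveau modeset=0' "$1"
}
```

After writing the file, the page's own sequence still applies: `dracut --force` to rebuild the initramfs, then reboot before running the driver installer.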
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/172.txt · Last modified: 2020/07/15 17:52 by hmeij07