This shows you the differences between two versions of the page.

cluster:192 [2020/02/06 15:37]
hmeij07 created
cluster:192 [2020/02/21 17:18]
hmeij07 [EXX96]
Line 6: Line 6:
 A page for me on how these 12 nodes were built up after they arrived. To make them "ala n37", which was the test node in redoing our K20 nodes, see [[cluster:172|K20 Redo]]
-==== WhatWeDo? ====
-Steps.
-==== WhatWeGot? ====
 +Page best read bottom to top.
 +==== Miscellaneous ====
 +<code>
 +# add to script, get and set date 
 +NOW=`/bin/date +%m%d%H%M%Y.%S` 
 +for i in `seq 79 90`; do echo n$i; ssh n$i date $NOW; done 
 +# crontab 
 +# ionice gaussian 
 +0,15,30,45 * * * * /share/apps/scripts/  > /dev/null 2>&1
 +# cpu temps
 +40 * * * * /share/apps/scripts/ > /dev/null 2>&1
 +# rc.local, chmod o+x /etc/rc.d/rc.local, then add 
 +# for mapd, 'All On' enable graphicsrendering support 
 +#/usr/bin/nvidia-smi --gom=
 +# for amber16 -pm=ENABLED -c=EXCLUSIVE_PROCESS 
 +#nvidia-smi --persistence-mode=1 
 +#nvidia-smi --compute-mode=1 
 +# for mwgpu/exx96 -pm=ENABLED -c=DEFAULT 
 +nvidia-smi --persistence-mode=1 
 +nvidia-smi --compute-mode=0 
 +# turn ECC off (memory scrubbing) 
 +#/usr/bin/nvidia-smi -e 0 
 +# lm_sensor 
 +modprobe coretemp 
 +modprobe tmp401 
 +#modprobe w83627ehf 
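The date loop at the top of this block can be tried as a dry run first. The following is a minimal sketch, assuming the nodes are named n79 through n90 as in the loop above; `node_names` and `date_sync_commands` are hypothetical helper names, not scripts that exist on the cluster. It only prints the ssh commands instead of running them.

```shell
#!/bin/bash
# Sketch only: print (rather than run) the per-node date commands.
# node_names and date_sync_commands are hypothetical helper names.

# Emit node names n<first>..n<last>, one per line.
node_names() {
    local first=$1 last=$2 i
    for i in $(seq "$first" "$last"); do
        echo "n$i"
    done
}

# Build one ssh command per node. The stamp uses MMDDhhmmYYYY.SS,
# the format `date` accepts as an argument for setting the clock.
date_sync_commands() {
    local stamp=$1 first=$2 last=$3 node
    for node in $(node_names "$first" "$last"); do
        echo "ssh $node date $stamp"
    done
}

# Dry run for the 12 new nodes; review the output before executing it.
date_sync_commands "$(date +%m%d%H%M%Y.%S)" 79 90
```

One stamp is reused for every node, so nodes later in the loop end up set slightly behind the head node. That is fine for a rough initial set but is not a substitute for time synchronization.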
 +==== Recipe ==== 
 +Steps. "Ala n37" means the RTX nodes are set up like the K20 nodes, so we can put the local software in place. See the [[cluster:172|K20 Redo]] page. First we add these packages and clean up.
 +# hook up DVI-D cable to GPU port (offboard video)
 +# login as root check some things out... 
 +free -g 
 +docker images 
 +docker ps 
 +# set local time zone 
 +mv /etc/localtime /etc/localtime.backup 
 +ln -s /usr/share/zoneinfo/America/New_York /etc/localtime 
 +# change passwords for root and vendor account 
 +passwd exx 
 +# set hostname 
 +hostnamectl set-hostname n79 
 +# configure private subnets and ping file server 
 +cd /etc/sysconfig/network-scripts/ 
 +vi ifcfg-enp1s0f0 
 +vi ifcfg-enp1s0f1 
 +systemctl restart network 
 +ping -c 3 
 +ping -c 3 
 +# make internet connection for yum 
 +ifdown enp1s0f0 
 +vi ifcfg-enp1s0f0 
 +systemctl restart network 
 +yum install -y iptables-services 
 +vi /etc/sysconfig/iptables 
 +systemctl start iptables 
 +iptables -L 
 +systemctl stop firewalld 
 +systemctl disable firewalld 
 +# other configs 
 +vi /etc/selinux/config   # set SELINUX=disabled
 +mv /home /usr/local/ 
 +mkdir /home 
 +vi /etc/passwd   # adjust $HOME for exx, dockeruser
 +mkdir /sanscratch /localscratch 
 +chmod ugo+rwx /sanscratch /localscratch 
 +chmod o+t /sanscratch /localscratch 
 +ln -s /home /share 
 +ssh-keygen -t rsa 
 +scp /root/.ssh/ 
 +vi /etc/ssh/sshd_config   # PermitRootLogin
 +echo "relayhost =" >> /etc/postfix/ 
 +# add packages and update 
 +yum install epel-release -y 
 +yum install tcl tcl-devel dmtcp -y 
 +yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y
 +yum install blas blas-devel lapack lapack-devel boost boost-devel -y 
 +yum install tkinter lm_sensors lm_sensors-libs -y 
 +yum install zlib-devel bzip2-devel bzip bzip-devel -y 
 +yum install openmpi openmpi-devel perl-ExtUtils-MakeMaker -y 
 +yum install cmake cmake-devel -y 
 +yum install libjpeg libjpeg-devel libjpeg-turbo-devel -y 
 +yum update -y 
 +yum clean all 
 +# remove internet, bring private back up 
 +ifdown enp1s0f0 
 +vi ifcfg-enp1s0f0 
 +ifup enp1s0f0 
 +# passwd, shadow, group, hosts, fstab 
 +mkdir /homeextra1 /homeextra2 /home33 /mindstore 
 +cd /etc/ 
 +# backup files to -orig versions 
 +scp /etc/passwd (and others) 
 +scp /tmp 
 +vi /etc/fstab 
 +mount -a; df -h 
 +# pick the kernel the vendor used, for now
 +grep ^menuentry /etc/grub2.cfg 
 +grub2-set-default 1 
 +ls -d /sys/firmware/efi && echo "EFI" || echo "Legacy" 
 +grub2-mkconfig -o /boot/grub2/grub.cfg 
 +#grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg 
 +# old level 3 
 +systemctl set-default 
 +# switch to VGA 
 +cd /usr/local/src/ 
 +tar zxf n37.chroot-keep.ul.tar.gz 
 +cd usr/local/ 
 +mv amber16/  fsl-5.0.10/ gromacs-2018/ lammps-22Aug18/ /usr/local/ 
 +mv cuda-9.2/ /usr/local/n37-cuda-9.2/ 
 +cd /usr/local/bin/ 
 +rsync -vac /usr/local/bin/ 
 +# test scripts gpu-free, gpu-info, gpu-process 
 +0, GeForce RTX 2080 SUPER, 25, 126 MiB, 7855 MiB, 0 %, 0 % 
 +1, GeForce RTX 2080 SUPER, 24, 11 MiB, 7971 MiB, 0 %, 0 % 
 +2, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 % 
 +3, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 % 
 +gpu_name, gpu_bus_id, pid, process_name 
 +GeForce RTX 2080 SUPER, 00000000:3B:00.0, 3109, python 
 +# done 
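The sample rows above look like CSV output from `nvidia-smi --query-gpu=... --format=csv,noheader`, with columns that appear to be index, name, temperature, memory used, memory free, GPU utilization and memory utilization. That column reading is an assumption based on the values shown; the real gpu-free, gpu-info and gpu-process scripts are not reproduced on this page. As a rough sketch of how such output can be filtered, the hypothetical helper `idle_gpus` below prints the index of every GPU whose sixth column reads `0 %`.

```shell
#!/bin/bash
# Sketch only: idle_gpus is a hypothetical helper, not the site's gpu-free script.
# Input rows are assumed to be shaped like the sample output above:
#   index, name, temp, mem used, mem free, gpu util, mem util
# Prints the index of every GPU whose GPU utilization column is "0 %".
idle_gpus() {
    awk -F', ' '$6 == "0 %" { print $1 }'
}

# Sample rows copied from the output above; prints 0, 1, 2, 3 (one per line).
idle_gpus <<'EOF'
0, GeForce RTX 2080 SUPER, 25, 126 MiB, 7855 MiB, 0 %, 0 %
1, GeForce RTX 2080 SUPER, 24, 11 MiB, 7971 MiB, 0 %, 0 %
2, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 %
3, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 %
EOF
```

Utilization alone can be misleading: GPU 0 in the sample shows 126 MiB in use, possibly from the python process listed, while still reporting 0 %. A stricter check might also look at the memory used column.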
 +==== What We Purchased ====
   * 12 nodes yielding a total of
-    * +24 cpus
-    * +288 cpu cores
-    * +1,152 gb cpu mem
-    * +48 gpus
-    * +384 gpu mem
-  these rtx gpus will add 695 Tflops of "mixed mode" computational capacity.
-    * blows me away
 +    * 24 cpus
 +    * 288 cpu cores
 +    * 1,152 gb cpu mem
 +    * ~20 Tflops (dpfp)
 +    * 48 gpus
 +    * 384 gpu mem
 +    ~700 Tflops (mixed mode)
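The totals in the list above can be broken down per node and per device. This is a small sketch, assuming all 12 nodes are identical and that both memory figures are in GB (the gpu mem line gives no unit); `per_unit` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: per-unit figures derived from the purchase totals,
# assuming 12 identical nodes and memory totals in GB.
nodes=12 cpus=24 cores=288 cpu_mem_gb=1152 gpus=48 gpu_mem_gb=384

# Integer division of a total by a count (hypothetical helper).
per_unit() {
    echo $(( $1 / $2 ))
}

echo "cpus per node:       $(per_unit "$cpus" "$nodes")"
echo "cores per cpu:       $(per_unit "$cores" "$cpus")"
echo "cpu mem per node GB: $(per_unit "$cpu_mem_gb" "$nodes")"
echo "gpus per node:       $(per_unit "$gpus" "$nodes")"
echo "gpu mem per gpu GB:  $(per_unit "$gpu_mem_gb" "$gpus")"
```

Under those assumptions each node works out to 2 cpus of 12 cores, 96 GB of cpu memory and 4 gpus of 8 GB each.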
 <code> <code>
Line 106: Line 252:
 {{:cluster:back_small.JPG?nolink&300|}} Back, gpus stacked 2 on 2 \\
 {{:cluster:front_small.JPG?nolink&300|}} Front, all drive bays empty \\
 +{{:cluster:rack_small.JPG?nolink&300|}} Racking \\
 +{{:cluster:boxes_small.JPG?nolink&300|}} Boxes \\
 \\
 **[[cluster:0|Back]]**
cluster/192.txt · Last modified: 2022/03/08 18:29 by hmeij07