This shows you the differences between two versions of the page.
cluster:192 [2020/02/07 13:22] hmeij07 [WhatWeGot?] |
cluster:192 [2020/02/21 15:50] hmeij07 |
||
---|---|---|---|
Line 6: | Line 6: | ||
A page for me on how these 12 nodes were built up after they arrived. To make them "ala n37", which was the test node in redoing our K20 nodes, see [[cluster: | A page for me on how these 12 nodes were built up after they arrived. To make them "ala n37", which was the test node in redoing our K20 nodes, see [[cluster: | ||
- | ==== WhatWeDo? | + | Page best read bottom to top. |
+ | |||
+ | ==== Recipe | ||
Steps. "Ala n37" ... so the RTX nodes are similar to the K20 nodes and we can put the local software in place. See [[cluster: | Steps. "Ala n37" ... so the RTX nodes are similar to the K20 nodes and we can put the local software in place. See [[cluster: | ||
Line 12: | Line 14: | ||
< | < | ||
- | yum install epel-release | + | # hook up DVI-D cable to GPU port (offboard video) |
- | yum install tcl tcl-devel dmtcp | + | # login as root check some things out... |
- | yum install freeglut-devel libXi-devel libXmu-devel \ make mesa-libGLU-devel | + | free -g |
- | yum install blas blas-devel lapack lapack-devel boost boost-devel | + | nvidia-smi |
- | yum install tkinter lm_sensors lm_sensors-libs | + | docker images |
- | yum install zlib-devel bzip2-devel bzip bzip-devel | + | docker ps |
- | yum install openmpi openmpi-devel perl-ExtUtils-MakeMaker | + | # set local time zone |
- | yum install cmake cmake-devel | + | mv / |
- | yum install libjpeg libjpeg-devel libjpeg-turbo-devel | + | ln -s / |
+ | # change passwords for root and vendor account | ||
+ | passwd | ||
+ | passwd exx | ||
+ | # set hostname | ||
+ | hostnamectl set-hostname n79 | ||
+ | # configure private subnets and ping file server | ||
+ | cd / | ||
+ | vi ifcfg-enp1s0f0 | ||
+ | vi ifcfg-enp1s0f1 | ||
+ | systemctl restart network | ||
+ | ping -c 3 192.168.102.42 | ||
+ | ping -c 3 10.10.102.42 | ||
+ | # make internet connection for yum | ||
+ | ifdown enp1s0f0 | ||
+ | vi ifcfg-enp1s0f0 | ||
+ | systemctl restart network | ||
+ | dig google.com | ||
+ | yum install -y iptables-services | ||
+ | vi / | ||
+ | systemctl start iptables | ||
+ | iptables -L | ||
+ | systemctl stop firewalld | ||
+ | systemctl disable firewalld | ||
+ | # other configs | ||
+ | vi / | ||
+ | mv /home / | ||
+ | mkdir /home | ||
+ | vi /etc/passwd # exx, dockeruser $HOME | ||
+ | mkdir /sanscratch / | ||
+ | chmod ugo+rwx /sanscratch / | ||
+ | chmod o+t /sanscratch / | ||
+ | ln -s /home /share | ||
+ | ssh-keygen -t rsa | ||
+ | scp 10.10.102.253:/ | ||
+ | / | ||
+ | echo " | ||
+ | # add packages and update | ||
+ | yum install epel-release | ||
+ | yum install tcl tcl-devel dmtcp -y | ||
+ | yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel | ||
+ | yum install blas blas-devel lapack lapack-devel boost boost-devel | ||
+ | yum install tkinter lm_sensors lm_sensors-libs | ||
+ | yum install zlib-devel bzip2-devel bzip bzip-devel | ||
+ | yum install openmpi openmpi-devel perl-ExtUtils-MakeMaker | ||
+ | yum install cmake cmake-devel | ||
+ | yum install libjpeg libjpeg-devel libjpeg-turbo-devel | ||
+ | yum update -y | ||
yum clean all | yum clean all | ||
+ | # remove internet, bring private back up | ||
+ | ifdown enp1s0f0 | ||
+ | vi ifcfg-enp1s0f0 | ||
+ | ifup enp1s0f0 | ||
+ | # passwd, shadow, group, hosts, fstab | ||
+ | mkdir /homeextra1 /homeextra2 /home33 /mindstore | ||
+ | cd /etc/ | ||
+ | # backup files to -orig versions | ||
+ | scp 192.168.102.89:/ | ||
+ | scp 10.10.102.89:/ | ||
+ | vi /etc/fstab | ||
+ | mount -a; df -h | ||
+ | # pick the kernel vendor used for now | ||
+ | grep ^menuentry / | ||
+ | grub2-set-default 1 | ||
+ | ls -d / | ||
+ | grub2-mkconfig -o / | ||
+ | # | ||
+ | # equivalent of the old runlevel 3 | ||
+ | systemctl set-default multi-user.target | ||
+ | reboot | ||
+ | # switch to VGA | ||
+ | cd / | ||
+ | tar zxf n37.chroot-keep.ul.tar.gz | ||
+ | cd usr/local/ | ||
+ | mv amber16/ | ||
+ | mv cuda-9.2/ / | ||
+ | cd / | ||
+ | rsync -vac 10.10.102.89:/ | ||
+ | # test scripts gpu-free, gpu-info, gpu-process | ||
+ | 0,1,2,3 | ||
+ | id, | ||
+ | 0, GeForce RTX 2080 SUPER, 25, 126 MiB, 7855 MiB, 0 %, 0 % | ||
+ | 1, GeForce RTX 2080 SUPER, 24, 11 MiB, 7971 MiB, 0 %, 0 % | ||
+ | 2, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 % | ||
+ | 3, GeForce RTX 2080 SUPER, 23, 11 MiB, 7971 MiB, 0 %, 0 % | ||
+ | gpu_name, gpu_bus_id, pid, process_name | ||
+ | GeForce RTX 2080 SUPER, 00000000: | ||
+ | # done | ||
</ | </ | ||
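The gpu-free, gpu-info and gpu-process scripts themselves are not shown on this page, only their output. As a minimal sketch, assuming gpu-info emits the comma-separated columns seen in the sample output (id, name, temperature, memory used, memory free, gpu utilization, memory utilization), a list of idle GPU ids like the `0,1,2,3` line could be derived as follows. The function name `gpu_free_ids` is hypothetical, and treating "0 %" gpu utilization as idle is an assumption, not necessarily what the local script does.

```shell
#!/bin/sh
# Hypothetical helper, not the local gpu-free script.
# Reads gpu-info style CSV on stdin and prints the ids of GPUs whose
# gpu utilization column (field 6) is "0 %", joined by commas.
gpu_free_ids() {
    awk -F', *' '
        # skip header lines: only rows whose first field is a numeric id count
        $1 ~ /^[0-9]+$/ && $6 == "0 %" {
            ids = ids (ids == "" ? "" : ",") $1
        }
        END { print ids }
    '
}

# Possible use on a live node (assumes nvidia-smi is installed):
#   nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory \
#              --format=csv | gpu_free_ids
```

Filtering on utilization alone can report a GPU as free while a job has memory allocated on it but is momentarily idle, so a production script may also want to check the memory-used column.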
- | ==== WhatWeGot? | + | ==== What We Purchased |
* 12 nodes yielding a total of | * 12 nodes yielding a total of | ||
Line 31: | Line 119: | ||
* 288 cpu cores | * 288 cpu cores | ||
* 1,152 gb cpu mem | * 1,152 gb cpu mem | ||
+ | * ~20 Tflops (dpfp) | ||
* 48 gpus | * 48 gpus | ||
* 384 gb gpu mem | * 384 gb gpu mem | ||
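The totals in this list can be cross-checked with a little shell arithmetic. This is a sketch only: the per-node figures are not stated on the page and are derived here by dividing the totals by 12, assuming all nodes are identical. The 4 GPUs per node and roughly 8 GB per GPU agree with the gpu-info output in the recipe.

```shell
#!/bin/sh
# Assumed per-node figures (derived from the page totals, not quoted from it).
nodes=12
cores_per_node=24      # 288 cores / 12 nodes
mem_gb_per_node=96     # 1,152 GB / 12 nodes
gpus_per_node=4        # gpu ids 0-3 in the gpu-info output
gpu_mem_gb=8           # RTX 2080 SUPER

# Print the cluster-wide totals implied by the per-node figures.
cluster_totals() {
    echo "cpu cores: $((nodes * cores_per_node))"
    echo "cpu mem gb: $((nodes * mem_gb_per_node))"
    echo "gpus: $((nodes * gpus_per_node))"
    echo "gpu mem gb: $((nodes * gpus_per_node * gpu_mem_gb))"
}

cluster_totals
```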
Line 120: | Line 209: | ||
{{: | {{: | ||
{{: | {{: | ||
+ | {{: | ||
+ | {{: | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |