cluster:192

Differences

This shows you the differences between two versions of the page.

cluster:192 [2020/02/27 10:47]
hmeij07
cluster:192 [2022/03/08 13:29]
hmeij07 [Recipe]
Line 10: Line 10:
 The Usage section below is for HPCC users wanting to use queue ''exx96''. The Usage section below is for HPCC users wanting to use queue ''exx96''.
  
 +Debug notes for node n89, which keeps turning itself off...grrhhh. Create a bootable USB stick with https://rufus.ie/ then unzip the BIOS and firmware zip files located in ''n89:/usr/local/src''
  
 +<code>
 +
 +[root@n89 ~]# ipmitool sel elist
 +   1 | 02/29/2020 | 16:57:33 | Memory #0xd1 | Uncorrectable ECC | Asserted
 +   2 | 03/02/2020 | 03:02:42 | Processor CPU_CATERR | IERR | Asserted
 +   3 | 03/11/2020 | 19:27:35 | Processor CPU_CATERR | IERR | Asserted
 +...[snip]...
 +
 +[root@n89 ~]# ipmitool sdr elist
 +CPU1 Temperature | 31h | ok  |  3.0 | 43 degrees C
 +CPU2 Temperature | 32h | ok  |  0.0 | 40 degrees C
 +PSU1 Over Temp   | 92h | ok  |  0.0 | Transition to OK
 +PSU2 Over Temp   | 9Ah | ok  |  0.0 | Transition to OK
 +...[snip]...
 +DIMMM1_Temp      | E4h | ok  |  3.0 | 28 degrees C
 +CPU1_ECC1        | D1h | ok  |  0.0 | Presence Detected
 +CPU2_ECC1        | D3h | ok  |  0.0 | Presence Detected
 +...[snip]...
 +PMBPower1        | E1h | ok  |  3.0 | 88 Watts
 +PMBPower2        | E2h | ok  |  3.0 | 112 Watts
 +...[snip]...
 +FRNT_FAN1        | A2h | ok  |  0.0 | 3100 RPM
 +...[snip]...
 +PSU1 Slow FAN1   | 95h | ok  |  0.0 | Transition to OK
 +PSU2 Slow FAN1   | 9Dh | ok  |  0.0 | Transition to OK
 +...[snip]...
 +
 +
 +[root@n89 ~]# dmidecode -t 0
 +# dmidecode 3.2
 +Getting SMBIOS data from sysfs.
 +SMBIOS 3.2 present.
 +
 +Handle 0x0000, DMI type 0, 26 bytes
 +BIOS Information
 +        Vendor: American Megatrends Inc.
 +        Version: 5102
 +        Release Date: 02/11/2019
 +        Address: 0xF0000
 +        Runtime Size: 64 kB
 +        ROM Size: 32 MB
 +        Characteristics:
 +...[snip]...
 +                UEFI is supported
 +        BIOS Revision: 5.14
 +
 +
 +[root@n89 ~]# edac-util -s -v
 +edac-util: EDAC drivers are loaded. 4 MCs detected:
 +  mc0:Skylake Socket#0 IMC#0
 +  mc1:Skylake Socket#0 IMC#1
 +  mc2:Skylake Socket#1 IMC#0
 +  mc3:Skylake Socket#1 IMC#1
 +[root@n89 ~]# edac-util
 +edac-util: No errors to report.
 +
 +syslog
 +
 +</code>
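The event log above is easier to scan when repeated entries are grouped. A minimal sketch, assuming the pipe-separated layout shown in the ''ipmitool sel elist'' output (id, date, time, sensor, event, state); the function name ''summarize_sel'' is hypothetical and not part of ipmitool:

```shell
# Count SEL entries per sensor/event pair.
# Reads `ipmitool sel elist` output on stdin; assumed field layout:
#   id | date | time | sensor | event | state
summarize_sel() {
    awk -F'|' 'NF >= 6 {
        sensor = $4; event = $5
        # trim surrounding whitespace from both fields
        gsub(/^[ \t]+|[ \t]+$/, "", sensor)
        gsub(/^[ \t]+|[ \t]+$/, "", event)
        count[sensor " / " event]++
    }
    END { for (k in count) printf "%d %s\n", count[k], k }' | sort -k1,1nr -k2
}

# Usage on the node (needs root and a reachable BMC):
#   ipmitool sel elist | summarize_sel
```

Lines that do not have six fields, such as a snipped marker, are skipped, so the function can be fed a partial paste of the log.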
 ==== Usage ==== ==== Usage ====
  
Line 136: Line 196:
 systemctl restart network systemctl restart network
 dig google.com dig google.com
 +#centos7
 yum install -y iptables-services yum install -y iptables-services
 vi /etc/sysconfig/iptables vi /etc/sysconfig/iptables
Line 157: Line 218:
 # add packages and update # add packages and update
 yum install epel-release -y yum install epel-release -y
 +yum install flex flex-devel bison bison-devel -y 
 yum install tcl tcl-devel dmtcp -y yum install tcl tcl-devel dmtcp -y
 +yum install net-snmp net-snmp-libs net-snmp-agent-libs net-tools net-snmp-utils -y
 yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y
 yum install blas blas-devel lapack lapack-devel boost boost-devel -y yum install blas blas-devel lapack lapack-devel boost boost-devel -y
Line 165: Line 228:
 yum install cmake cmake-devel -y yum install cmake cmake-devel -y
 yum install libjpeg libjpeg-devel libjpeg-turbo-devel -y yum install libjpeg libjpeg-devel libjpeg-turbo-devel -y
 +# amber
 +yum -y install tcsh make \
 +               gcc gcc-gfortran gcc-c++ \
 +               which flex bison patch bc \
 +               libXt-devel libXext-devel \
 +               perl perl-ExtUtils-MakeMaker util-linux wget \
 +               bzip2 bzip2-devel zlib-devel tar 
 yum update -y yum update -y
 yum clean all yum clean all
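After a long run of ''yum install'' lines it can help to confirm that nothing was silently skipped. A rough sketch, assuming an RPM-based system such as CentOS 7; the function name ''missing_packages'' and the ''PKG_QUERY'' override variable are hypothetical helpers introduced here for illustration:

```shell
# Print each package name from the arguments that is NOT installed.
# PKG_QUERY (hypothetical hook) lets another query command be substituted;
# by default `rpm -q` is used, which exits non-zero for a missing package.
missing_packages() {
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Usage:
#   missing_packages flex bison tcl-devel cmake gcc-gfortran
```

An empty result means every listed package was found.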
Line 236: Line 306:
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB
  
-free -g
+free -m
               total        used        free      shared  buff/cache   available               total        used        free      shared  buff/cache   available
-Mem:             92                    88           0           1          89
+Mem:          95056        1919       85338          20        7798       92571 
 +Swap:         10239           0       10239 
  
 # nvidia-smi # nvidia-smi
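The ''free -m'' figures above can be reduced to a single number for a quick health check. A minimal sketch, assuming a ''free'' version that prints the ''available'' column as the seventh field of the ''Mem:'' row (as in the output above); the function name ''mem_available_pct'' is hypothetical:

```shell
# Print available memory as a whole-number percentage of total memory.
# Reads `free -m` output on stdin; assumes the "Mem:" row has
# total in column 2 and available in column 7.
mem_available_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($7 * 100) / $2 }'
}

# Usage:
#   free -m | mem_available_pct
```

With the ''Mem:'' row shown above (95056 total, 92571 available) this prints 97.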
cluster/192.txt · Last modified: 2022/03/08 13:29 by hmeij07