cluster:192


Differences

This shows you the differences between two versions of the page.

cluster:192 [2020/02/27 10:53]
hmeij07 [Recipe]
cluster:192 [2022/03/08 13:29] (current)
hmeij07 [Recipe]
Line 10: Line 10:
 The Usage section below is for HPCC users wanting to use queue ''exx96''. The Usage section below is for HPCC users wanting to use queue ''exx96''.
  
 +Debug notes for node n89, which keeps turning itself off...grrhhh. Create a bootable USB stick with https://rufus.ie/, then unzip the BIOS and firmware zip files located in ''n89:/usr/local/src''.
  
 +<code>
 +
 +[root@n89 ~]# ipmitool sel elist
 +   1 | 02/29/2020 | 16:57:33 | Memory #0xd1 | Uncorrectable ECC | Asserted
 +   2 | 03/02/2020 | 03:02:42 | Processor CPU_CATERR | IERR | Asserted
 +   3 | 03/11/2020 | 19:27:35 | Processor CPU_CATERR | IERR | Asserted
 +...[snip]...
 +
 +[root@n89 ~]# ipmitool sdr elist
 +CPU1 Temperature | 31h | ok  |  3.0 | 43 degrees C
 +CPU2 Temperature | 32h | ok  |  0.0 | 40 degrees C
 +PSU1 Over Temp   | 92h | ok  |  0.0 | Transition to OK
 +PSU2 Over Temp   | 9Ah | ok  |  0.0 | Transition to OK
 +...[snip]...
 +DIMMM1_Temp      | E4h | ok  |  3.0 | 28 degrees C
 +CPU1_ECC1        | D1h | ok  |  0.0 | Presence Detected
 +CPU2_ECC1        | D3h | ok  |  0.0 | Presence Detected
 +...[snip]...
 +PMBPower1        | E1h | ok  |  3.0 | 88 Watts
 +PMBPower2        | E2h | ok  |  3.0 | 112 Watts
 +...[snip]...
 +FRNT_FAN1        | A2h | ok  |  0.0 | 3100 RPM
 +...[snip]...
 +PSU1 Slow FAN1   | 95h | ok  |  0.0 | Transition to OK
 +PSU2 Slow FAN1   | 9Dh | ok  |  0.0 | Transition to OK
 +...[snip]...
 +
 +
 +[root@n89 ~]# dmidecode -t0
 +# dmidecode 3.2
 +Getting SMBIOS data from sysfs.
 +SMBIOS 3.2 present.
 +
 +Handle 0x0000, DMI type 0, 26 bytes
 +BIOS Information
 +        Vendor: American Megatrends Inc.
 +        Version: 5102
 +        Release Date: 02/11/2019
 +        Address: 0xF0000
 +        Runtime Size: 64 kB
 +        ROM Size: 32 MB
 +        Characteristics:
 +...[snip]...
 +                UEFI is supported
 +        BIOS Revision: 5.14
 +
 +
 +[root@n89 ~]# edac-util -s -v
 +edac-util: EDAC drivers are loaded. 4 MCs detected:
 +  mc0:Skylake Socket#0 IMC#0
 +  mc1:Skylake Socket#0 IMC#1
 +  mc2:Skylake Socket#1 IMC#0
 +  mc3:Skylake Socket#1 IMC#1
 +[root@n89 ~]# edac-util
 +edac-util: No errors to report.
 +
 +syslog
 +
 +</code>
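When the event log gets long, it can help to count how often each sensor/event pair shows up. The following is only a sketch, not part of the original recipe: ''summarize_sel'' is a hypothetical helper name, and it assumes the pipe-delimited layout of ''ipmitool sel elist'' shown above (sensor in field 4, event in field 5). The sample input is the three n89 entries from the listing; on a live node one would presumably pipe ''ipmitool sel elist'' into it instead.

```shell
#!/bin/sh
# Hypothetical helper (not from the original recipe): count SEL events per
# sensor/event pair. Reads `ipmitool sel elist` style lines on stdin and
# assumes '|' separated fields with sensor in field 4 and event in field 5.
summarize_sel() {
  awk -F'|' 'NF >= 5 {
      sensor = $4; event = $5
      gsub(/^ +| +$/, "", sensor)   # trim surrounding blanks
      gsub(/^ +| +$/, "", event)
      count[sensor " / " event]++
    }
    END { for (k in count) printf "%d %s\n", count[k], k }' |
  sort -k1,1nr -k2                  # most frequent pair first
}

# Sample input copied from the n89 listing above; lines such as
# "...[snip]..." have fewer than 5 fields and are skipped.
summarize_sel <<'EOF'
   1 | 02/29/2020 | 16:57:33 | Memory #0xd1 | Uncorrectable ECC | Asserted
   2 | 03/02/2020 | 03:02:42 | Processor CPU_CATERR | IERR | Asserted
   3 | 03/11/2020 | 19:27:35 | Processor CPU_CATERR | IERR | Asserted
...[snip]...
EOF
```

For the sample this lists the ''Processor CPU_CATERR / IERR'' pair with a count of 2 ahead of the single ''Memory #0xd1 / Uncorrectable ECC'' entry, which makes the repeating fault easier to spot.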
 ==== Usage ==== ==== Usage ====
  
Line 136: Line 196:
 systemctl restart network systemctl restart network
 dig google.com dig google.com
 +#centos7
 yum install -y iptables-services yum install -y iptables-services
 vi /etc/sysconfig/iptables vi /etc/sysconfig/iptables
Line 157: Line 218:
 # add packages and update # add packages and update
 yum install epel-release -y yum install epel-release -y
 +yum install flex flex-devel bison bison-devel -y 
 yum install tcl tcl-devel dmtcp -y yum install tcl tcl-devel dmtcp -y
 yum install net-snmp net-snmp-libs net-agent-libs net-tools net-snmp-utils -y yum install net-snmp net-snmp-libs net-agent-libs net-tools net-snmp-utils -y
Line 166: Line 228:
 yum install cmake cmake-devel -y yum install cmake cmake-devel -y
 yum install libjpeg libjpeg-devel libjpeg-turbo-devel -y yum install libjpeg libjpeg-devel libjpeg-turbo-devel -y
 +# amber
 +yum -y install tcsh make \
 +               gcc gcc-gfortran gcc-c++ \
 +               which flex bison patch bc \
 +               libXt-devel libXext-devel \
 +               perl perl-ExtUtils-MakeMaker util-linux wget \
 +               bzip2 bzip2-devel zlib-devel tar 
 yum update -y yum update -y
 yum clean all yum clean all
Line 237: Line 306:
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB
  
-free -g
+free -m
                total        used        free      shared  buff/cache   available               total        used        free      shared  buff/cache   available
-Mem:             92                    88           0           1          89
+Mem:          95056        1919       85338          20        7798       92571
 +Swap:         10239           0       10239
  
 # nvidia-smi # nvidia-smi
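As a quick sanity check on the ''free -m'' figures above, the share of RAM actually in use can be estimated as (total - available) / total. This is a sketch only: ''mem_used_pct'' is a hypothetical helper name, and it assumes the column layout shown above, where ''available'' is the seventh field of the ''Mem:'' line.

```shell
#!/bin/sh
# Hypothetical helper (not from the original recipe): percent of RAM in use,
# computed as (total - available) / total from `free -m` output on stdin.
# Assumes "available" is the 7th field of the "Mem:" line, as shown above.
mem_used_pct() {
  awk '$1 == "Mem:" { printf "%.1f\n", ($2 - $7) * 100 / $2 }'
}

# Sample input copied from the free -m listing above; on a live node one
# would presumably run: free -m | mem_used_pct
mem_used_pct <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:          95056        1919       85338          20        7798       92571
Swap:         10239           0       10239
EOF
```

For the numbers above this prints 2.6, i.e. the node is essentially idle as far as memory goes.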
cluster/192.1582818818.txt.gz · Last modified: 2020/02/27 10:53 by hmeij07