cluster:192 [DokuWiki]


Differences

This shows you the differences between two versions of the page.

cluster:192 [2020/02/27 10:53]
hmeij07 [Recipe]
cluster:192 [2020/04/03 09:25]
hmeij07 [EXX96]
Line 10: Line 10:
 The Usage section below is for HPCC users wanting to use queue ''exx96''. The Usage section below is for HPCC users wanting to use queue ''exx96''.
  
 +Debug for node n89, which turns itself off...grrhhh. Create a bootable USB stick with https://rufus.ie/, then unzip the BIOS and firmware zip files located in ''n89:/usr/local/src''.
  
 +<code>
 +
 +[root@n89 ~]# ipmitool sel elist
 +   1 | 02/29/2020 | 16:57:33 | Memory #0xd1 | Uncorrectable ECC | Asserted
 +   2 | 03/02/2020 | 03:02:42 | Processor CPU_CATERR | IERR | Asserted
 +   3 | 03/11/2020 | 19:27:35 | Processor CPU_CATERR | IERR | Asserted
 +...[snip]...
 +
 +[root@n89 ~]# ipmitool sdr elist
 +CPU1 Temperature | 31h | ok  |  3.0 | 43 degrees C
 +CPU2 Temperature | 32h | ok  |  0.0 | 40 degrees C
 +PSU1 Over Temp   | 92h | ok  |  0.0 | Transition to OK
 +PSU2 Over Temp   | 9Ah | ok  |  0.0 | Transition to OK
 +...[snip]...
 +DIMMM1_Temp      | E4h | ok  |  3.0 | 28 degrees C
 +CPU1_ECC1        | D1h | ok  |  0.0 | Presence Detected
 +CPU2_ECC1        | D3h | ok  |  0.0 | Presence Detected
 +...[snip]...
 +PMBPower1        | E1h | ok  |  3.0 | 88 Watts
 +PMBPower2        | E2h | ok  |  3.0 | 112 Watts
 +...[snip]...
 +FRNT_FAN1        | A2h | ok  |  0.0 | 3100 RPM
 +...[snip]...
 +PSU1 Slow FAN1   | 95h | ok  |  0.0 | Transition to OK
 +PSU2 Slow FAN1   | 9Dh | ok  |  0.0 | Transition to OK
 +...[snip]...
 +
 +
 +[root@n89 ~]# dmidecode -t0
 +# dmidecode 3.2
 +Getting SMBIOS data from sysfs.
 +SMBIOS 3.2 present.
 +
 +Handle 0x0000, DMI type 0, 26 bytes
 +BIOS Information
 +        Vendor: American Megatrends Inc.
 +        Version: 5102
 +        Release Date: 02/11/2019
 +        Address: 0xF0000
 +        Runtime Size: 64 kB
 +        ROM Size: 32 MB
 +        Characteristics:
 +...[snip]...
 +                UEFI is supported
 +        BIOS Revision: 5.14
 +
 +
 +[root@n89 ~]# edac-util -s -v
 +edac-util: EDAC drivers are loaded. 4 MCs detected:
 +  mc0:Skylake Socket#0 IMC#0
 +  mc1:Skylake Socket#0 IMC#1
 +  mc2:Skylake Socket#1 IMC#0
 +  mc3:Skylake Socket#1 IMC#1
 +[root@n89 ~]# edac-util
 +edac-util: No errors to report.
 +
 +syslog
 +
 +</code>
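
The event log above mixes memory and CPU faults with whatever else the BMC records. As a minimal sketch for pulling out just the fault events, assuming the pipe-separated layout shown in the ''ipmitool sel elist'' output above, something like the following could be used. The function name ''sel_critical'' is hypothetical and not part of the original page or of ipmitool.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): reduce SEL text to the
# asserted memory/CPU fault events of the kind logged on n89.
# Reads `ipmitool sel elist` output on stdin; fields are separated by '|'.
sel_critical() {
    awk -F'|' '
        # field 5 is the event description, field 6 the assertion state
        $5 ~ /Uncorrectable ECC|IERR/ && $6 ~ /Asserted/ && $6 !~ /Deasserted/ {
            # trim padding around date, time, sensor, and event
            for (i = 2; i <= 5; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            printf "%s %s  %s: %s\n", $2, $3, $4, $5
        }'
}

# Usage on the node itself (needs root and a reachable BMC):
#   ipmitool sel elist | sel_critical
```

The match patterns cover only the two event types seen in this log. Other fault strings would need to be added to the regular expression.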
 ==== Usage ==== ==== Usage ====
  
Line 157: Line 217:
 # add packages and update # add packages and update
 yum install epel-release -y yum install epel-release -y
 +yum install flex flex-devel bison bison-devel -y 
 yum install tcl tcl-devel dmtcp -y yum install tcl tcl-devel dmtcp -y
 yum install net-snmp net-snmp-libs net-agent-libs net-tools net-snmp-utils -y yum install net-snmp net-snmp-libs net-agent-libs net-tools net-snmp-utils -y
Line 237: Line 298:
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB
  
-free -g
+free -m
               total        used        free      shared  buff/cache   available               total        used        free      shared  buff/cache   available
-Mem:             92                    88           0           1          89
+Mem:          95056        1919       85338          20        7798       92571 
 +Swap:         10239           0       10239 
  
 # nvidia-smi # nvidia-smi
cluster/192.txt · Last modified: 2022/03/08 13:29 by hmeij07