cluster:192

Differences

This shows you the differences between two versions of the page.

cluster:192 [2020/02/27 15:46]
hmeij07 [Miscellaneous]
cluster:192 [2020/04/03 12:58]
hmeij07 [EXX96]
Line 10: Line 10:
 The Usage section below is for HPCC users wanting to use queue ''exx96''. The Usage section below is for HPCC users wanting to use queue ''exx96''.
  
 +Debug for node n89 which turns itself off...grrhhh
  
 +<code>
 +
 +ipmitool sel elist
 +ipmitool sdr elist
 +dmidecode -t0
 +edac-util
 +syslog
 +
 +</code>
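
The debug commands listed above could be gathered in one pass so the output survives the next power-off. The following is only a sketch under assumptions: the function name ''collect_debug'', the output directory layout, and reading ''/var/log/messages'' for the "syslog" item are illustrative choices and do not come from the page. Tools that are not installed are skipped rather than treated as errors.

```shell
#!/bin/bash
# Sketch: save the output of the debug commands into one directory.
# Hypothetical helper; name, paths and file names are assumptions.
collect_debug() {
    # Default output directory is an assumption; pass one as $1 to override.
    local outdir="${1:-/tmp/debug-$(hostname -s)-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$outdir" || return 1

    # "file name|command" pairs; the commands are the ones listed above.
    local entries=(
        "sel|ipmitool sel elist"
        "sdr|ipmitool sdr elist"
        "bios|dmidecode -t0"
        "edac|edac-util"
    )
    local entry name cmd
    for entry in "${entries[@]}"; do
        name="${entry%%|*}"
        cmd="${entry#*|}"
        # ${cmd%% *} is the program name; skip it if it is not installed.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # Unquoted on purpose so the string splits into program + args.
            $cmd > "$outdir/$name.txt" 2>&1 || true
        else
            echo "skipped: ${cmd%% *} not found" > "$outdir/$name.txt"
        fi
    done

    # "syslog" is interpreted here as the system log file (assumption).
    if [ -r /var/log/messages ]; then
        tail -n 500 /var/log/messages > "$outdir/syslog.txt"
    else
        echo "skipped: /var/log/messages not readable" > "$outdir/syslog.txt"
    fi

    # Print the directory so the caller knows where to look.
    echo "$outdir"
}
```

Run as root on the affected node, for example ''collect_debug /root/n89-debug'', and compare the saved files after the next unexpected shutdown.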
 ==== Usage ==== ==== Usage ====
  
Line 81: Line 91:
 #/usr/bin/nvidia-smi --gom=0 #/usr/bin/nvidia-smi --gom=0
  
-# for amber16 -pm=ENABLED -c=EXCLUSIVE_PROCESS
+# for amber16 -pm=1/ENABLED -c=1/EXCLUSIVE_PROCESS
 #nvidia-smi --persistence-mode=1 #nvidia-smi --persistence-mode=1
 #nvidia-smi --compute-mode=1 #nvidia-smi --compute-mode=1
  
-# for mwgpu/exx96 -pm=ENABLED -c=DEFAULT
+# for mwgpu/exx96 -pm=1/ENABLED -c=0/DEFAULT
 # note: turned this off, running with defaults # note: turned this off, running with defaults
 +# seems stable, maybe persistence later on
 +# lets see how docker interacts first...
 #nvidia-smi --persistence-mode=1 #nvidia-smi --persistence-mode=1
 #nvidia-smi --compute-mode=0 #nvidia-smi --compute-mode=0
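
Since the persistence and compute mode settings above are commented out and the node runs with defaults, it may help to check what the GPUs currently report before changing anything. A minimal sketch, assuming ''nvidia-smi'' is on the PATH; the function name ''show_gpu_modes'' is hypothetical, and the query only reads state without setting it.

```shell
#!/bin/bash
# Sketch: print the current persistence and compute mode of each GPU.
# Hypothetical helper; read-only, does not change any setting.
show_gpu_modes() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi
    # One CSV row per GPU: index, name, persistence mode, compute mode.
    nvidia-smi --query-gpu=index,name,persistence_mode,compute_mode --format=csv
}
```

Calling ''show_gpu_modes'' before and after a driver or docker change gives a quick record of whether the defaults were altered.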
Line 155: Line 167:
 # add packages and update # add packages and update
 yum install epel-release -y yum install epel-release -y
 +yum install flex flex-devel bison bison-devel -y 
 yum install tcl tcl-devel dmtcp -y yum install tcl tcl-devel dmtcp -y
 +yum install net-snmp net-snmp-libs net-agent-libs net-tools net-snmp-utils -y
 yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y
 yum install blas blas-devel lapack lapack-devel boost boost-devel -y yum install blas blas-devel lapack lapack-devel boost boost-devel -y
Line 234: Line 248:
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB
  
-free -g
+free -m
               total        used        free      shared  buff/cache   available               total        used        free      shared  buff/cache   available
-Mem:             92                    88           0           1          89
+Mem:          95056        1919       85338          20        7798       92571
 +Swap:         10239           0       10239 
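
The ''free -m'' columns above can be reduced to a one-line summary when comparing nodes. This is a sketch only: the function name ''mem_summary'' and the output format are assumptions, and it expects the column order shown above (total, used, free, shared, buff/cache, available).

```shell
#!/bin/bash
# Sketch: summarize the "Mem:" row of `free -m` output read from stdin.
# Hypothetical helper; assumes the six-column layout shown above.
mem_summary() {
    awk '/^Mem:/ {
        # $2 = total, $3 = used, $7 = available (all in MB)
        printf "total=%dMB used=%dMB available=%dMB used_pct=%.1f\n", $2, $3, $7, ($3 / $2) * 100
    }'
}
```

Usage on a live node would be ''free -m | mem_summary''. Fed the ''Mem:'' row shown above, it prints ''total=95056MB used=1919MB available=92571MB used_pct=2.0''.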
  
 # nvidia-smi # nvidia-smi
cluster/192.txt · Last modified: 2022/03/08 18:29 by hmeij07