cluster:225

Differences

This shows you the differences between two versions of the page.

cluster:225 [2024/04/15 09:05]
hmeij07
cluster:225 [2024/05/13 14:30] (current)
hmeij07
Line 73: Line 73:
 # this is mostly a warning but may interfere, as docker likely does
 https://unix.stackexchange.com/questions/118116/what-is-a-tainted-linux-kernel
 +# disabled docker on n79 for now 04/15/2024 9:06AM
 +# also rotated the memory dimms some time later, seems to have fixed issue
 +# started docker back up on n79 05/06/2024 9:56AM (has been up 17 days by now)
  
 # n89 next (no problem)
 # but upon reboot I encountered that error for the FIRST time on this node
-# need to research it is not related to cuda install
 +# need to research it is somewhat related to cuda install
 +# n80 (same error upon reboot after driver install) 
 +# n81 (same error upon reboot after driver install) 
 +# n90 (same error upon reboot after toolkit install, not driver. weird) 
 +# n88 (failed toolkit install, ran /usr/bin/nvidia-uninstall, reboot,
 +#      re-installed driver, reboot, re-installed toolkit, reboot,
 +#      no error occurs!)
  
 sh ./NVIDIA-Linux-x86_64-550.67.run
Line 129: Line 137:
  
 REBOOT and check date before launching slurm
 +mv /var/spool/slurmd/cred_state /var/spool/slurmd/cred_state.bak
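A minimal sketch of the step above, wrapped so it is safe to repeat. The function name ''rotate_cred_state'' and its optional directory argument are assumptions for illustration, not part of the original notes. It moves the file only if it exists.

```shell
# Sketch only: rotate_cred_state is a hypothetical helper name.
# It sets an existing cred_state file aside so slurmd starts without it.
rotate_cred_state() {
    # $1: slurmd spool directory; defaults to the path used in the step above
    spool="${1:-/var/spool/slurmd}"
    if [ -f "$spool/cred_state" ]; then
        mv "$spool/cred_state" "$spool/cred_state.bak"
    fi
}

# Usage after the reboot (run as root):
#   date                 # confirm the clock is correct first
#   rotate_cred_state    # then launch slurm as usual
```

Calling it a second time is harmless because the move is skipped when no ''cred_state'' file is present.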
  
 ===========
Line 228: Line 237:
  
 ** CentOS 7 on n89 **
 +
 +The steps above can also be done for the default cuda installation on exx96, where the soft link ''/usr/local/bin/cuda'' would have pointed to ''/usr/local/bin/cuda-10.2''. Do not follow the soft link; instead, use the path with the toolkit version in it when setting your cuda environment.
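One way to set the environment from a versioned path could look like the sketch below. The function name ''cuda_env'' and the ''CUDA_HOME'' variable are assumptions for illustration, and the ''bin'' and ''lib64'' subdirectories are the usual toolkit layout rather than something confirmed for this site. The function refuses a soft link, so the version stays explicit.

```shell
# Sketch only: cuda_env is a hypothetical helper name.
# It exports a cuda environment from a versioned toolkit directory
# and refuses a soft link, so the toolkit version stays explicit.
cuda_env() {
    # $1: toolkit directory with the version in its name
    if [ -L "$1" ]; then
        echo "refusing soft link: $1" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "no such toolkit directory: $1" >&2
        return 1
    fi
    export CUDA_HOME="$1"
    export PATH="$1/bin:$PATH"
    # append any existing LD_LIBRARY_PATH only if it is set and non-empty
    export LD_LIBRARY_PATH="$1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage, with the versioned path named in the text above:
#   cuda_env /usr/local/bin/cuda-10.2
```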
  
 Next test is to see if older software runs compatibly with newer drivers. We test that by running a gpu program against the new 550 driver and cuda toolkit 9.2 to see if it works (~hmeij/slurm/run.centos7.2).
cluster/225.1713186343.txt.gz · Last modified: 2024/04/15 09:05 by hmeij07