This shows you the differences between two versions of the page.
cluster:172 [2018/08/22 11:50] hmeij07 [Nvidia] |
cluster:172 [2018/08/22 13:13] hmeij07 |
||
---|---|---|---|
Line 17: | Line 17: | ||
* copy passwd, shadow, group, hosts, fstab from global archive | * copy passwd, shadow, group, hosts, fstab from global archive | ||
* check polkit user ... screws up systemd-logind | * check polkit user ... screws up systemd-logind | ||
- | * connextX | + | * connectX |
+ | * unmount NFS mounts while installing nvidia as root | ||
+ | * install other software as regular user | ||
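A minimal sketch for the NFS step above, assuming the mounts can be read from a table in `/proc/mounts` format. The helper name `list_nfs_mounts` is made up for illustration and is not part of the original notes.

```shell
# List mount points of NFS filesystems found in a mounts table.
# $1: path to a table in /proc/mounts format (default: /proc/mounts).
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Hypothetical usage, as root, around the nvidia install:
#   for m in $(list_nfs_mounts); do umount "$m"; done
#   ... run the nvidia installer ...
#   mount -a -t nfs,nfs4    # remount the NFS entries from fstab
```

Mount points containing spaces are escaped in `/proc/mounts`, so the loop in the comment only suits plain paths.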
==== Nvidia ==== | ==== Nvidia ==== | ||
+ | |||
+ | ** Installation ** | ||
< | < | ||
Line 29: | Line 33: | ||
yum update kernel kernel-tools kernel-tools-libs | yum update kernel kernel-tools kernel-tools-libs | ||
yum install kernel-devel kernel-headers (remove old headers after reboot) | yum install kernel-devel kernel-headers (remove old headers after reboot) | ||
- | yum install gcc gcc-devel | + | yum install gcc gcc-devel |
# download runfiles from https:// | # download runfiles from https:// | ||
Line 69: | Line 73: | ||
[root@n37 src]# scp n78:/ | [root@n37 src]# scp n78:/ | ||
- | # Device files/ | + | # Device files/ |
# They were not | # They were not | ||
/ | / | ||
- | # new kernel initramfs: | + | # new kernel initramfs, load |
dracut --force | dracut --force | ||
reboot | reboot | ||
</ | </ | ||
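After the reboot it may be worth confirming that the driver module actually loaded. The sketch below is an assumption about how one might check, not a step from the original notes. `module_loaded` is a hypothetical helper, and the nouveau check reflects the general fact that the open-source nouveau driver conflicts with the NVIDIA one.

```shell
# Succeed if the named kernel module appears in a module table.
# $1: module name
# $2: table in /proc/modules format (default: /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Hypothetical usage after reboot:
#   module_loaded nvidia  && echo "nvidia module loaded"
#   module_loaded nouveau && echo "warning: nouveau is still loaded"
#   nvidia-smi -L           # lists the GPUs the driver can see
```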
+ | |||
+ | For the user environment | ||
+ | * export PATH=/ | ||
+ | * export LD_LIBRARY_PATH=/ | ||
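The two exports above could be wrapped so that sourcing the file twice does not duplicate entries. This is only a sketch: the paths in the notes are truncated, so `CUDA_HOME` and its default `/usr/local/cuda-9.2` are assumptions, and `prepend_path` is a hypothetical helper.

```shell
# Prepend a directory to a colon-separated list unless it is already present.
# $1: directory, $2: current value of the list (may be empty).
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "${1}${2:+:$2}" ;;
    esac
}

# ASSUMPTION: CUDA_HOME is a placeholder; set it to the real install prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-9.2}"
PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```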
+ | |||
+ | **Verification** | ||
+ | |||
+ | < | ||
+ | |||
+ | [root@n37 cuda-9.2]# / | ||
+ | / | ||
+ | |||
+ | CUDA Device Query (Runtime API) version (CUDART static linking) | ||
+ | |||
+ | Detected 4 CUDA Capable device(s) | ||
+ | |||
+ | Device 0: "Tesla K20m" | ||
+ | CUDA Driver Version / Runtime Version | ||
+ | CUDA Capability Major/Minor version number: | ||
+ | ... | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU1) : Yes | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU2) : No | ||
+ | > Peer access from Tesla K20m (GPU0) -> Tesla K20m (GPU3) : No | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU0) : Yes | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU2) : No | ||
+ | > Peer access from Tesla K20m (GPU1) -> Tesla K20m (GPU3) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU0) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU1) : No | ||
+ | > Peer access from Tesla K20m (GPU2) -> Tesla K20m (GPU3) : Yes | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU0) : No | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU1) : No | ||
+ | > Peer access from Tesla K20m (GPU3) -> Tesla K20m (GPU2) : Yes | ||
+ | |||
+ | deviceQuery, | ||
+ | CUDA Runtime Version = 9.2, NumDevs = 4, | ||
+ | Device0 = Tesla K20m, Device1 = Tesla K20m, | ||
+ | Device2 = Tesla K20m, Device3 = Tesla K20m | ||
+ | Result = PASS | ||
+ | |||
+ | </ | ||
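The deviceQuery listing above is long, so a small filter can pull out the two things that matter: the final result and which GPU pairs report peer access. A sketch, assuming the output format shown above. Both function names are hypothetical.

```shell
# Print GPU pairs reported with peer access, one "A B" pair per line (A < B).
# Reads deviceQuery output on stdin.
peer_pairs() {
    sed -n 's/.*(GPU\([0-9]*\)) -> .*(GPU\([0-9]*\)) : Yes.*/\1 \2/p' |
        awk '$1 < $2'
}

# Succeed only if the output on stdin contains a "Result = PASS" line.
query_passed() {
    grep -q 'Result = PASS'
}

# Hypothetical usage, with the deviceQuery output saved to a file first:
#   peer_pairs < deviceQuery.out
#   query_passed < deviceQuery.out && echo "deviceQuery OK"
```

For the listing above this would report the pairs 0 1 and 2 3, matching the Yes/No lines.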
+ | |||
+ | ** Bandwidth Test ** | ||
+ | |||
+ | < | ||
+ | |||
+ | [root@n37 cuda-9.2]# / | ||
+ | [CUDA Bandwidth Test] - Starting... | ||
+ | Running on... | ||
+ | |||
+ | | ||
+ | Quick Mode | ||
+ | |||
+ | Host to Device Bandwidth, 1 Device(s) | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | Result = PASS | ||
+ | |||
+ | </ | ||
+ | |||
+ | ** Finish ** | ||
+ | |||
+ | * yum install freeglut-devel libX11-devel libXi-devel libXmu-devel make mesa-libGLU-devel | ||
+ | * check for / | ||
+ | * [root@n37 /]# tar -cvf / | ||
+ | * [root@n37 /]# scp / | ||
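The tar and scp steps above could be scripted roughly as follows. The paths and target node are truncated in the notes, so everything concrete here is an assumption: `/usr/local/cuda-9.2`, `/tmp/cuda-9.2.tar` and node `n38` are placeholders, and `pack_tree` is a hypothetical helper.

```shell
# Archive a directory tree into a tarball, storing paths relative to /.
# $1: absolute directory to archive, $2: tarball to create.
pack_tree() {
    tar -C / -cf "$2" "${1#/}"
}

# Hypothetical usage on the installed node, then on a target node:
#   pack_tree /usr/local/cuda-9.2 /tmp/cuda-9.2.tar
#   scp /tmp/cuda-9.2.tar n38:/tmp/
#   ssh n38 'tar -C / -xf /tmp/cuda-9.2.tar'
```

Storing paths relative to `/` lets the same `tar -C / -xf` unpack the tree to the identical location on each node.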
+ | |||
+ | |||
\\ | \\ | ||
**[[cluster: | **[[cluster: |