cluster:192 [2020/02/27 15:47] hmeij07

===== EXX96 =====
A page for me on how these 12 nodes were built up after they arrived. To make them "ala n37", which was the test node in redoing our K20 nodes, see [[cluster:
+ | |||
+ | Page best followed bottom to top if interested in the whole process. | ||
+ | |||
+ | The Usage section below is HPCC users wnatig to use queue '' | ||
- | Page best followed bottom to top. | ||
==== Usage ====
The new queue ''

A new static resource is introduced for all nodes holding gpus. ''

The wrappers provided assume your cpu:gpu ratio is 1:1, hence in your submit code you will have ''#

The wrappers (n78.mpich3.wrapper for ''
<code>
+ | |||
+ | # command that shows gpu reservations | ||
bhosts -l n79 | bhosts -l n79 | ||
| | ||
| | ||
- | | + | |
+ | # old way of doing that | ||
lsload -l n79 | lsload -l n79 | ||
Line 25: | Line 36: | ||
n79 | n79 | ||
- | mdout.325288: | + | </ |
Peer to peer communication is possible (via PCIe rather than NVlink) with this hardware.
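To see which link each GPU pair actually has, the standard ''nvidia-smi topo -m'' subcommand prints the interconnect matrix (PIX/PHB entries indicate PCIe paths, NV# entries would indicate NVLink). Guarded here so the snippet degrades gracefully on hosts without the NVIDIA driver:

```shell
# print the GPU interconnect topology matrix on a gpu node;
# falls back to a message on machines without the NVIDIA driver
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m
else
    echo "nvidia-smi not available on this host"
fi
```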
<code>
cpu:gpu
mdout.325288:
mdout.325289:
mdout.326208:
</code>
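The listing above comes from grepping mdout files for the requested cpu:gpu counts. A self-contained reproduction of that kind of check, using a throwaway scratch file and an illustrative line format (the real Amber mdout wording differs):

```shell
# create a scratch dir with a fake mdout-style file, then grep
# the cpu:gpu line back out; file name and line format are made up
tmp=$(mktemp -d)
echo "cpu:gpu request 4:4" > "$tmp/mdout.999999"
grep "cpu:gpu" "$tmp"/mdout.*
rm -rf "$tmp"
```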
==== Miscellaneous ====
#/
# for amber16 -pm=1/ENABLED -c=1/EXCLUSIVE_PROCESS
#nvidia-smi --persistence-mode=1
#nvidia-smi --compute-mode=1
# for mwgpu/exx96 -pm=1/ENABLED -c=0/DEFAULT
# note: turned this off, running with defaults
# seems stable, maybe persistence later on
# let's see how docker interacts first...
#nvidia-smi --persistence-mode=1
#nvidia-smi --compute-mode=0
# turn ECC off (memory scrubbing)