cluster:193 [2020/03/10 08:07] hmeij07

**[[cluster:
===== Docker Usage =====

This page is built up from the bottom to the top. We're not making a traditional "...

If users want to run web-enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU and then loops checking a lock file until it is removed.
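That lock-file hold could be sketched like this (a minimal sketch: the lock-file name, location, and sleep interval are assumptions, and the auto-release line exists only so the demo terminates on its own):

```shell
# Sketch of the lock-file hold: the job keeps its GPU reservation
# until the user removes the lock file. All names are placeholders.
LOCK=/tmp/gpu_lock_$$
touch "$LOCK"
echo "holding gpu reservation; remove $LOCK to release"
( sleep 2 && rm -f "$LOCK" ) &   # demo only: auto-release after 2 seconds
while [ -f "$LOCK" ]; do
    sleep 1
done
echo "lock released, job exiting"
```

In a real job the removal would be done by the user from a login node, not by the background subshell shown here.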
==== Readings ====

Interesting reads...

  * https://
  * https://
==== Scheduler Runs ====

Next, add to the script the scheduler syntax that is needed. Request a gpu resource ''...

<code>

#!/bin/bash
# submit via 'bsub < run.docker'
rm -f out err
#BSUB -e err
#BSUB -o out
#BSUB -q exx96
#BSUB -J "...

#BSUB -n 1
#BSUB -R "...

# should add a check that we get an integer back in the 0-3 range
gpuid="...
echo "";

NV_GPU=$gpuid \
nvidia-docker run --rm -u $(id -u):$(id -g) \
 -v /... \
 -v /... \
 -v /... \
 nvcr.io/... \
 /... \
 --num_gpus=1 --batch_size=64 \
 --model=resnet50 \
 --variable_update=parameter_server
# or run_tests.py

</code>
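The range check that the comment above asks for could look like this (a sketch; the sample value is a stand-in for whatever the site's gpu lookup actually returns):

```shell
# Hypothetical gpuid sanity check: accept only a single digit 0-3.
gpuid="2"   # stand-in value; normally filled in by the gpu lookup above
case "$gpuid" in
    [0-3]) echo "using gpu $gpuid" ;;
    *)     echo "no free gpu (got '$gpuid'), giving up" >&2
           exit 1 ;;
esac
```

Failing early here is cheaper than letting ''nvidia-docker'' start with a bad ''NV_GPU'' value.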
To make the ''...'' module importable inside the container, insert its location into the python path:

<code>

import sys
sys.path.insert(0, ...)

</code>
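The same effect can be had without editing the script by exporting ''PYTHONPATH'' before the run (a sketch, assuming ''python3'' is on the path; the module directory and module name below are made-up placeholders):

```shell
# Create a throwaway module just for the demo, then import it
# via PYTHONPATH instead of sys.path.insert (paths are placeholders).
mkdir -p /tmp/demo_pymod
printf 'x = 42\n' > /tmp/demo_pymod/demomod.py
export PYTHONPATH=/tmp/demo_pymod:$PYTHONPATH
python3 -c 'import demomod; print(demomod.x)'
```

Passing ''-e PYTHONPATH=...'' to the container run accomplishes the same without touching the benchmark source.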
==== GPU Runs ====

We put the tensorflow benchmark example in a script. It will find a free gpu and set it in an environment variable ''...
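The "find a free gpu" step might be sketched as follows (an assumption about how it works, not the site's actual script; it relies on ''nvidia-smi'' being present and on the 0-3 device range of the four-gpu nodes):

```shell
# Sketch: print the id of the first gpu with no compute processes.
# Bails out if nvidia-smi is missing (e.g. on a node without gpus).
free_gpu() {
    command -v nvidia-smi >/dev/null || { echo "nvidia-smi not found" >&2; return 1; }
    for i in 0 1 2 3; do
        n=$(nvidia-smi -i $i --query-compute-apps=pid --format=csv,noheader 2>/dev/null | wc -l)
        [ "$n" -eq 0 ] && { echo "$i"; return 0; }
    done
    return 1   # all gpus busy
}
```

The id it prints is what would end up in ''NV_GPU'' for the container run.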
<code>

# execute script
[hmeij@n79 docker]$ ./...

running on gpu n79:3 <...

# tensorflow starts up
================
== TensorFlow ==
================

NVIDIA Release 19.09 (build 8044706)
TensorFlow Version 1.14.0

Container image Copyright (c) 2019, NVIDIA CORPORATION.
Copyright 2017-2019 The TensorFlow Authors.
(snip output...)

# details
TensorFlow:  ...
Model:       ...
Dataset:     ...
Mode:        training
SingleSess:  ...
Batch size:  64 global
             64 per device
Num batches: 100
Num epochs:  ...
Devices:     ...
NUMA bind:   ...
Data format: NCHW
Optimizer:   ...
Variables:   ...
==========
Generating training model
Initializing graph
Running warm up
(snip output...)

# query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify)
[root@n79 ~]# gpu-process

gpu_name, gpu_bus_id, pid, process_name
GeForce RTX 2080 SUPER, 00000000:...

</code>
==== Simple Runs ====

Some simple interactive test runs. Map your home directory from the host inside the container; I chose /mnt, but it can go anywhere except /home ... Also set up your uid/gid, because otherwise the container will run as "...
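To see what the ''-u'' flag will hand to docker, expand it on the host first (trivial, but handy when debugging permission problems on the mapped home directory):

```shell
# Preview the uid:gid mapping the container will run with.
uflag="-u $(id -u):$(id -g)"
echo "$uflag"
```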
<code>

[hmeij@n79 ~]$ nvidia-docker run --rm -v /... \
 -u $(id -u):$(id -g) nvcr.io/...

uid=8216 gid=623 groups=623

[hmeij@n79 ~]$ nvidia-docker run --rm -v /... \
 -u $(id -u):$(id -g) nvcr.io/...

Filesystem
10.10.102.42:/...

touch /...
# check permissions on host running the container
[hmeij@n79 ~]$ ls -l $HOME/tmp
total 232

</code>
==== Pull Images ====

Pull more images from the Nvidia GPU Cloud catalog.

<code>
...
</code>
==== JupyterLab ====

The Rapids container and Notebook Server hide in the ''...
==== Digits ====

<code>
DATE=$( date +%N ) # nanoseconds unique id
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES="..." \
 --name digits-$DATE-0 -d -p 5000:... \
 -v /... \
 nvcr.io/...
</code>
{{:
==== Portainer ====

Portainer is a simple management solution for Docker which allows browser access on http://...

<code>
DATE=$(date +%N) # nanoseconds as unique id
docker run --name portainer-$DATE-0 -d -p 9000:9000 \
 -v "/... \
 portainer/...
</code>
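Filled in, the command usually looks like the sketch below. The ''portainer/portainer'' image name and the ''docker.sock'' bind mount are the commonly documented defaults, assumed here rather than copied from this site's setup; the sketch only prints the command, so run it by hand on the docker host:

```shell
# Build the portainer launch command (prints it; does not run docker).
DATE=$(date +%N)   # nanoseconds as a unique id, as above
CMD="docker run --name portainer-$DATE-0 -d -p 9000:9000 \
  -v /var/run/docker.sock:/var/run/docker.sock portainer/portainer"
echo "$CMD"
```

Mounting the docker socket is what lets Portainer manage the host's containers, so restrict port 9000 accordingly.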
==== What's Running? ====

  * Docker Version 19.03.5
  * NVIDIA-Docker2 2.2.2-1

Video was "...
<code>
...
</code>

==== Setup ====

For a more detailed read on how to install Docker consult [[cluster: