This shows you the differences between two versions of the page.
cluster:193 [2020/03/05 18:42] hmeij07 |
cluster:193 [2024/09/17 16:51] (current) hmeij07 [What's Running?] |
||
---|---|---|---|
Line 2: | Line 2: | ||
**[[cluster: | **[[cluster: | ||
- | ==== Docker Usage ==== | + | ===== Docker |
Page builds up from the bottom to the top. We're not making a traditional " | Page builds up from the bottom to the top. We're not making a traditional " | ||
+ | |||
+ | If users want to run web-enabled applications in the container, one simple workflow is to submit a job that reserves a GPU and then loops, checking for a lock file until that file is removed. | ||
+ | |||
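The lock-file workflow described above can be sketched as a small shell helper for the job script. The function name `wait_for_unlock`, the lock file path, and the polling interval are assumptions for illustration and are not part of the cluster setup.

```shell
#!/bin/bash
# Sketch only: keep a scheduler job (and the GPU it reserved) alive
# until a lock file is removed by the user.

# wait_for_unlock LOCKFILE [INTERVAL_SECONDS]
# Returns once LOCKFILE no longer exists; polls every INTERVAL seconds.
wait_for_unlock() {
    local lockfile="$1"
    local interval="${2:-60}"   # hypothetical default: poll once a minute
    while [ -e "$lockfile" ]; do
        sleep "$interval"
    done
}

# Possible use inside a job script (paths are hypothetical):
#   touch "$HOME/.webapp.lock"              # create the lock
#   ... start the container in detached mode ...
#   wait_for_unlock "$HOME/.webapp.lock" 60
#   ... stop the container; the job ends and the GPU is released ...
```

Removing the lock file from a login node ends the loop, so the job finishes and the scheduler frees the GPU.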
+ | ==== Readings ==== | ||
+ | |||
+ | Interesting reads... | ||
+ | |||
+ | * https:// | ||
+ | * PMI(x), Slurm | ||
+ | |||
+ | * https:// | ||
+ | * Docker, Kubernetes, Singularity, | ||
- | === Scheduler === | + | * https:// |
+ | * HA load balancing with Docker images for CentOS | ||
+ | ==== Scheduler | ||
- | Next add to the script the scheduler syntax that is needed. Request a gpu resource '' | + | Next add to the script the scheduler syntax that is needed. Request a gpu resource '' |
< | < | ||
Line 23: | Line 37: | ||
#BSUB -R " | #BSUB -R " | ||
+ | # TODO: check that we get an integer back in the 0-3 range | ||
gpuid=" | gpuid=" | ||
echo ""; | echo ""; | ||
Line 36: | Line 51: | ||
--model=resnet50 \ | --model=resnet50 \ | ||
--variable_update=parameter_server | --variable_update=parameter_server | ||
+ | # or run_tests.py | ||
+ | |||
+ | </ | ||
+ | |||
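The scheduler script carries a note that the GPU id it gets back should be checked to be an integer in the 0-3 range. A minimal sketch of such a check follows; the helper name `is_valid_gpuid` is hypothetical, and the sketch assumes the id is stored in the `gpuid` variable as in the script.

```shell
#!/bin/bash
# Sketch only: accept exactly one digit in the range 0..3, reject anything else
# (empty string, multiple ids, text, larger numbers).
is_valid_gpuid() {
    case "$1" in
        [0-3]) return 0 ;;
        *)     return 1 ;;
    esac
}

# Possible use right after gpuid is set in the job script:
#   if ! is_valid_gpuid "$gpuid"; then
#       echo "no usable gpu id returned: '$gpuid'" >&2
#       exit 1
#   fi
```

Failing early here avoids starting a container with an empty or malformed device setting.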
+ | To make the '' | ||
+ | |||
+ | < | ||
+ | import sys | ||
+ | sys.path.insert(0, | ||
</ | </ | ||
- | ==== GPU runs ==== | ||
- | We put the TensorFlow benchmark example in a script. It will find a free GPU, set it in an environment variable '' | + | ==== GPU Runs ==== |
+ | |||
+ | We put the TensorFlow benchmark example in a script. It will find a free GPU, set it in an environment variable '' | ||
< | < | ||
Line 60: | Line 85: | ||
Container image Copyright (c) 2019, NVIDIA CORPORATION. | Container image Copyright (c) 2019, NVIDIA CORPORATION. | ||
Copyright 2017-2019 The TensorFlow Authors. | Copyright 2017-2019 The TensorFlow Authors. | ||
- | (deleted content...) | + | (snip output...) |
# details | # details | ||
Line 81: | Line 106: | ||
Initializing graph | Initializing graph | ||
Running warm up | Running warm up | ||
- | (deleted content...it crashes but we can see it running) | + | (snip output...) |
# query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify) | # query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify) | ||
Line 93: | Line 118: | ||
- | + | ==== Simple Runs ==== | |
- | + | ||
- | + | ||
- | === Simple Runs === | + | |
Some simple interactive test runs. Map your home directory from the host into the container; I chose /mnt, but it can go anywhere except /home ... Also set up your uid/gid because the container will run as " | Some simple interactive test runs. Map your home directory from the host into the container; I chose /mnt, but it can go anywhere except /home ... Also set up your uid/gid because the container will run as " | ||
Line 127: | Line 149: | ||
</ | </ | ||
- | === Pull Images === | + | ==== Pull Images ==== |
+ | |||
+ | Pull more images from the NVIDIA GPU Cloud catalog. | ||
< | < | ||
Line 163: | Line 187: | ||
</ | </ | ||
- | === JupyterLab === | + | ==== JupyterLab |
The Rapids container and Notebook Server hide in the '' | The Rapids container and Notebook Server hide in the '' | ||
Line 193: | Line 217: | ||
- | === Digits === | + | ==== Digits |
Line 203: | Line 227: | ||
DATE=$( date +%N ) # nanoseconds unique id | DATE=$( date +%N ) # nanoseconds unique id | ||
- | docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=" | + | docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=" |
+ | --name digits-$DATE-0 -d -p 5000: | ||
+ | -v / | ||
+ | nvcr.io/ | ||
Line 234: | Line 261: | ||
{{: | {{: | ||
- | === Portainer === | + | ==== Portainer |
Portainer is a simple management solution for Docker which allows browser access on http:// | Portainer is a simple management solution for Docker which allows browser access on http:// | ||
Line 241: | Line 268: | ||
DATE=$(date +%N) # nanoseconds as unique id | DATE=$(date +%N) # nanoseconds as unique id | ||
- | docker run --name portainer-$DATE-0 -d -p 9000:9000 -v "/ | + | docker run --name portainer-$DATE-0 -d -p 9000: |
+ | -v "/ | ||
+ | portainer/ | ||
</ | </ | ||
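The container names above use `date +%N`, which with GNU date is only the nanosecond fraction of the current second, so two launches could in principle produce the same value. One possible variation, assuming GNU date and a hypothetical helper name `unique_name`, prefixes the epoch seconds:

```shell
#!/bin/bash
# Sketch only: build a container name from a prefix plus epoch seconds and
# nanoseconds (GNU date), keeping the "-0" suffix used in the names above.
unique_name() {
    local prefix="$1"
    echo "${prefix}-$(date +%s%N)-0"
}

# Possible use:
#   docker run --name "$(unique_name portainer)" ...
```

This keeps the naming pattern of the original commands and only changes how the numeric part is generated.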
Line 250: | Line 279: | ||
- | === What's Running? === | + | ==== What's Running? |
* Docker Version 19.03.5 | * Docker Version 19.03.5 | ||
* NVIDIA-Docker2 2.2.2-1 | * NVIDIA-Docker2 2.2.2-1 | ||
- | Video was " | + | Video was " |
< | < | ||
Line 272: | Line 301: | ||
nvcr.io/ | nvcr.io/ | ||
nvcr.io/ | nvcr.io/ | ||
+ | |||
+ | adding 09/17/2024 | ||
+ | https:// | ||
+ | docker pull mobigroup/ | ||
+ | Status: Downloaded newer image for mobigroup/ | ||
+ | docker.io/ | ||
+ | |||
# running containers (persistent across boot events) | # running containers (persistent across boot events) | ||
Line 295: | Line 331: | ||
- | === Setup === | + | ==== Setup ==== |
For a more detailed read on how to install Docker consult [[cluster: | For a more detailed read on how to install Docker consult [[cluster: |