cluster:193 - last modified 2024/09/17 16:51 by hmeij07
**[[cluster:
===== Docker =====
Page is built up from the bottom to the top. We're not making a traditional "
If users want to run web enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU, then loops checking a lock file until it is removed.
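That loop can be sketched in the job script itself. This is a minimal sketch, not our production script; the lock file location and the sleep interval are assumptions (''LSB_JOBID'' is set by the scheduler).

<code>
#!/bin/bash
# sketch of the lock file workflow; lock file location is an assumption
lock=/tmp/gpu-session-$LSB_JOBID.lock
touch "$lock"
# ... start the web enabled application on the reserved gpu here ...
# hold the job (and thus the gpu) until the user removes the lock file
while [ -e "$lock" ]; do
    sleep 60
done
echo "lock file removed, job exiting"
</code>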
+ | |||
+ | ==== Readings ==== | ||
+ | |||
+ | Interesting reads... | ||
+ | |||
+ | * https:// | ||
+ | * PMI(x), Slurm | ||
+ | |||
+ | * https:// | ||
+ | * Docker, Kubernetes, Singularity, | ||
+ | * https:// | ||
+ | * HA load balancing with Docker images for CentOS | ||
==== Scheduler Runs ====
Next, add the scheduler syntax that is needed to the script. Request a gpu resource ''
<code>
#BSUB -R "
# should add a check we get an integer back in the 0-3 range
gpuid="
echo "";
--model=resnet50 \
--variable_update=parameter_server
# or run_tests.py
</code>
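The comment in the script above suggests validating ''gpuid''; a minimal sketch, assuming four gpus with ids 0-3:

<code>
# abort unless gpuid is a single integer in the 0-3 range
case "$gpuid" in
    [0-3]) echo "reserved gpu $gpuid" ;;
    *)     echo "unexpected gpuid value: '$gpuid'"; exit 1 ;;
esac
</code>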
+ | |||
To make the ''
+ | |||
+ | < | ||
+ | import sys | ||
+ | sys.path.insert(0, | ||
</ | </ | ||
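If the goal is just to put a directory on python's module search path, the same effect can be had from the job script without editing the python code; the directory below is a placeholder, not the actual path:

<code>
# placeholder path: point this at the directory that holds the module
export PYTHONPATH=/path/to/module:$PYTHONPATH
</code>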
==== GPU Runs ====
We put the tensorflow benchmark example in a script. It will find a free gpu and set it in an environment variable ''
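One way such a script might find a free gpu is sketched below; it is an assumption of this sketch that there are four gpus (ids 0-3) and that a gpu with no compute processes counts as free:

<code>
# sketch: pick the first gpu with no compute processes
gpuid=""
for i in 0 1 2 3; do
    if [ -z "$(nvidia-smi -i $i --query-compute-apps=pid --format=csv,noheader)" ]; then
        gpuid=$i
        break
    fi
done
[ -n "$gpuid" ] || { echo "no free gpu found"; exit 1; }
export NVIDIA_VISIBLE_DEVICES=$gpuid
</code>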
<code>
Container image Copyright (c) 2019, NVIDIA CORPORATION.
Copyright 2017-2019 The TensorFlow Authors.
(snip output...)
# details
Initializing graph
Running warm up
(snip output...)
# query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify)
==== Pull Images ====
+ | |||
+ | Pull more images from the Nvidia Gpu Cloud Catalog. | ||
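Pulling from ''nvcr.io'' generally requires logging in with an NGC API key first; a sketch (the image tag below is an assumption, check the catalog for current tags):

<code>
# the username for nvcr.io is literally $oauthtoken,
# the password is your NGC API key
docker login nvcr.io
# then pull, for example (tag is an assumption)
docker pull nvcr.io/nvidia/digits:19.05
</code>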
<code>
DATE=$( date +%N ) # nanoseconds unique id
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=" \
  --name digits-$DATE-0 -d -p 5000: \
  -v / \
  nvcr.io/
nvcr.io/
nvcr.io/
+ | |||
+ | adding 09/17/2024 | ||
+ | https:// | ||
+ | docker pull mobigroup/ | ||
+ | Status: Downloaded newer image for mobigroup/ | ||
+ | docker.io/ | ||
+ | |||
# running containers (persistent across boot events)
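# note (sketch): the persistence across reboots comes from docker's restart
# policy; it can also be set on an already running container, for example:
docker update --restart unless-stopped digits-$DATE-0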