This shows you the differences between two versions of the page.
cluster:193 [2020/03/05 19:55] hmeij07 [Scheduler Runs] |
cluster:193 [2024/09/17 16:51] (current) hmeij07 [What's Running?] |
||
---|---|---|---|
Line 2: | Line 2: | ||
**[[cluster: | **[[cluster: | ||
- | ===== Docker Usage ===== | + | ===== Docker |
Page builds up from the bottom to the top. We're not making a traditional " | Page builds up from the bottom to the top. We're not making a traditional " | ||
- | If users want to run web-enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU and then loops, checking a lock file until it is removed. | + | If users want to run web-enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU and then loops, checking a lock file until it is removed. | ||
+ | |||
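The lock-file workflow described above could be sketched roughly as follows. This is only an illustration, not the script used on the cluster: the function name ''wait_for_lock_removal'', the polling interval, and the ''max_polls'' safety valve are all assumptions, and the GPU reservation itself is left to the scheduler submit script.

```python
import os
import time


def wait_for_lock_removal(lock_path, poll_seconds=30.0, max_polls=None):
    """Block while lock_path exists and return the number of polls performed.

    lock_path    -- hypothetical lock file the user removes when finished
    poll_seconds -- how long to sleep between checks (assumed value)
    max_polls    -- optional upper bound on checks; None waits indefinitely
    """
    polls = 0
    while os.path.exists(lock_path):
        # Stop early if a poll limit was given and has been reached.
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(poll_seconds)
        polls += 1
    return polls
```

A job script would create the lock file, start the web-enabled application in the container, and then call ''wait_for_lock_removal'' so the job (and its GPU reservation) stays alive until the user deletes the file.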
+ | ==== Readings ==== | ||
+ | |||
+ | Interesting reads... | ||
+ | |||
+ | * https:// | ||
+ | * PMI(x), Slurm | ||
+ | |||
+ | * https:// | ||
+ | * Docker, Kubernetes, Singularity, | ||
+ | * https:// | ||
+ | * HA load balancing with Docker images for CentOS | ||
==== Scheduler Runs ==== | ==== Scheduler Runs ==== | ||
Line 40: | Line 52: | ||
--variable_update=parameter_server | --variable_update=parameter_server | ||
# or run_tests.py | # or run_tests.py | ||
+ | |||
+ | </ | ||
+ | |||
+ | To make the '' | ||
+ | |||
+ | < | ||
+ | |||
+ | import sys | ||
+ | sys.path.insert(0, | ||
</ | </ | ||
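The sentence introducing the ''sys.path.insert'' snippet is cut off in this diff, so what exactly is being made importable is not stated here. As a general, hedged illustration of the technique only: placing a directory at the front of ''sys.path'' lets Python find modules stored in it. The helper ''import_from_dir'', the temporary directory, and the module name ''demo_helper_mod'' are all hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(directory, module_name):
    """Put directory first on sys.path, then import module_name from it."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Demo with a throwaway module; the directory and module name are hypothetical.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_helper_mod.py"), "w") as fh:
    fh.write("ANSWER = 42\n")

demo = import_from_dir(demo_dir, "demo_helper_mod")
print(demo.ANSWER)
```

Inserting at index 0 (rather than appending) means the given directory takes precedence over any installed module with the same name.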
Line 64: | Line 85: | ||
Container image Copyright (c) 2019, NVIDIA CORPORATION. | Container image Copyright (c) 2019, NVIDIA CORPORATION. | ||
Copyright 2017-2019 The TensorFlow Authors. | Copyright 2017-2019 The TensorFlow Authors. | ||
- | (deleted content...) | + | (snip output...) |
# details | # details | ||
Line 85: | Line 106: | ||
Initializing graph | Initializing graph | ||
Running warm up | Running warm up | ||
- | (deleted content...it crashes but we can see it running) | + | (snip output...) |
# query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify) | # query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify) | ||
Line 130: | Line 151: | ||
==== Pull Images ==== | ==== Pull Images ==== | ||
- | Pull more images from the NVIDIA GPU Cloud catalog. | + | Pull more images from the NVIDIA GPU Cloud catalog. | ||
< | < | ||
Line 280: | Line 301: | ||
nvcr.io/ | nvcr.io/ | ||
nvcr.io/ | nvcr.io/ | ||
+ | |||
+ | adding 09/17/2024 | ||
+ | https:// | ||
+ | docker pull mobigroup/ | ||
+ | Status: Downloaded newer image for mobigroup/ | ||
+ | docker.io/ | ||
+ | |||
# running containers (persistent across boot events) | # running containers (persistent across boot events) |