cluster:193
Differences
This shows you the differences between two versions of the page.
| cluster:193 [2020/03/05 18:59] – [Pull Images] hmeij07 | cluster:193 [2024/09/17 16:51] (current) – [What's Running?] hmeij07 | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| **[[cluster: | **[[cluster: | ||
| - | ===== Docker Usage ===== | + | ===== Docker |
| This page builds up from the bottom to the top. We're not making a traditional " | This page builds up from the bottom to the top. We're not making a traditional " | ||
| - | If users want to run web-enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU and then loops, checking a lock file until it is removed. | + | If users want to run web-enabled applications in the container, one simple workflow would be to submit a job that reserves a GPU and then loops, checking a lock file until it is removed. |
| + | |||
| + | ==== Readings ==== | ||
| + | |||
| + | Interesting reads... | ||
| + | |||
| + | * https:// | ||
| + | * PMI(x), Slurm | ||
| + | |||
| + | * https:// | ||
| + | * Docker, Kubernetes, Singularity, | ||
| + | * https:// | ||
| + | * HA load balancing with Docker images for CentOS | ||
| ==== Scheduler Runs ==== | ==== Scheduler Runs ==== | ||
| - | Next, add the required scheduler syntax to the script. Request a GPU resource '' | + | Next, add the required scheduler syntax to the script. Request a GPU resource '' |
| < | < | ||
| Line 25: | Line 37: | ||
| #BSUB -R " | #BSUB -R " | ||
| + | # should add a check that we get an integer back in the 0-3 range | ||
| gpuid=" | gpuid=" | ||
| echo ""; | echo ""; | ||
| Line 38: | Line 51: | ||
| --model=resnet50 \ | --model=resnet50 \ | ||
| --variable_update=parameter_server | --variable_update=parameter_server | ||
| + | # or run_tests.py | ||
| + | |||
| + | </ | ||
| + | |||
| + | To make the '' | ||
| + | |||
| + | < | ||
| + | import sys | ||
| + | sys.path.insert(0, | ||
| </ | </ | ||
| Line 44: | Line 66: | ||
| ==== GPU Runs ==== | ==== GPU Runs ==== | ||
| - | We put the TensorFlow benchmark example in a script. It will find a free GPU, set it in an environment variable '' | + | We put the TensorFlow benchmark example in a script. It will find a free GPU, set it in an environment variable '' |
| < | < | ||
| Line 63: | Line 85: | ||
| Container image Copyright (c) 2019, NVIDIA CORPORATION. | Container image Copyright (c) 2019, NVIDIA CORPORATION. | ||
| Copyright 2017-2019 The TensorFlow Authors. | Copyright 2017-2019 The TensorFlow Authors. | ||
| - | (deleted content...) | + | (snip output...) |
| # details | # details | ||
| Line 84: | Line 106: | ||
| Initializing graph | Initializing graph | ||
| Running warm up | Running warm up | ||
| - | (deleted content... it crashes, but we can see it running) | + | (snip output...) |
| # query what is running on the GPUs ... D8 is GPU 3 (ssh n79 nvidia-smi to verify) | # query what is running on the GPUs ... D8 is GPU 3 (ssh n79 nvidia-smi to verify) | ||
| Line 129: | Line 151: | ||
| ==== Pull Images ==== | ==== Pull Images ==== | ||
| - | Pull more images from the NVIDIA GPU Cloud (NGC) Catalog. | + | Pull more images from the NVIDIA GPU Cloud (NGC) Catalog. |
| < | < | ||
| Line 279: | Line 301: | ||
| nvcr.io/ | nvcr.io/ | ||
| nvcr.io/ | nvcr.io/ | ||
| + | |||
| + | adding 09/17/2024 | ||
| + | https:// | ||
| + | docker pull mobigroup/ | ||
| + | Status: Downloaded newer image for mobigroup/ | ||
| + | docker.io/ | ||
| + | |||
| # running containers (persistent across boot events) | # running containers (persistent across boot events) | ||
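
The submit script carries a note that it should check that the GPU id it gets back is an integer in the 0-3 range. Below is a minimal sketch of such a check in Python (the language of the benchmark itself). It is not the script's actual selection logic, which is cut off on this page. It assumes four GPUs per node (ids 0-3) and treats a GPU as free when `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits` reports 0% utilization for it. The function names and that "free" rule are hypothetical.

```python
import subprocess

MAX_GPU_ID = 3  # assumption: four GPUs per node, ids 0-3


def parse_gpu_id(text):
    """Return text as an int GPU id if it is an integer in 0..MAX_GPU_ID, else None."""
    try:
        gpuid = int(text.strip())
    except ValueError:
        return None  # empty string, letters, decimals, ...
    return gpuid if 0 <= gpuid <= MAX_GPU_ID else None


def first_idle_gpu(csv_text):
    """Return the first valid GPU id whose utilization is 0, or None.

    csv_text is assumed to look like the output of
    nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    i.e. one "index, utilization" pair per line.
    """
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        gpuid = parse_gpu_id(fields[0])
        if gpuid is not None and fields[1] == "0":
            return gpuid
    return None


def query_nvidia_smi():
    """Run nvidia-smi on the local node and return its CSV output as text."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
```

In a job script one could call `first_idle_gpu(query_nvidia_smi())`, stop with an error on `None`, and otherwise export the id (for example as `CUDA_VISIBLE_DEVICES`) before launching the container. Utilization is only a rough proxy for "free". A GPU can show 0% while another job already holds memory on it, so also querying `memory.used` may be safer.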
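
The lock-file workflow mentioned near the top of the page (a job reserves a GPU, then loops checking a lock file until it is removed) could look roughly like the Python sketch below. The function name, the polling interval and the optional timeout are assumptions. The page does not say where the lock file lives or how often it is checked.

```python
import os
import time


def hold_until_unlocked(lock_path, poll_seconds=60.0, max_wait=None):
    """Keep the job (and the GPU it reserved) alive while lock_path exists.

    Hypothetical helper: returns True once the lock file has been removed,
    or False if max_wait seconds elapse first (None means wait forever).
    """
    start = time.monotonic()
    while os.path.exists(lock_path):
        if max_wait is not None and time.monotonic() - start >= max_wait:
            return False
        time.sleep(poll_seconds)
    return True
```

Presumably the user creates the lock file before submitting, works in the web application for as long as needed, then deletes the file so the job exits and releases the GPU. If the queue has a run limit, setting `max_wait` just below it would let the job end cleanly instead of being killed.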
cluster/193.1583434756.txt.gz · Last modified: by hmeij07
