cluster:193

Differences

This shows you the differences between two versions of the page.

cluster:193 [2020/03/05 14:55]
hmeij07 [Scheduler Runs]
cluster:193 [2020/03/10 08:31] (current)
hmeij07 [Readings]
Line 2: Line 2:
 **[[cluster:0|Back]]**
  
-===== Docker Usage =====
+===== Docker Containers Usage =====
  
 Page builds up from the bottom to the top.  We are not making a traditional "MPI" Docker integration with our scheduler.  We will see what usage patterns emerge and go from there. I can help with workflow.  If more containers are desired, please let me know which ones to ''pull''.
  
-If users want to run web-enabled applications in the container, one simple workflow is to submit a job that reserves a GPU and then loops, checking a lock file until it is removed.  Then ssh to the node and access the application via ''firefox http://localhost:PORT/''. For example, DIGITS and JupyterLab below.
+If users want to run web-enabled applications in the container, one simple workflow is to submit a job that reserves a GPU and then loops, checking a lock file until it is removed.  Then ssh to the node and access the application via ''firefox http://localhost:PORT/''. For example, DIGITS and JupyterLab, described below.
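
The lock-file loop in that workflow could look roughly like the sketch below. It is only an illustration, not the script used on this cluster: the function name ''wait_for_lock_removal'', the polling interval, and the optional timeout are all assumptions. The job would create the lock file, call the function to hold the GPU reservation, and exit once the user deletes the file.

```python
import os
import time


def wait_for_lock_removal(lock_path, poll_seconds=30.0, timeout_seconds=None):
    """Block while lock_path exists.

    Returns True once the lock file is gone, or False if the optional
    timeout (a hypothetical safety net, not part of the original workflow)
    expires first.
    """
    start = time.monotonic()
    while os.path.exists(lock_path):
        # Give up if a timeout was requested and has elapsed.
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return False
        # Sleep between checks so the loop does not burn CPU on the node.
        time.sleep(poll_seconds)
    return True
```

Inside the submitted job one would create the lock file first (the path is up to the user), call ''wait_for_lock_removal(path)'', and remove the file by hand when done with the web application so the job ends and the GPU is released.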
 + 
 +==== Readings ==== 
 + 
 +Interesting reads... 
 + 
 +  * https://www.stackhpc.com/k8s-mpi.html 
 +    * PMI(x), Slurm 
 + 
 +  * https://www.stackhpc.com/the-state-of-hpc-containers.html 
 +    * Docker, Kubernetes, Singularity, Shifter, Charliecloud
    
 +  * https://en.wikipedia.org/wiki/HAProxy
 +    * HA load balancing with Docker images for CentOS
 ==== Scheduler Runs ====
  
Line 40: Line 52:
 --variable_update=parameter_server
 # or run_tests.py
 +
 +</code>
 +
 +To make the ''imports'' work, edit that Python file:
 +
 +<code>
 +
 +import sys
 +sys.path.insert(0, '/mnt/hmeij/jobs/docker/benchmarks-master/scripts/tf_cnn_benchmarks/')
  
 </code> </code>
Line 64: Line 85:
 Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
 Copyright 2017-2019 The TensorFlow Authors.  All rights reserved.
-(deleted content...)
+(snip output...)
  
 # details # details
Line 85: Line 106:
 Initializing graph
 Running warm up
-(deleted content...it crashes but we can see it running)
+(snip output...)
  
 # query what is running on gpus ... D8 is gpu 3 (ssh n79 nvidia-smi to verify)
Line 130: Line 151:
 ==== Pull Images ====
  
-Pull more images from the NVIDIA GPU Cloud Catalog.  There are also models.
+Pull more images from the NVIDIA GPU Cloud Catalog.  There are also models.  As you can tell, not all container applications are up to date.  Images were pulled only on node ''n79'', as I am not expecting any usage.  It is nice to be able to pull esoteric software like the deep learning stack (digits, tensorflow, pytorch, caffe, rapidsai).
  
 <code> <code>
cluster/193.1583438113.txt.gz · Last modified: 2020/03/05 14:55 by hmeij07