
cluster:193

Differences

This shows you the differences between two versions of the page.

cluster:193 [2020/03/05 19:59]
hmeij07 [Scheduler Runs]
cluster:193 [2024/09/17 16:51] (current)
hmeij07 [What's Running?]
Line 2: Line 2:
 **[[cluster:0|Back]]**
  
-===== Docker Usage =====
+===== Docker Containers Usage =====
  
 The page is built up from the bottom to the top.  We're not making a traditional "MPI" Docker integration with our scheduler.  We'll see what usage patterns emerge and go from there. I can help with workflow.  If more containers are desired, please let me know which ones to ''pull''.
  
-If users want to run web enabled applications in the container one simple workflow would be to submit a job that reserves a GPU then loops checking a lock file until removed.  Then ssh to the node and access the application via ''firefox http://localhost:PORT/''. For example DIGITS and JupyterLab below.
+If users want to run web enabled applications in the container one simple workflow would be to submit a job that reserves a GPU then loops checking a lock file until removed.  Then ssh to the node and access the application via ''firefox http://localhost:PORT/''. For example DIGITS and JupyterLab described below.
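The lock-file loop in that workflow might look roughly like the sketch below. It is only an illustration: the function name ''wait_for_lock_removal'', the lock file path and the poll interval are assumptions, not taken from this page, and the scheduler directives that reserve the GPU are left out because they depend on the local setup.

```shell
# Sketch only: hold a scheduler job open until a lock file is removed.
# wait_for_lock_removal, the lock path and the poll interval are hypothetical.

wait_for_lock_removal() {
    local lock_file="$1"
    local poll_seconds="${2:-60}"

    # Create the lock file so the loop has something to watch.
    touch "$lock_file" || return 1

    # Sleep until someone deletes the lock file (for example: rm "$lock_file").
    while [ -e "$lock_file" ]; do
        sleep "$poll_seconds"
    done
}

# Example use inside a job script that has already reserved a GPU:
#   wait_for_lock_removal "$HOME/docker_job.lock" 60
```

While the job waits, ssh to the node and use the application; removing the lock file lets the job finish and releases the GPU.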
 + 
 +==== Readings ==== 
 + 
 +Interesting reads... 
 + 
 +  * https://www.stackhpc.com/k8s-mpi.html 
 +    * PMI(x), Slurm 
 + 
 +  * https://www.stackhpc.com/the-state-of-hpc-containers.html 
 +    * Docker, Kubernetes, Singularity, Shifter, Charliecloud
    
 +  * https://en.wikipedia.org/wiki/HAProxy
 +    * HA load balancing with Docker images for CentOS
 ==== Scheduler Runs ====
  
Line 139: Line 151:
 ==== Pull Images ====
  
-Pull more images from the NVIDIA GPU Cloud Catalog.  There are also models.
+Pull more images from the NVIDIA GPU Cloud Catalog.  There are also models.  As you can tell, not all container applications are up to date.  Only pulled on node ''n79'', not expecting any usage.  It is nice to pull esoteric software like the deep learning stack (digits, tensorflow, pytorch, caffe, rapidsai).
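When pulling additional images on a node, a small wrapper can skip images that are already stored locally. This is a minimal sketch: ''pull_if_missing'' is a hypothetical helper name, and it relies only on the standard ''docker image inspect'' and ''docker pull'' commands.

```shell
# Sketch only: pull an image when it is not already present on this node.
# pull_if_missing is a hypothetical helper, not part of Docker or of this page.

pull_if_missing() {
    local image="$1"

    # docker image inspect exits non-zero when the image is not stored locally.
    if docker image inspect "$image" >/dev/null 2>&1; then
        echo "already present: $image"
    else
        docker pull "$image"
    fi
}

# Example, using an image listed on this page:
#   pull_if_missing nvcr.io/nvidia/caffe:19.09-py2
```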
  
 <code>
Line 289: Line 301:
 nvcr.io/nvidia/caffe               19.09-py2                      b52fbbef7e6b        6 months ago        5.15GB
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        6 months ago        5.84GB
 +
 +adding 09/17/2024
 +https://hub.docker.com/r/mobigroup/pygmtsar-large
 +docker pull mobigroup/pygmtsar-large
 +Status: Downloaded newer image for mobigroup/pygmtsar-large:latest
 +docker.io/mobigroup/pygmtsar-large:latest
 +
  
 # running containers (persistent across boot events)
cluster/193.1583438388.txt.gz · Last modified: 2020/03/05 19:59 by hmeij07