Differences

This shows you the differences between two versions of the page.

cluster:192 [2020/02/26 19:46] hmeij07 [Usage]
cluster:192 [2020/03/30 13:54] hmeij07
Line 4: Line 4:
 ===== EXX96 =====
  
-A page for me on how these 12 nodes were build up after they arrived. To make them "ala n37" which as the test node in redoing our K20 nodes, see [[cluster:172|K20 Redo]]
-Page best followed bottom to top.
+A page for me on how these 12 nodes were built up after they arrived, to make them "ala n37", which was the test node in redoing our K20 nodes; see [[cluster:172|K20 Redo]] and [[cluster:173|K20 Redo Usage]].
+
+Page best followed bottom to top if interested in the whole process.
+
+The Usage section below is for HPCC users wanting to use queue ''exx96''.
  
 ==== Usage ====
Line 14: Line 17:
 A new static resource is introduced for all nodes holding gpus: ''n78'' in queue ''amber128'', ''n33-n37'' in queue ''mwgpu'', and the nodes mentioned above. The name of this resource is ''gpu4''. Moving forward, please use it instead of ''gpu'' or ''gputest''.
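
To check what a node offers and has reserved for the new resource, the scheduler's long host listing should show it; a quick sketch (NODENAME is a placeholder for one of the gpu nodes):

<code>
# the CURRENT LOAD section of the long listing prints Total and Reserved
# rows per resource, including gpu4 -- compare the snippet further below
bhosts -l NODENAME
</code>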
  
-The wrappers provided assume your cpu:gpu ratio is 1:1 hence in your submit code you will have ''#BSUB -n 1'' and in your resource allocation line ''gpu4=1'' If your ratio is something else you can set CPU_GPU_REQUEST. For example CPU_GPU_REQUEST=4:2 expects the lines ''#BSUB -n 4'' and ''gpu4=2'' in your submit script. SAmple script at ''/home/hmeij/k20redo/run.rtx''
+The wrappers provided assume your cpu:gpu ratio is 1:1, hence your submit script will have ''#BSUB -n 1'' and ''gpu4=1'' in its resource allocation line. If your ratio is something else you can set CPU_GPU_REQUEST. For example, CPU_GPU_REQUEST=4:2 expects the lines ''#BSUB -n 4'' and ''gpu4=2'' in your submit script. Sample script at ''/home/hmeij/k20redo/run.rtx''
  
-The wrappers (78.mpich3.wrapper for ''n78'', and n37.openmpi.wrapper for all others) are located in ''/usr/local/bin'' and will set up your environment and start either of these applications: amber, lammps, gromacs, matlab and namd from ''/usr/local''.
+The wrappers (n78.mpich3.wrapper for ''n78'', and n37.openmpi.wrapper for all others) are located in ''/usr/local/bin'' and will set up your environment and start one of these applications: amber, lammps, gromacs, matlab or namd from ''/usr/local''.
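
A minimal submit script sketch tying this together is shown below. The queue name ''exx96'', ''#BSUB -n 1'', ''gpu4=1'', CPU_GPU_REQUEST and the wrapper name come from this page; the ''rusage[]'' form of the resource allocation line, the file names and the application command line are assumptions, so consult ''/home/hmeij/k20redo/run.rtx'' for the authoritative example.

<code>
#!/bin/bash
# hypothetical sketch, not the contents of run.rtx
#BSUB -q exx96                     # the new rtx queue
#BSUB -n 1                         # 1 cpu, matching the default 1:1 cpu:gpu ratio
#BSUB -R "rusage[gpu4=1]"          # reserve one unit of the gpu4 resource (assumed rusage form)
#BSUB -J gpu4test
#BSUB -o gpu4test.%J.out
#BSUB -e gpu4test.%J.err

# only needed when the ratio is not 1:1, e.g. 4 cpus driving 2 gpus
#export CPU_GPU_REQUEST=4:2

# the wrapper sets up the environment; the amber command line is just an example
n37.openmpi.wrapper pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd
</code>

Submitted the usual way, for example ''bsub < run.rtx''.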
    
  
Line 25: Line 28:
              gputest gpu4
 Total                3
- Reserved        0.0  0.1
+ Reserved        0.0  1.0
  
 # old way of doing that
Line 34: Line 37:
  
 </code>
 +
 +Peer to peer communication is possible (via PCIe rather than NVLink) with this hardware. Setting it up gets rather messy. Some quick, off-the-cuff performance data reveals some impact, but generally, in our environment, the gains are not worth the effort. The timings below use Amber and ''pmemd.cuda.MPI''; a way to inspect the gpu topology follows them.
 +
 +<code>
 +                                                                              cpu:gpu
 +mdout.325288: Master Total CPU time:          982.60 seconds     0.27 hours   1:1
 +mdout.325289: Master Total CPU time:          611.08 seconds     0.17 hours   4:2
 +mdout.326208: Master Total CPU time:          537.97 seconds     0.15 hours  36:4
 +
 +</code> 
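
How the gpus in a node are wired to each other can be checked from the driver side; a quick sketch, assuming the ''nvidia-smi'' that ships with the installed driver:

<code>
# print the gpu interconnect topology matrix; PIX/PXB/PHB entries are
# PCIe paths, NV# entries would indicate NVLink (absent on this hardware)
nvidia-smi topo -m
</code>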
 ==== Miscellaneous ==== ==== Miscellaneous ====
  
Line 68: Line 81:
 #/usr/bin/nvidia-smi --gom=0
  
-# for amber16 -pm=ENABLED -c=EXCLUSIVE_PROCESS
+# for amber16 -pm=1/ENABLED -c=1/EXCLUSIVE_PROCESS
 #nvidia-smi --persistence-mode=1
 #nvidia-smi --compute-mode=1
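# (persistence mode keeps the nvidia driver loaded when no clients are running,
#  avoiding cuda init delays; compute mode DEFAULT allows multiple processes
#  per gpu while EXCLUSIVE_PROCESS limits a gpu to a single process)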
  
-# for mwgpu/exx96 -pm=ENABLED -c=DEFAULT
-nvidia-smi --persistence-mode=1
-nvidia-smi --compute-mode=0
+# for mwgpu/exx96 -pm=1/ENABLED -c=0/DEFAULT
+# note: turned this off, running with defaults
+# seems stable, maybe persistence later on
+# let's see how docker interacts first...
+#nvidia-smi --persistence-mode=1
+#nvidia-smi --compute-mode=0
  
 # turn ECC off (memory scrubbing)
Line 141: Line 157:
 # add packages and update
 yum install epel-release -y
 +yum install flex flex-devel bison bison-devel -y 
 yum install tcl tcl-devel dmtcp -y
 +yum install net-snmp net-snmp-libs net-snmp-agent-libs net-tools net-snmp-utils -y
 yum install freeglut-devel libXi-devel libXmu-devel make mesa-libGLU-devel -y
 yum install blas blas-devel lapack lapack-devel boost boost-devel -y
Line 220: Line 238:
 nvcr.io/nvidia/rapidsai/rapidsai   0.9-cuda10.0-runtime-centos7   22b5dc2f7e84        5 months ago        5.84GB
  
-free -g
+free -m
               total        used        free      shared  buff/cache   available
-Mem:             92                    88           0           1          89
+Mem:          95056        1919       85338          20        7798       92571
+Swap:         10239           0       10239
  
 # nvidia-smi