\\
**[[cluster:
+ | |||
+ | Since deployment of sharptail the information below is out of date. /home is now the same across the entire HPCC and served out by sharptail. | ||
+ | |||
+ | --- // | ||
===== Sharptail Cluster =====
==== /sanscratch ====
Sharptail will provide users (and the scheduler) with another 5 TB scratch file system.
  * Please offload as much IO as possible from /home by staging your jobs in /sanscratch, as sketched below
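
A minimal staging sketch (directory, file, and program names are placeholders, not taken from this page): copy your inputs to /sanscratch, run there, then copy the results back to /home.

<code>
#!/bin/bash
# example only: stage a job in /sanscratch to keep heavy IO off /home
WORKDIR=/sanscratch/$USER/myjob        # hypothetical scratch work directory
mkdir -p $WORKDIR
cp ~/myjob/input.dat $WORKDIR/         # stage input files from /home
cd $WORKDIR
./myprogram input.dat > output.log     # the heavy IO now hits /sanscratch
cp output.log ~/myjob/                 # copy results back to /home
</code>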
In both cases you do not need to target any specific core; the operating system will handle that part of the scheduling.
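
For example, several runs can simply be started in the background and left to the operating system to place on free cores (program and file names are placeholders):

<code>
# example only: no core pinning needed, the OS schedules the processes
for i in 1 2 3 4; do
    ./myprogram input.$i > output.$i 2>&1 &
done
wait
</code>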
+ | |||
+ | ==== NOTE ==== | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | Instructions below are obsolete, resources are now available via the scheduler. | ||
+ | |||
+ | Please read [[cluster: | ||
+ | |||
+ | --- // | ||
+ | |||
+ | ---- | ||
==== CPU-HPC ====
With hyperthreading enabled on the 5 nodes, that provides 160 cores.
Since there is no scheduler, you need to set up your environment and execute your program yourself, as sketched below.
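
A minimal sketch of running a job by hand, assuming you log in to one of the nodes directly (node name, paths, and program are placeholders):

<code>
# example only: no scheduler in front of these nodes
ssh n34                                       # pick one of the 5 nodes
export PATH=/share/apps/myapp/bin:$PATH       # hypothetical application path
cd /sanscratch/$USER/myjob
nohup ./myprogram input.dat > output.log 2>&1 &   # keeps running after logout
</code>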
Testing GPUs at vendor sites may help you get an idea of how to run GPU-compiled code.
LAMMPS and Amber were compiled against mvapich2. They should be run with "mpirun_rsh".
+ | |||
+ | [[cluster: | ||
+ | |||
+ | Sharptail example. | ||
+ | |||
<code>

[hmeij@sharptail sharptail]$ cat hostfile
n34

[hmeij@sharptail sharptail]$ mpirun_rsh -ssh -hostfile ~/ \
-np 12 lmp_nVidia -sf gpu -c off -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < \
~/

unloading gcc module
LAMMPS (31 May 2013)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (53.7471 53.7471 107.494)
2 by 2 by 3 MPI processor grid
Created 262144 atoms

--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/
- with 6 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: Tesla K20m, 2496 cores, 4.3/4.7 GB, 0.71 GHZ (Mixed Precision)
GPU 1: Tesla K20m, 2496 cores, 4.3/0.71 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.
Initializing GPUs 0-1 on core 1...Done.
Initializing GPUs 0-1 on core 2...Done.
Initializing GPUs 0-1 on core 3...Done.
Initializing GPUs 0-1 on core 4...Done.
Initializing GPUs 0-1 on core 5...Done.

Setting up run ...
Memory usage per processor = 5.83686 Mbytes
Step Temp E_pair E_mol TotEng Press


Loop time of 0.431599 on 12 procs for 100 steps with 262144 atoms

Pair time (%) = 0.255762 (59.2592)
Neigh time (%) = 4.80811e-06 (0.00111402)
Comm time (%) = 0.122923 (28.481)
Outpt time (%) = 0.00109257 (0.253146)
Other time (%) = 0.051816 (12.0056)

Nlocal:
Histogram: 2 3 3 0 0 0 0 2 1 1
Nghost:
Histogram: 2 2 0 0 0 0 0 0 3 5
Neighs:
Histogram: 12 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 5
Dangerous builds = 0


---------------------------------------------------------------------
GPU Time Info (average):
---------------------------------------------------------------------
Neighbor (CPU):
GPU Overhead:
Average split:
Threads / atom: 4.
Max Mem / Proc: 31.11 MB.
CPU Driver_Time:
CPU Idle_Time:
---------------------------------------------------------------------


</code>
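
The run above can be wrapped in a small script so the GPU count is easy to change; a minimal sketch, with the truncated paths above replaced by placeholder names (adjust them to your own layout):

<code>
#!/bin/bash
# example wrapper around the LAMMPS GPU run shown above; paths are assumptions
HOSTFILE=$HOME/sharptail/hostfile     # one node name per line, e.g. n34
INPUT=$HOME/sharptail/in.lj.gpu       # hypothetical LAMMPS input script
NP=12                                 # MPI ranks (6 per K20m in the run above)
GPUS=2                                # value passed to -v g

mpirun_rsh -ssh -hostfile $HOSTFILE -np $NP \
  lmp_nVidia -sf gpu -c off -v g $GPUS -v x 32 -v y 32 -v z 64 -v t 100 \
  < $INPUT > lammps_gpu.log 2>&1
</code>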
+ | |||
+ | |||
+ | [[cluster: | ||
+ | |||
+ | Note: ran out of time to get an example running but it should follow the LAMMPS approach of above pretty closely. | ||
+ | |||
+ | Here is quick Amber example | ||
+ | |||
<code>

[hmeij@sharptail nucleosome]$ export AMBER_HOME=/

# find a GPU ID with gpu-info then expose that GPU to pmemd
[hmeij@sharptail nucleosome]$ export CUDA_VISIBLE_DEVICES=1

# you only need one cpu core
[hmeij@sharptail nucleosome]$ mpirun_rsh -ssh -hostfile ~/ \
/

</code>
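
A small sketch of that GPU selection step; nvidia-smi is shown here as an alternative to gpu-info, and device 1 is just an example:

<code>
# example only: list the GPUs and how busy they are, then pick an idle one
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv

# expose only the chosen device (here device 1) to the Amber run
export CUDA_VISIBLE_DEVICES=1
</code>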
NAMD was compiled with the built-in multi-node networking capabilities,

An example

<code>