cluster:116 [2013/08/12 17:45] hmeij
==== CPU-HPC ====
With hyperthreading enabled, the 5 nodes provide 160 cores.
Since there is no scheduler, you need to set up your environment and execute your program yourself.
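Because runs are launched by hand, something like the following has to happen in each shell before invoking ''mpirun_rsh''. This is a minimal sketch; the ''/usr/local/mvapich2'' prefix is an assumption, not the cluster's actual install path.

```shell
# Minimal interactive setup sketch (no scheduler); the mvapich2 prefix
# below is an assumed location -- adjust to the real install prefix.
export MPI_HOME=/usr/local/mvapich2
export PATH=$MPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH

# mpirun_rsh reads its target nodes from a hostfile, one hostname per line.
echo "n34" > hostfile
```

The hostfile here lists only ''n34'', matching the single-node example further down; add one line per node for multi-node runs.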
LAMMPS and Amber were compiled against mvapich2. They should be run with ''mpirun_rsh'', as in the examples below.
[[cluster:
+ | |||
+ | Sharptail example. | ||
+ | |||
+ | < | ||
+ | |||
+ | [hmeij@sharptail sharptail]$ cat hostfile | ||
+ | n34 | ||
+ | |||
+ | [hmeij@sharptail sharptail]$ mpirun_rsh -ssh -hostfile ~/ | ||
+ | -np 12 lmp_nVidia -sf gpu -c off -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < \ | ||
+ | ~/ | ||
+ | |||
+ | unloading gcc module | ||
+ | LAMMPS (31 May 2013) | ||
+ | Lattice spacing in x,y,z = 1.6796 1.6796 1.6796 | ||
+ | Created orthogonal box = (0 0 0) to (53.7471 53.7471 107.494) | ||
+ | 2 by 2 by 3 MPI processor grid | ||
+ | Created 262144 atoms | ||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | - Using GPGPU acceleration for lj/ | ||
+ | - with 6 proc(s) per device. | ||
+ | -------------------------------------------------------------------------- | ||
+ | GPU 0: Tesla K20m, 2496 cores, 4.3/4.7 GB, 0.71 GHZ (Mixed Precision) | ||
+ | GPU 1: Tesla K20m, 2496 cores, 4.3/0.71 GHZ (Mixed Precision) | ||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | Initializing GPU and compiling on process 0...Done. | ||
+ | Initializing GPUs 0-1 on core 0...Done. | ||
+ | Initializing GPUs 0-1 on core 1...Done. | ||
+ | Initializing GPUs 0-1 on core 2...Done. | ||
+ | Initializing GPUs 0-1 on core 3...Done. | ||
+ | Initializing GPUs 0-1 on core 4...Done. | ||
+ | Initializing GPUs 0-1 on core 5...Done. | ||
+ | |||
+ | Setting up run ... | ||
+ | Memory usage per processor = 5.83686 Mbytes | ||
+ | Step Temp E_pair E_mol TotEng Press | ||
+ | | ||
+ | | ||
+ | Loop time of 0.431599 on 12 procs for 100 steps with 262144 atoms | ||
+ | |||
+ | Pair time (%) = 0.255762 (59.2592) | ||
+ | Neigh time (%) = 4.80811e-06 (0.00111402) | ||
+ | Comm time (%) = 0.122923 (28.481) | ||
+ | Outpt time (%) = 0.00109257 (0.253146) | ||
+ | Other time (%) = 0.051816 (12.0056) | ||
+ | |||
+ | Nlocal: | ||
+ | Histogram: 2 3 3 0 0 0 0 2 1 1 | ||
+ | Nghost: | ||
+ | Histogram: 2 2 0 0 0 0 0 0 3 5 | ||
+ | Neighs: | ||
+ | Histogram: 12 0 0 0 0 0 0 0 0 0 | ||
+ | |||
+ | Total # of neighbors = 0 | ||
+ | Ave neighs/atom = 0 | ||
+ | Neighbor list builds = 5 | ||
+ | Dangerous builds = 0 | ||
+ | |||
+ | |||
+ | --------------------------------------------------------------------- | ||
+ | GPU Time Info (average): | ||
+ | --------------------------------------------------------------------- | ||
+ | Neighbor (CPU): | ||
+ | GPU Overhead: | ||
+ | Average split: | ||
+ | Threads / atom: 4. | ||
+ | Max Mem / Proc: 31.11 MB. | ||
+ | CPU Driver_Time: | ||
+ | CPU Idle_Time: | ||
+ | --------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | </ | ||
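The "6 proc(s) per device" line in the log is just the MPI rank count divided across the GPUs on the node; a trivial sketch of that arithmetic:

```shell
# 12 MPI ranks (-np 12) shared over 2 GPUs (-v g 2) in the run above.
NP=12
NGPU=2
echo "$((NP / NGPU)) proc(s) per device"
# prints: 6 proc(s) per device
```

Keeping ranks per GPU at a small multiple like this is what lets both K20m cards stay busy while the CPU side handles neighbor lists and communication.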
+ | |||
+ | |||
+ | [[cluster: | ||
+ | |||
+ | Note: ran out of time to get an example running but it should follow the LAMMPS approach of above pretty closely. | ||
+ | |||
+ | Here is quick Amber example | ||
+ | |||
+ | < | ||
+ | |||
+ | [hmeij@sharptail nucleosome]$ export AMBER_HOME=/ | ||
+ | |||
+ | # find a GPU ID with gpu-info then expose that GPU to pmemd | ||
+ | [hmeij@sharptail nucleosome]$ export CUDA_VISIBLE_DEVICES=1 | ||
+ | |||
+ | # you only need one cpu core | ||
+ | [hmeij@sharptail nucleosome]$ mpirun_rsh -ssh -hostfile ~/ | ||
+ | / | ||
+ | |||
+ | </ | ||
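The "find a GPU ID with gpu-info" step can be scripted so the least-loaded device is chosen automatically. This is a hypothetical sketch: the two-column "id used-MB" sample below is a stand-in for real ''gpu-info'' or ''nvidia-smi'' output, not its actual format.

```shell
# Hypothetical sketch: pick the GPU with the least memory in use and
# expose only that device to pmemd via CUDA_VISIBLE_DEVICES.
# sample_usage stands in for real gpu-info/nvidia-smi output (assumed
# "id used-MB" format; the numbers here are made up for illustration).
sample_usage() {
cat <<'EOF'
0 3900
1 120
EOF
}

# Sort numerically on the used-memory column, take the idlest device id.
FREE_GPU=$(sample_usage | sort -k2 -n | head -1 | awk '{print $1}')
export CUDA_VISIBLE_DEVICES=$FREE_GPU
echo "using GPU $CUDA_VISIBLE_DEVICES"
# prints: using GPU 1
```

With ''CUDA_VISIBLE_DEVICES'' set this way, pmemd sees only the chosen card, which is the same effect as the manual ''export CUDA_VISIBLE_DEVICES=1'' above.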
NAMD was compiled with the built-in multi-node networking capabilities,