\\
**[[cluster:

Since the deployment of sharptail, the information below is out of date. /home is now the same across the entire HPCC and is served out by sharptail.

--- //

| ===== Sharptail Cluster ===== | ===== Sharptail Cluster ===== | ||
| Line 39: | Line 43: | ||
| ==== /sanscratch ==== | ==== /sanscratch ==== | ||
| - | Sharptail will provide the users (and scheduler) with another 5 TB scratch file system. | + | Sharptail will provide the users (and scheduler) with another 5 TB scratch file system. |
| * Please offload as much IO from /home by staging your jobs in /sanscratch | * Please offload as much IO from /home by staging your jobs in /sanscratch | ||
In both cases you do not need to target any specific core; the operating system will handle that part of the scheduling.

| + | ==== NOTE ==== | ||
| + | |||
| + | |||
| + | ---- | ||
| + | |||
| + | Instructions below are obsolete, resources are now available via the scheduler. | ||
| + | |||
| + | Please read [[cluster: | ||
| + | |||
| + | --- // | ||
| + | |||
| + | ---- | ||
| ==== CPU-HPC ==== | ==== CPU-HPC ==== | ||
| - | With hyperthreading on the 5 nodes, it provides for 160 cores. | + | With hyperthreading on the 5 nodes, it provides for 160 cores. |
| So since there is no scheduler, you need to setup your environment and execute your program. | So since there is no scheduler, you need to setup your environment and execute your program. | ||
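A minimal sketch, assuming you log into one of the nodes directly; the node name, paths, and program are illustrative:

<code>
# pick a lightly loaded node (n34 appears elsewhere on this page) and log in
ssh n34
uptime

# set up your environment, for example by putting an MPI stack on your PATH
# (the path below is illustrative, adjust it to the local install)
export PATH=/share/apps/mvapich2/bin:$PATH

# run the program in the background so it survives logout
cd /sanscratch/$USER/myjob
nohup ./my_program < input.dat > output.dat &
</code>
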
Testing of GPUs at vendor sites may help get an idea of how to run GPU-compiled code.

LAMMPS and Amber were compiled against mvapich2. They should be run with ''mpirun_rsh'', as in the examples below.
| + | |||
| + | [[cluster: | ||
| + | |||
| + | Sharptail example. | ||
| + | |||
| + | < | ||
| + | |||
| + | [hmeij@sharptail sharptail]$ cat hostfile | ||
| + | n34 | ||
| + | |||
| + | [hmeij@sharptail sharptail]$ mpirun_rsh -ssh -hostfile ~/ | ||
| + | -np 12 lmp_nVidia -sf gpu -c off -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < \ | ||
| + | ~/ | ||
| + | |||
| + | unloading gcc module | ||
| + | LAMMPS (31 May 2013) | ||
| + | Lattice spacing in x,y,z = 1.6796 1.6796 1.6796 | ||
| + | Created orthogonal box = (0 0 0) to (53.7471 53.7471 107.494) | ||
| + | 2 by 2 by 3 MPI processor grid | ||
| + | Created 262144 atoms | ||
| + | |||
| + | -------------------------------------------------------------------------- | ||
| + | - Using GPGPU acceleration for lj/ | ||
| + | - with 6 proc(s) per device. | ||
| + | -------------------------------------------------------------------------- | ||
| + | GPU 0: Tesla K20m, 2496 cores, 4.3/4.7 GB, 0.71 GHZ (Mixed Precision) | ||
| + | GPU 1: Tesla K20m, 2496 cores, 4.3/0.71 GHZ (Mixed Precision) | ||
| + | -------------------------------------------------------------------------- | ||
| + | |||
| + | Initializing GPU and compiling on process 0...Done. | ||
| + | Initializing GPUs 0-1 on core 0...Done. | ||
| + | Initializing GPUs 0-1 on core 1...Done. | ||
| + | Initializing GPUs 0-1 on core 2...Done. | ||
| + | Initializing GPUs 0-1 on core 3...Done. | ||
| + | Initializing GPUs 0-1 on core 4...Done. | ||
| + | Initializing GPUs 0-1 on core 5...Done. | ||
| + | |||
| + | Setting up run ... | ||
| + | Memory usage per processor = 5.83686 Mbytes | ||
| + | Step Temp E_pair E_mol TotEng Press | ||
| + | | ||
| + | | ||
| + | Loop time of 0.431599 on 12 procs for 100 steps with 262144 atoms | ||
| + | |||
| + | Pair time (%) = 0.255762 (59.2592) | ||
| + | Neigh time (%) = 4.80811e-06 (0.00111402) | ||
| + | Comm time (%) = 0.122923 (28.481) | ||
| + | Outpt time (%) = 0.00109257 (0.253146) | ||
| + | Other time (%) = 0.051816 (12.0056) | ||
| + | |||
| + | Nlocal: | ||
| + | Histogram: 2 3 3 0 0 0 0 2 1 1 | ||
| + | Nghost: | ||
| + | Histogram: 2 2 0 0 0 0 0 0 3 5 | ||
| + | Neighs: | ||
| + | Histogram: 12 0 0 0 0 0 0 0 0 0 | ||
| + | |||
| + | Total # of neighbors = 0 | ||
| + | Ave neighs/atom = 0 | ||
| + | Neighbor list builds = 5 | ||
| + | Dangerous builds = 0 | ||
| + | |||
| + | |||
| + | --------------------------------------------------------------------- | ||
| + | GPU Time Info (average): | ||
| + | --------------------------------------------------------------------- | ||
| + | Neighbor (CPU): | ||
| + | GPU Overhead: | ||
| + | Average split: | ||
| + | Threads / atom: 4. | ||
| + | Max Mem / Proc: 31.11 MB. | ||
| + | CPU Driver_Time: | ||
| + | CPU Idle_Time: | ||
| + | --------------------------------------------------------------------- | ||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | |||
| + | [[cluster: | ||
| + | |||
| + | Note: ran out of time to get an example running but it should follow the LAMMPS approach of above pretty closely. | ||
| + | |||
| + | Here is quick Amber example | ||
| + | |||
| + | < | ||
| + | |||
| + | [hmeij@sharptail nucleosome]$ export AMBER_HOME=/ | ||
| + | |||
| + | # find a GPU ID with gpu-info then expose that GPU to pmemd | ||
| + | [hmeij@sharptail nucleosome]$ export CUDA_VISIBLE_DEVICES=1 | ||
| + | |||
| + | # you only need one cpu core | ||
| + | [hmeij@sharptail nucleosome]$ mpirun_rsh -ssh -hostfile ~/ | ||
| + | / | ||
| + | |||
| + | </ | ||
| - | [[cluster: | ||
| - | [[cluster: | ||
| + | NAMD was compiled with the built-in multi-node networking capabilities, | ||
| - | Here is an example | + | An example |
| < | < | ||