\\
**[[cluster:

Since the deployment of sharptail, the information below is out of date. /home is now the same across the entire HPCC and is served out by sharptail.

--- //
===== Sharptail Cluster =====
A recycled head node name seems appropriate.

The new hardware has been delivered and rack&
[[cluster:
==== Recess Period ====
July and August 2013 I'll call the "recess period":
  * ssh sharptail.wesleyan.edu
  * then ssh to one of the nodes, see samples below
  * set up your environment like in a submit script, then run your program
  * Reboots may happen. I'll try to warn folks when.
  * Shell access will disappear in final production mode! (use greentail or swallowtail)
  * /home is still being populated, should finish some time Thursday Jul 11th at night
==== /home ====
Sharptail is slated to become our file server for /home, taking over from greentail.
  * Files that are created on greentail are pushed to sharptail
  * Files that disappear on greentail also disappear on sharptail
  * Files that were created on sharptail (and do not exist on greentail) disappear
So it's important that if you want to keep files on sharptail, you copy them to greentail before a refresh happens. I suggest you create a ~/sharptail directory and work inside of that on sharptail. You can transfer files like so (a fuller sketch follows below):
  * cp -rp /
  * scp -rp ~/
So in short, in the future, sharptail:/
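
A rough sketch of such a transfer, assuming a hypothetical ~/sharptail/myjob work directory; the greentail mount point in the cp variant is also an assumption, so adjust both to your own setup:

<code>
# hypothetical example -- push a sharptail work directory back to greentail over ssh
scp -rp ~/sharptail/myjob greentail:~/myjob

# or, if the other /home is mounted locally (mount point is an assumption), a plain copy
cp -rp ~/sharptail/myjob /mnt/greentail/home/hmeij/myjob
</code>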
==== /sanscratch ====
Sharptail will provide the users (and scheduler) with another 5 TB scratch file system.
  * Please offload as much IO from /home as possible by staging your jobs in /sanscratch (a small sketch follows below)
  * an example: [[cluster:
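
A rough sketch of what staging in /sanscratch can look like; the directory layout and program name are made up for illustration and are not the site's exact recipe:

<code>
# hypothetical staging sketch -- adjust paths and program to your own job
MYDIR=/sanscratch/$USER/job1
mkdir -p $MYDIR

# copy input from /home, run in scratch, then copy results back to /home
cp -rp ~/project/input $MYDIR/
cd $MYDIR
./my_program input > output
cp -rp $MYDIR/output ~/project/
</code>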
==== Clashes ====

Without a scheduler, jobs may clash and too many jobs may end up running on a node.

To find idle CPU cores, ssh to a node and start 'top':

<code>
top - 10:20:47 up 2 days, 55 min, 3 users,
Tasks: 766 total,
Cpu0 : 0.0%us,
Cpu1 : 0.0%us,
Cpu2 : 0.0%us,
Cpu3 : 0.0%us,
Cpu4 : 0.0%us,
...
</code>

To find idle GPU cores, ssh to a node, then type 'gpu-info':

<code>
[hmeij@n33 bin]$ gpu-info
====================================================
Device
====================================================
0 Tesla K20m 25 C 0 %
1 Tesla K20m 27 C 0 %
2 Tesla K20m 25 C 0 %
3 Tesla K20m 25 C 0 %
====================================================
</code>

In both cases you do not need to target any specific core; the operating system will handle that part of the scheduling.
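
If you want to check several nodes in one pass, a small loop works; the node names below are the ones used elsewhere on this page, and the loop itself is just an illustration, not a site-provided tool:

<code>
# hypothetical convenience loop -- show load and GPU use on each GPU node
for n in n33 n34 n35 n36 n37; do
  echo "== $n =="
  ssh $n uptime
  ssh $n gpu-info
done
</code>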

==== NOTE ====

----

The instructions below are obsolete; these resources are now available via the scheduler.

Please read [[cluster:

--- //

----

==== CPU-HPC ====

With hyperthreading on, the 5 nodes provide 160 cores.

Since there is no scheduler, you need to set up your environment and execute your program yourself.

First create the machinesfile:

<code>
[hmeij@sharptail cd]$ cat mpi_machines
n33
n33
n33
n33
n34
n34
n34
n34

[hmeij@sharptail cd]$ . /
[hmeij@sharptail cd]$ . /

[hmeij@sharptail cd]$ time /
-x LD_LIBRARY_PATH -machinefile ./
/
-O -i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 &
[1] 3304

[hmeij@sharptail cd]$ ssh n33 top -b -n1 -u hmeij
top - 14:49:28 up 1 day, 5:24, 1 user, load average: 0.89, 0.20, 0.06
Tasks: 769 total,
Cpu(s):
Mem: 264635888k total,
Swap: 31999992k total,

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24348 hmeij
24345 hmeij
24346 hmeij
24347 hmeij
24353 hmeij
24344 hmeij
24352 hmeij

[hmeij@sharptail cd]$ ssh n34 top -b -n1 -u hmeij
top - 14:49:37 up 1 day, 2:40, 0 users,
Tasks: 766 total,
Cpu(s):
Mem: 264635888k total,
Swap: 31999992k total,

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12198 hmeij
12200 hmeij
12201 hmeij
12199 hmeij
12205 hmeij
12197 hmeij
12204 hmeij
</code>
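
As a rough sketch of the MPI invocation used above (the sander.MPI binary name, the -np count and the output defaults are assumptions for illustration; the -machinefile, -x and Amber flags follow the transcript):

<code>
# hypothetical sketch -- adjust paths, -np and input names to your own run
time mpirun -np 8 -machinefile ./mpi_machines -x LD_LIBRARY_PATH \
  $AMBER_HOME/bin/sander.MPI \
  -O -i inp/mini.in -p 1g6r.cd.parm -c 1g6r.cd.randions.crd.1 \
  -ref 1g6r.cd.randions.crd.1 &
</code>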

==== GPU-HPC ====

LAMMPS, Amber and NAMD have been compiled using Nvidia's CUDA.
Module files have been created for these apps and are automatically loaded.
<code>
...
</code>
Testing of GPUs at vendor sites may help get the idea of how to run GPU-compiled code.
LAMMPS and Amber were compiled against mvapich2; they should be run with "mpirun_rsh".

[[cluster:

Sharptail example:
<code>
[hmeij@sharptail sharptail]$ cat hostfile
n34

[hmeij@sharptail sharptail]$ mpirun_rsh -ssh -hostfile ~/
-np 12 lmp_nVidia -sf gpu -c off -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < \
~/

unloading gcc module
LAMMPS (31 May 2013)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (53.7471 53.7471 107.494)
  2 by 2 by 3 MPI processor grid
Created 262144 atoms
--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/
- with 6 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: Tesla K20m, 2496 cores, 4.3/4.7 GB, 0.71 GHZ (Mixed Precision)
GPU 1: Tesla K20m, 2496 cores, 4.3/0.71 GHZ (Mixed Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.
Initializing GPUs 0-1 on core 1...Done.
Initializing GPUs 0-1 on core 2...Done.
Initializing GPUs 0-1 on core 3...Done.
Initializing GPUs 0-1 on core 4...Done.
Initializing GPUs 0-1 on core 5...Done.

Setting up run ...
Memory usage per processor = 5.83686 Mbytes
Step Temp E_pair E_mol TotEng Press

Loop time of 0.431599 on 12 procs for 100 steps with 262144 atoms

Pair  time (%) = 0.255762 (59.2592)
Neigh time (%) = 4.80811e-06 (0.00111402)
Comm  time (%) = 0.122923 (28.481)
Outpt time (%) = 0.00109257 (0.253146)
Other time (%) = 0.051816 (12.0056)

Nlocal:
Histogram: 2 3 3 0 0 0 0 2 1 1
Nghost:
Histogram: 2 2 0 0 0 0 0 0 3 5
Neighs:
Histogram: 12 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 5
Dangerous builds = 0


---------------------------------------------------------------------
      GPU Time Info (average):
---------------------------------------------------------------------
Neighbor (CPU):
GPU Overhead:
Average split:
Threads / atom: 4.
Max Mem / Proc: 31.11 MB.
CPU Driver_Time:
CPU Idle_Time:
---------------------------------------------------------------------


</code>


[[cluster:

Note: I ran out of time to get an example running, but it should follow the LAMMPS approach above pretty closely.

Here is a quick Amber example:

<code>
[hmeij@sharptail nucleosome]$ export AMBER_HOME=/

# find a GPU ID with gpu-info then expose that GPU to pmemd
[hmeij@sharptail nucleosome]$ export CUDA_VISIBLE_DEVICES=1

# you only need one cpu core
[hmeij@sharptail nucleosome]$ mpirun_rsh -ssh -hostfile ~/
/
</code>
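
As a rough sketch of what a complete single-GPU run can look like (the Amber install path, hostfile and input file names are made up for illustration; pmemd.cuda.MPI is Amber's GPU-enabled MPI binary):

<code>
# hypothetical sketch -- one MPI rank, one GPU (selected via CUDA_VISIBLE_DEVICES)
export AMBER_HOME=/path/to/amber
export CUDA_VISIBLE_DEVICES=1

mpirun_rsh -ssh -hostfile ~/hostfile -np 1 \
  $AMBER_HOME/bin/pmemd.cuda.MPI \
  -O -i mdin -p prmtop -c inpcrd -o mdout -r restrt
</code>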


NAMD was compiled with the built-in multi-node networking capabilities.

An example of running NAMD is below.

<code>

[microway@n33 namd-test]$ which charmrun
/

[microway@n33 namd-test]$ echo $NAMD_DIR
/

[microway@n33 ~]$ cat namd-machines
group main
host n33
host n34
host n35
host n36
host n37

[microway@n33 namd-test]$ charmrun $NAMD_DIR/
Charmrun>


unloading gcc module
Charmrun>
Converse/
Trace: traceroot: /
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 5 unique compute nodes (32-way SMP).
Charm++> cpu topology info is gathered in 0.003 seconds.
Info: Running on 8 processors, 8 nodes, 5 physical nodes.
Info: CPU topology information available.
Info: Charm++/
Pe 3 physical rank 0 binding to CUDA device 3 on n36: 'Tesla K20m'
Pe 4 physical rank 0 binding to CUDA device 0 on n37: 'Tesla K20m'
Pe 5 physical rank 1 binding to CUDA device 2 on n33: 'Tesla K20m'
Pe 0 physical rank 0 binding to CUDA device 0 on n33: 'Tesla K20m'
Info: 289.738 MB of memory in use based on /
...etc

[microway@n33 ~]$ gpu-info
====================================================
Device
====================================================
0 Tesla K20m 29 C 50 %
1 Tesla K20m 27 C 0 %
2 Tesla K20m 28 C 51 %
3 Tesla K20m 25 C 0 %
====================================================


</code>
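
As a rough sketch of what such a charmrun invocation generally looks like (the config file name is made up; ++nodelist, +p and +idlepoll are standard charmrun/NAMD options):

<code>
# hypothetical sketch -- 8 processes spread over the nodes listed in namd-machines
charmrun $NAMD_DIR/namd2 ++nodelist ~/namd-machines +p8 +idlepoll myconfig.namd
</code>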
Hint: Look in /
\\
**[[cluster: