cluster:107 [2012/12/21 19:01] hmeij [Yale Qs]
cluster:107 [2013/01/04 15:41] hmeij [Yale Qs]
  * Was single- or double-precision peak performance more important, or neither?
  * What was the software suite in mind (commercial, ...)?
  * How did you reach out/educate users on the aspects of GPU computing?
  * What was the impact on the users? (recoding, recompiling)
  * Was the expected computational speedup realized?
  * Was the PGI Accelerator leveraged? If so, what were the results?
  * Do users compile with nvcc?
  * Does the scheduler have a resource for idle GPUs so they can be reserved?
  * How are the GPUs exposed/...?
  * Do you allow multiple serial jobs to access the same GPU? Or one parallel job multiple GPUs?
  * Can parallel jobs access ...?
  * Any experiences with pmemd.cuda.MPI (part of Amber)?
  * What MPI flavor is used most with regard to GPU computing?
  * Do you leverage the CPU side of the GPU HPC? For example, if there are 16 GPUs and 64 CPU cores on a cluster, do you allow 48 standard jobs on the idle cores (assuming the maximum of 16 serial GPU jobs)?
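A common answer to the scheduler questions above is that GPUs are treated as a consumable resource: the scheduler reserves devices for a job and publishes the grant through the ''CUDA_VISIBLE_DEVICES'' environment variable, which the CUDA runtime honors when enumerating devices. A minimal sketch (the granted device numbers are made up for illustration, not from any particular scheduler):

```python
# Sketch: schedulers typically expose reserved GPUs to a job via the
# CUDA_VISIBLE_DEVICES environment variable; the CUDA runtime then only
# enumerates those devices, renumbered from 0 inside the job.
import os

# Assume the scheduler granted this job physical GPUs 0 and 2 (illustrative).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"

granted = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
# Inside the job these appear as logical devices 0 and 1.
print(f"GPUs granted: {len(granted)}")
```

This is also why two jobs never see each other's GPUs: each job only enumerates the devices listed in its own environment.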
+ | |||
+ | Notes 04/01/2012 ConfCall | ||
+ | |||
+ | * Applications drive the CPU-to-GPU ratio and most will be 1-to-1, certainly not larger then 1-to-3 | ||
+ | * Users did not share GPUs but could obtain more than one, always on same node | ||
+ | * Experimental setup with 36 gb/node, dual 8 core chips | ||
+ | * Nothing larger than that memory wise as CPU and GPU HPC work environments were not mixed | ||
+ | * No raw code development | ||
+ | * Speed ups was hard to tell | ||
+ | * PGI Accelerator was used because it is needed with any Fortran code (Note!) | ||
+ | * Double precision was most important in scientific applications | ||
+ | * MPI flavor was OpenMPI, and others (including MVApich) showed no advantages | ||
+ | * Book: Programming Massively Parallel Processors, Second Edition: | ||
+ | * A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu (Dec 28, 2012) | ||
+ | * Has examples of how to expose GPUs across nodes | ||
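The 1-to-1 ratio noted above is usually realized by giving each on-node MPI rank its own GPU, e.g. when running pmemd.cuda.MPI under OpenMPI. A hedged sketch of the conventional round-robin rank-to-device mapping (the function name and parameters are illustrative, not part of any MPI stack):

```python
# Sketch: round-robin assignment of on-node MPI ranks to GPUs, the usual
# way a 1-to-1 rank/GPU mapping is set up before calling cudaSetDevice().
def gpu_for_rank(local_rank: int, ngpus_per_node: int) -> int:
    """Return the GPU index for a rank on this node (wraps if ranks > GPUs)."""
    return local_rank % ngpus_per_node

# Example: 6 local ranks on a node with 3 GPUs.
print([gpu_for_rank(r, 3) for r in range(6)])  # [0, 1, 2, 0, 1, 2]
```

With more ranks than GPUs the mapping wraps around, which is exactly the GPU-sharing situation the notes say users avoided; keeping ranks-per-node equal to GPUs-per-node preserves the 1-to-1 ratio.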
==== ConfCall & Quote: AC ====
^ Topic ^ Description ^
| General| 2 CPUs (16 cores), 3 GPUs (7,500 CUDA cores), 32 GB RAM/node|
| Head Node| None|
| Nodes| 1x 4U Rackmountable Chassis, 2x Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16 cores/..., ...|
  * maybe all LAPACK libraries too
  * Make the head node a compute node (in/for the future, and beef it up too, 256 GB RAM?)
  * Leave the 6x2TB disk space (for backup)
    * 2U, 8 drives, up to 6x4=24
  * Add an entry level Infiniband/...
    * for parallel file locking
  * Spare parts
    * 8 port switch, HCAs and cables, drives ...