cluster:107 [2012/12/21 19:01] hmeij [Yale Qs]
cluster:107 [2013/01/04 15:41] hmeij [Yale Qs]
  * Was single- or double-precision peak performance more important, or neither?
  * What was the software suite in mind (commercial, ...)?
  * How did you reach out/educate users on the aspects of GPU computing?
  * What was the impact on the users? (recoding, recompiling)
  * Was the expected computational speedup realized?
  * Was the PGI Accelerator leveraged? If so, what were the results?
  * Do users compile with nvcc?
  * Does the scheduler have a resource for idle GPUs so they can be reserved?
  * How are the GPUs exposed/...?
  * Do you allow multiple serial jobs to access the same GPU? Or one parallel job multiple GPUs?
  * Can parallel jobs access ...?
  * Any experiences with pmemd.cuda.MPI (part of Amber)?
  * What MPI flavor is used most with regard to GPU computing?
  * Do you leverage the CPU side of the GPU HPC? For example, if there are 16 GPUs and 64 CPU cores on a cluster, do you allow 48 standard jobs on the idle cores (assuming the maximum of 16 serial GPU jobs)?
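A common answer to the scheduler questions above is that GPUs are treated as a consumable resource: the scheduler reserves devices for a job and publishes the grant through the ''CUDA_VISIBLE_DEVICES'' environment variable, which the CUDA runtime honors when enumerating devices. A minimal sketch (the granted device numbers are made up for illustration, not from any particular scheduler):

```python
# Sketch: schedulers typically expose reserved GPUs to a job via the
# CUDA_VISIBLE_DEVICES environment variable; the CUDA runtime then only
# enumerates those devices, renumbered from 0 inside the job.
import os

# Assume the scheduler granted this job physical GPUs 0 and 2 (illustrative).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"

granted = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
# Inside the job these appear as logical devices 0 and 1.
print(f"GPUs granted: {len(granted)}")
```

This is also why two jobs never see each other's GPUs: each job only enumerates the devices listed in its own environment.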
+ | |||
+ | Notes 04/01/2012 ConfCall | ||
+ | |||
+ | * Applications drive the CPU-to-GPU ratio and most will be 1-to-1, certainly not larger then 1-to-3 | ||
+ | * Users did not share GPUs but could obtain more than one, always on same node | ||
+ | * Experimental setup with 36 gb/node, dual 8 core chips | ||
+ | * Nothing larger than that memory wise as CPU and GPU HPC work environments were not mixed | ||
+ | * No raw code development | ||
+ | * Speed ups was hard to tell | ||
+ | * PGI Accelerator was used because it is needed with any Fortran code (Note!) | ||
+ | * Double precision was most important in scientific applications | ||
+ | * MPI flavor was OpenMPI, and others (including MVApich) showed no advantages | ||
+ | * Book: Programming Massively Parallel Processors, Second Edition: | ||
+ | * A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu (Dec 28, 2012) | ||
+ | * Has examples of how to expose GPUs across nodes | ||
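The 1-to-1 ratio noted above is usually realized by giving each on-node MPI rank its own GPU, e.g. when running pmemd.cuda.MPI under OpenMPI. A hedged sketch of the conventional round-robin rank-to-device mapping (the function name and parameters are illustrative, not part of any MPI stack):

```python
# Sketch: round-robin assignment of on-node MPI ranks to GPUs, the usual
# way a 1-to-1 rank/GPU mapping is set up before calling cudaSetDevice().
def gpu_for_rank(local_rank: int, ngpus_per_node: int) -> int:
    """Return the GPU index for a rank on this node (wraps if ranks > GPUs)."""
    return local_rank % ngpus_per_node

# Example: 6 local ranks on a node with 3 GPUs.
print([gpu_for_rank(r, 3) for r in range(6)])  # [0, 1, 2, 0, 1, 2]
```

With more ranks than GPUs the mapping wraps around, which is exactly the GPU-sharing situation the notes say users avoided; keeping ranks-per-node equal to GPUs-per-node preserves the 1-to-1 ratio.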
==== ConfCall & Quote: AC ====
^ Topic ^ Description ^
| General| 2 CPUs (16 cores), 3 GPUs (7,500 CUDA cores), 32 GB RAM/node|
| Head Node| None|
| Nodes| 1x 4U Rackmountable Chassis, 2x Xeon E5-2660 2.20 GHz 20MB Cache 8 cores (16 cores/..., ...|
  * maybe all LAPACK libraries too
  * Make the head node a compute node (in/for the future, and beef it up too, 256 GB RAM?)
  * Leave the 6x2TB disk space (for backup)
    * 2U, 8 drives, up to 6x4=24
  * Add an entry level Infiniband/...
    * for parallel file locking
  * Spare parts
    * 8 port switch, HCAs and cables, drives ...