cluster:107 [2012/12/19 19:18] hmeij [ConfCall & Quote: MW]
cluster:107 [2013/01/16 15:20] hmeij [ConfCall & Quote: MW]
Line 299: Line 299:
  
   * buy a single rack and test locally, start small (will future racks be compatible?)
 +
 +==== Yale Qs ====
 +
 +We are tasked with getting GPU HPC going at Wesleyan and are trying to gain insight into the project. If you have acquired a GPU HPC cluster ...
 +
 +  * What was the most important design element of the cluster?
 +  * What factor(s) settled the CPU to GPU ratio?
 +  * Was single or double precision peak performance more important, or did neither matter much?
 +  * What software suite did you have in mind (commercial, open source, or custom GPU-enabled code)?
 +  * How did you reach out to and educate users on the aspects of GPU computing?
 +  * What was the impact on the users? (recoding, recompiling)
 +  * Was the expected computational speed up realized?
 +  * Were the PGI Accelerator compilers leveraged? If so, what were the results?
 +  * Do users compile with nvcc?
 +  * Does the scheduler have a resource for idle GPUs so they can be reserved?
 +  * How are the GPUs exposed/assigned to jobs the scheduler submits? (see the sketch after this list)
 +  * Do you allow multiple serial jobs to access the same GPU? Or one parallel job to access multiple GPUs?
 +  * Can parallel jobs access multiple GPUs across nodes?
 +  * Any experiences with pmemd.cuda.MPI (part of Amber)?
 +  * What MPI flavor is used most in regards to GPU computing?
 +  * Do you also use the CPU side of the GPU HPC for standard jobs? For example, if there are 16 GPUs and 64 CPU cores on a cluster, do you allow 48 standard jobs on the idle cores (assuming a maximum of 16 serial GPU jobs)?
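
A minimal sketch of the mechanics behind the scheduler questions above, assuming the scheduler publishes its GPU assignment through the CUDA_VISIBLE_DEVICES environment variable (a common convention; the variable, file name, and build line are assumptions, not answers from Yale):

<code c>
/* Minimal sketch: confirm which GPU(s) the scheduler handed to this job.
 * Assumes the scheduler restricts the job via CUDA_VISIBLE_DEVICES; inside
 * the job the visible devices are renumbered 0..count-1.
 * Build line (assumed): nvcc gpucheck.cu -o gpucheck
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const char *visible = getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(unset)");

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no usable CUDA device in this job\n");
        return 1;
    }
    printf("%d device(s) visible to this job\n", count);

    /* Use the first device the scheduler exposed to us. */
    cudaSetDevice(0);
    return 0;
}
</code>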
 +
 +Notes 04/01/2012 ConfCall
 +
 +  * Applications drive the CPU-to-GPU ratio and most will be 1-to-1, certainly not larger than 1-to-3
 +  * Users did not share GPUs but could obtain more than one, always on the same node
 +  * Experimental setup with 36 GB/node, dual 8-core chips
 +  * Nothing larger than that memory-wise, as CPU and GPU HPC work environments were not mixed
 +  * No raw code development
 +  * Speed-ups were hard to gauge
 +  * PGI Accelerator was used because it is needed with any Fortran code (Note!)
 +  * Double precision was most important in scientific applications
 +  * MPI flavor was OpenMPI; others (including MVAPICH) showed no advantages
 +  * Book: Programming Massively Parallel Processors: A Hands-on Approach, Second Edition
 +    * by David B. Kirk and Wen-mei W. Hwu (Dec 28, 2012)
 +    * Has examples of how to expose GPUs across nodes (see the sketch below)
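
As an illustration of exposing GPUs across nodes to one parallel job, here is a minimal sketch of the common rank-to-GPU mapping. It assumes one MPI rank per GPU and identical GPU counts per node; the file name and build line are assumptions, not Yale's setup:

<code c>
/* Minimal sketch of the usual rank-to-GPU mapping for a parallel job that
 * spans nodes: each MPI rank selects a local device, here rank modulo the
 * per-node device count.  Assumes identical GPU counts per node and a
 * round-robin or blocked rank placement.
 * Build line is site-specific, e.g. mpicc gpumap.c -o gpumap -lcudart
 * with the CUDA include/library paths added.
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);

    int mygpu = (ngpus > 0) ? rank % ngpus : -1;
    if (mygpu >= 0)
        cudaSetDevice(mygpu);   /* bind this rank to its GPU */

    printf("rank %d -> GPU %d (of %d on this node)\n", rank, mygpu, ngpus);

    MPI_Finalize();
    return 0;
}
</code>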
  
 ==== ConfCall & Quote: AC ====
Line 321: Line 357:
  
 ^  Topic^Description  ^
-|  General| 2 CPUs (16 cores), 3 GPUs ( 22,500 cuda cores), 32 gb ram/node|
+|  General| 2 CPUs (16 cores), 3 GPUs ( 7,500 cuda cores), 32 gb ram/node|
 |  Head Node| None|
 |  Nodes|1x4U Rackmountable Chassis, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores (16 cores/node), Romley series|
Line 590: Line 626:
     * maybe all LAPACK libraries too
   * Make the head node a compute node (in/for the future and beef it up too, 256 GB RAM?)
-  * Remove the 6x2TB disk space and add an entry level Infiniband/Lustre solution
+  * Leave the 6x2TB disk space (for backup)
-    * 2U, 8 drives (SAS/SATA/SSD) up to 32 TB - get 10K drives?
+    * 2U, 8 drives up to 6x4=24 TB, possible?
 +  * Add an entry level Infiniband/Lustre solution
 +    * for parallel file locking
   * Spare parts
-   * 8 port switch, HCAs and cables, drives ...
+    * 8 port switch, HCAs and cables, drives ...
-   * or get 5 years total warranty
+    * or get 5 years total warranty
  
 +  * Testing notes
 +    * Amber, LAMMPS, NAMD
 +    * CUDA v4 & v5
 +    * install/config dirs
 +    * use GNU ... with OpenMPI
 +    * make deviceQuery (see the sketch below)
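
For the deviceQuery test, a minimal stand-in is sketched below; it is not the NVIDIA SDK sample itself, just the same idea of listing each visible GPU with the properties discussed above (name, memory, compute capability). The file name and build line are assumptions:

<code c>
/* Minimal stand-in for the SDK's deviceQuery (a sketch, not the NVIDIA
 * sample): list each visible GPU with the properties that matter for the
 * questions above -- name, memory, compute capability.
 * Build line (assumed): nvcc devlist.cu -o devlist
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA device(s) visible\n", count);

    for (int i = 0; i < count; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("  %d: %s, %.1f GB, compute capability %d.%d\n",
               i, p.name,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               p.major, p.minor);
    }
    return 0;
}
</code>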
 \\
 **[[cluster:0|Back]]**