cluster:107 [2012/12/19 19:18] hmeij [ConfCall & Quote: MW]
cluster:107 [2013/01/16 15:20] hmeij [ConfCall & Quote: MW]
Line 299: Line 299:
  
   * buy a single rack and test locally, start small (will future racks be compatible?)
 +
 +==== Yale Qs ====
 +
 +We are tasked with getting GPU HPC going at Wesleyan and are trying to gain insight into the project. If you have acquired a GPU HPC cluster ...
 +
 +  * What was the most important design element of the cluster?
 +  * What factor(s) settled the CPU to GPU ratio?
 +  * Was single or double precision peak performance more important, or did neither matter much?
 +  * What software suite did you have in mind (commercial, open source, or custom GPU-enabled code)?
 +  * How did you reach out to and educate users on the aspects of GPU computing?
 +  * What was the impact on the users? (recoding, recompiling)
 +  * Was the expected computational speed up realized?
 +  * Were the PGI Accelerator compilers leveraged? If so, what were the results?
 +  * Do users compile with nvcc?
 +  * Does the scheduler have a resource for idle GPUs so they can be reserved?
 +  * How are the GPUs exposed/assigned to jobs the scheduler submits? (see the sketch after this list)
 +  * Do you allow multiple serial jobs to access the same GPU? Or one parallel job to access multiple GPUs?
 +  * Can parallel jobs access multiple GPUs across nodes?
 +  * Any experiences with pmemd.cuda.MPI (part of Amber)?
 +  * What MPI flavor is used most in regards to GPU computing?
 +  * Do you also use the CPU side of the GPU HPC for standard jobs? For example, if there are 16 GPUs and 64 CPU cores on a cluster, do you allow 48 standard jobs on the idle cores (assuming a maximum of 16 serial GPU jobs)?
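
A minimal sketch of the mechanics behind the scheduler questions above, assuming the scheduler publishes its GPU assignment through the CUDA_VISIBLE_DEVICES environment variable (a common convention; the variable, file name, and build line are assumptions, not answers from Yale):

<code c>
/* Minimal sketch: confirm which GPU(s) the scheduler handed to this job.
 * Assumes the scheduler restricts the job via CUDA_VISIBLE_DEVICES; inside
 * the job the visible devices are renumbered 0..count-1.
 * Build line (assumed): nvcc gpucheck.cu -o gpucheck
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const char *visible = getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(unset)");

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no usable CUDA device in this job\n");
        return 1;
    }
    printf("%d device(s) visible to this job\n", count);

    /* Use the first device the scheduler exposed to us. */
    cudaSetDevice(0);
    return 0;
}
</code>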
 +
 +Notes 04/01/2012 ConfCall
 +
 +  * Applications drive the CPU-to-GPU ratio and most will be 1-to-1, certainly not larger than 1-to-3
 +  * Users did not share GPUs but could obtain more than one, always on the same node
 +  * Experimental setup with 36 GB/node, dual 8-core chips
 +  * Nothing larger than that memory-wise, as CPU and GPU HPC work environments were not mixed
 +  * No raw code development
 +  * Speed-ups were hard to gauge
 +  * PGI Accelerator was used because it is needed with any Fortran code (Note!)
 +  * Double precision was most important in scientific applications
 +  * MPI flavor was OpenMPI; others (including MVAPICH) showed no advantages
 +  * Book: Programming Massively Parallel Processors: A Hands-on Approach, Second Edition
 +    * by David B. Kirk and Wen-mei W. Hwu (Dec 28, 2012)
 +    * Has examples of how to expose GPUs across nodes (see the sketch below)
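
As an illustration of exposing GPUs across nodes to one parallel job, here is a minimal sketch of the common rank-to-GPU mapping. It assumes one MPI rank per GPU and identical GPU counts per node; the file name and build line are assumptions, not Yale's setup:

<code c>
/* Minimal sketch of the usual rank-to-GPU mapping for a parallel job that
 * spans nodes: each MPI rank selects a local device, here rank modulo the
 * per-node device count.  Assumes identical GPU counts per node and a
 * round-robin or blocked rank placement.
 * Build line is site-specific, e.g. mpicc gpumap.c -o gpumap -lcudart
 * with the CUDA include/library paths added.
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);

    int mygpu = (ngpus > 0) ? rank % ngpus : -1;
    if (mygpu >= 0)
        cudaSetDevice(mygpu);   /* bind this rank to its GPU */

    printf("rank %d -> GPU %d (of %d on this node)\n", rank, mygpu, ngpus);

    MPI_Finalize();
    return 0;
}
</code>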
  
 ==== ConfCall & Quote: AC ====
Line 321: Line 357:
  
 ^  Topic^Description  ^
-|  General| 2 CPUs (16 cores), 3 GPUs ( 22,500 cuda cores), 32 gb ram/node|
+|  General| 2 CPUs (16 cores), 3 GPUs ( 7,500 cuda cores), 32 gb ram/node|
 |  Head Node| None|
 |  Nodes|1x4U Rackmountable Chassis, 2xXeon E5-2660 2.20 GHz 20MB Cache 8 cores (16 cores/node), Romley series|
Line 590: Line 626:
     * maybe all LAPACK libraries too
   * Make the head node a compute node (in/for the future and beef it up too, 256 GB RAM?)
-  * Remove the 6x2TB disk space and add an entry level Infiniband/Lustre solution
+  * Leave the 6x2TB disk space (for backup)
-    * 2U, 8 drives (SAS/SATA/SSD) up to 32 TB - get 10K drives?
+    * 2U, 8 drives up to 6x4=24 TB, possible?
 +  * Add an entry level Infiniband/Lustre solution
 +    * for parallel file locking
   * Spare parts
-   * 8 port switch, HCAs and cables, drives ...
+    * 8 port switch, HCAs and cables, drives ...
-   * or get 5 years total warranty
+    * or get 5 years total warranty
  
 +  * Testing notes
 +    * Amber, LAMMPS, NAMD
 +    * CUDA v4 & v5
 +    * install/config dirs
 +    * use GNU ... with OpenMPI
 +    * make deviceQuery (see the sketch below)
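
For the deviceQuery test, a minimal stand-in is sketched below; it is not the NVIDIA SDK sample itself, just the same idea of listing each visible GPU with the properties discussed above (name, memory, compute capability). The file name and build line are assumptions:

<code c>
/* Minimal stand-in for the SDK's deviceQuery (a sketch, not the NVIDIA
 * sample): list each visible GPU with the properties that matter for the
 * questions above -- name, memory, compute capability.
 * Build line (assumed): nvcc devlist.cu -o devlist
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA device(s) visible\n", count);

    for (int i = 0; i < count; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("  %d: %s, %.1f GB, compute capability %d.%d\n",
               i, p.name,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               p.major, p.minor);
    }
    return 0;
}
</code>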
 \\
 **[[cluster:0|Back]]**