cluster:167

Revision 2018/06/28 12:27 by hmeij07 (previous revision 2018/06/26 16:58)

===== CPU vs GPU =====
So the question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions:
  
  * Data covers the period June 1 to June 25, 2018 (job information data ages out)
    * Maybe build a monthly script if this turns out to be usable info
  * That period covers 600 hours of time (25 days x 24 hours)
  * Assume 99% utilization of a cpu core or gpu device
  * Available time is measured per physical cpu core but per gpu device
  * There is no good/bad metric
  * Never collated such data before
  * The GPU usage is based on detecting gpu reservations (''gpu='' flag)
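
The CPU/GPU split rests on the ''gpu='' reservation flag. A minimal Python sketch of that classification, assuming a hypothetical job record that carries a resource-request string and a consumed-hours field (the real scheduler log format is not shown on this page):

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical record shape; not the actual scheduler log format.
    job_id: str
    resource_request: str  # e.g. "gpu=1" for a GPU job, "" for a CPU-only job (hypothetical)
    hours: float           # consumed hours as recorded for the job


def is_gpu_job(job: JobRecord) -> bool:
    """Treat a job as a GPU job if its reservation string contains a gpu= flag."""
    return "gpu=" in job.resource_request


def summarize(jobs):
    """Return {'cpu': (job_count, job_hours), 'gpu': (job_count, job_hours)}."""
    totals = {"cpu": [0, 0.0], "gpu": [0, 0.0]}
    for job in jobs:
        kind = "gpu" if is_gpu_job(job) else "cpu"
        totals[kind][0] += 1
        totals[kind][1] += job.hours
    return {kind: (count, hours) for kind, (count, hours) in totals.items()}


jobs = [
    JobRecord("101", "", 12.0),
    JobRecord("102", "gpu=1", 3.5),
    JobRecord("103", "", 0.5),
]
print(summarize(jobs))  # {'cpu': (2, 12.5), 'gpu': (1, 3.5)}
```

Counting every job with a reservation as a GPU job matches the "scheduled jobs regardless of exit status" definition used in the tables; it says nothing about whether the GPU was actually busy.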
  
^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes June 2018 ^
| Device Count | 72  |  3:1  |  24 | cpu all intel, gpu all nvidia |
| Core Count | 1,192  |  1:54  |  64,300 | physical only  |
| Memory | 7,408  |  51:1  |  144 | GB |
| Teraflops | 38  |  1.5:1  |  25 | double precision, floating point, theoretical |
| Job Count | 2,834  |  3:1  |  1,045 | scheduled jobs regardless of exit status |
| Avail Hours | 715,200  |  50:1  |  14,400 | total cpu cores, total gpus |
| Job Hours | 221,136  |  77:1  |  2,872 | cumulative hours of consumed usage |
| Job Hours % | 31  |  6:1  |  5 | Job Hours as a percentage of Avail Hours |
| Avail Hours2 | 561,600  |  39:1  |  14,400 | total cpu cores - hp12's 256 cores, total gpus |
| Job Hours %2 | 39  |  8:1  |  5 | Job Hours as a percentage of Avail Hours2; more realistic...hp12 rarely used in June18 |

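The Avail Hours and Job Hours % rows follow from simple arithmetic: units (physical cores or GPU devices) times hours in the period, then consumed hours over available hours. A minimal Python sketch using the June 2018 figures from the table; it multiplies units by hours directly, which reproduces the table's Avail Hours, and does not apply the 99% utilization factor:

```python
def available_hours(units: int, period_hours: int) -> int:
    """Available hours = physical CPU cores (or GPU devices) x hours in the period."""
    return units * period_hours


def percent_used(job_hours: float, avail_hours: float) -> float:
    """Job Hours consumed as a percentage of Available Hours."""
    return 100.0 * job_hours / avail_hours


PERIOD_HOURS = 25 * 24  # June 1 to June 25, 2018 -> 600 hours

cpu_avail = available_hours(1192, PERIOD_HOURS)         # 715,200
gpu_avail = available_hours(24, PERIOD_HOURS)           # 14,400
cpu_avail2 = available_hours(1192 - 256, PERIOD_HOURS)  # minus hp12's 256 cores -> 561,600

print(round(percent_used(221136, cpu_avail)))   # 31
print(round(percent_used(221136, cpu_avail2)))  # 39
print(round(percent_used(2872, gpu_avail), 1))  # 19.9
```

The CPU results match the table. Applied to the GPU column, the same formula gives 2,872 / 14,400 ≈ 19.9% rather than the 5% listed, so that cell may rest on a different basis.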
The logs showing gpu %util confirm the extremely low GPU usage. When concatenating the four gpu %util values into a string, the string '0000' has occurred 10 million times out of 16 million observations since 01Jan2017 (GPUs are polled every 10 mins). Sad. The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.

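The '0000' count can be reproduced by joining the four %util readings of each poll and counting exact matches. A minimal Python sketch, assuming a hypothetical log layout of one line per poll whose last four whitespace-separated fields are the per-GPU %util values (the node name and timestamps below are made up):

```python
def count_idle_polls(lines):
    """Count polls where all four GPUs report 0 %util.

    Assumes (hypothetically) one log line per poll whose last four
    whitespace-separated fields are the per-GPU %util values.
    Returns (idle_polls, total_polls).
    """
    idle = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        # Concatenate the four %util readings, e.g. "0 0 0 0" -> "0000".
        key = "".join(fields[-4:])
        total += 1
        if key == "0000":
            idle += 1
    return idle, total


log = [
    "n78 2018-06-01T00:00 0 0 0 0",
    "n78 2018-06-01T00:10 57 0 0 0",
    "n78 2018-06-01T00:20 0 0 0 0",
]
idle, total = count_idle_polls(log)
print(idle, total)  # 2 3
```

Since each field is at least one character, the joined string equals '0000' only when all four readings are exactly 0. By the same measure, 10 million out of 16 million observations is 62.5% of polls with all four GPUs idle.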
So were these 25 days in June 2018 an oddity? March is Honors' Theses time, so let's look at Jul17, which we can compare to Jul18 in August.

^  Total Monthly CPU+GPU Hours  ^^^^^^^^^^^
^Jul17^Aug17^Sep17^Oct17^Nov17^Dec17^Jan18^Feb18^Mar18^Apr18^May18^
|313,303|273,051|128,390|111,224|280,101|51,727|306,453|222,585|437,959|262,227|294,724|

^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2017 ^
| Device Count | 72  |  4:1  |  20 | cpu all intel, gpu all nvidia |
| Core Count | 1,192 |  1:42  |  50,000 | physical only  |
| Memory | 7,408  |  74:1  |  100 | GB |
| Teraflops | 38  |  1.7:1  |  23 | double precision, floating point, theoretical |
| Job Count | 12,798  |  18:1  |  722 | scheduled jobs regardless of exit status |
| Avail Hours | 886,848  |  60:1  |  14,880 | total cpu cores, total gpus |
| Job Hours |  260,997  |  69:1  |  3,805 | cumulative hours of consumed usage |
| Job Hours % | 30  |  1:1  |  26 | Job Hours as a percentage of Avail Hours |
| Avail Hours2 | 696,384  |  47:1  |  14,880 | total cpu cores - hp12's 256 cores, total gpus |
| Job Hours %2 | 37  |  1.5:1  |  26 | Job Hours as a percentage of Avail Hours2; more realistic...hp12 rarely used in June18 |

  * Some noise in this data: the start and end of a job could not be matched for ~15% of records
  * The assumption that ''hp12'' was barely used might not be correct

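Job Hours come from pairing each job's start with its end, and the ~15% of unmatched records drop out at that step. A minimal Python sketch, assuming hypothetical ''(job_id, kind, timestamp)'' event tuples rather than the actual scheduler log format:

```python
from datetime import datetime


def job_hours(events):
    """Sum job durations from (job_id, kind, timestamp) events.

    kind is "start" or "end" (hypothetical format); a later record for the
    same job and kind overwrites an earlier one. Returns
    (total_hours, unmatched_job_ids), where unmatched jobs lack a start or an end.
    """
    starts = {}
    ends = {}
    for job_id, kind, ts in events:
        if kind == "start":
            starts[job_id] = ts
        elif kind == "end":
            ends[job_id] = ts
    total = 0.0
    for job_id in starts.keys() & ends.keys():  # jobs with both records
        total += (ends[job_id] - starts[job_id]).total_seconds() / 3600
    unmatched = sorted(starts.keys() ^ ends.keys())  # jobs with only one record
    return total, unmatched


events = [
    ("1", "start", datetime(2018, 6, 1, 8, 0)),
    ("1", "end", datetime(2018, 6, 1, 10, 30)),
    ("2", "start", datetime(2018, 6, 2, 9, 0)),  # no end record, e.g. still running
    ("3", "end", datetime(2018, 6, 3, 12, 0)),   # no start record, e.g. started before the log window
]
print(job_hours(events))  # (2.5, ['2', '3'])
```

Unmatched jobs contribute no hours at all here, so a high unmatched share pushes Job Hours, and therefore Job Hours %, down.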
Based on Jul17, we consume about 60-70 times more CPU job hours than GPU job hours (69:1), which seems consistent with Jun18 (77:1). For Job Hours consumed as a percentage of Available Hours, the picture is probably more like Jul17: 30-40% of CPU cycles and about 25% of GPU cycles are consumed. We shall wait for the Jul18 metrics.

  
**[[cluster:0|Back]]**
cluster/167.txt · Last modified: 2018/08/01 14:11 by hmeij07