cluster:167

Differences

This shows you the differences between two versions of the page.

cluster:167 [2018/06/28 09:59]
hmeij07
cluster:167 [2018/08/01 10:11] (current)
hmeij07 [July 2018]
Line 22: Line 22:
 | Teraflops | 38  |  1.5:1  |  25 | double precision, floating point, theoretical | | Teraflops | 38  |  1.5:1  |  25 | double precision, floating point, theoretical |
 | Job Count | 2,834  |  3:1  |  1,045 | processed jobs regardless of exit status | | Job Count | 2,834  |  3:1  |  1,045 | processed jobs regardless of exit status |
-| Avail Hours | 715,200  |  50:1  |  14,400 | total cpu cores, total gpus |+| Avail Hours | 715,200  |  50:1  |  14,400 | total for cpu cores, total for gpus |
 | Job Hours | 221,136  |  77:1  |  2,872 | cumulative hours of consumed usage | | Job Hours | 221,136  |  77:1  |  2,872 | cumulative hours of consumed usage |
-| Job Hours % | 31  |  6:1  |  5 | as a percentage | +| Job Hours % | 31  |  6:1  |  5 | as a percentage of available 
-| Avail Hours2 | 561,600  |  39:1  |  14,400 | total cpu cores hp12's 256 cores, total gpus | +| Avail Hours2 | 561,600  |  39:1  |  14,400 | total cpu cores minus hp12's 256 cores, total gpus | 
-| Job Hours % | 39  |  8:1  |  5 | more realistic...hp12 rarely used in June18|+| Job Hours2 % | 39  |  8:1  |  5 | more realistic...hp12 rarely used |
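The Avail Hours and Job Hours % rows above can be reproduced with a little arithmetic. The following is a minimal sketch, not the script used to build the table: the helper names ''avail_hours'' and ''pct'' are hypothetical, and the 24 GPU count is an assumption taken from 14,400 GPU hours over 25 days.

```python
HOURS_PER_DAY = 24


def avail_hours(units, days):
    """Hours available when `units` cores (or GPUs) are up for `days` full days."""
    return units * days * HOURS_PER_DAY


def pct(used, available):
    """Consumed hours as a whole-number percentage of available hours."""
    return round(100 * used / available)


days = 25                  # observation window in June 2018
cpu_avail = 715_200        # Avail Hours, CPU column
cpu_job_hours = 221_136    # Job Hours, CPU column
hp12_cores = 256           # cores removed for the Avail Hours2 row

gpu_avail = avail_hours(24, days)                        # assumes 24 GPUs
cpu_avail2 = cpu_avail - avail_hours(hp12_cores, days)   # CPU hours minus hp12

print(gpu_avail, cpu_avail2)            # 14400 561600
print(pct(cpu_job_hours, cpu_avail))    # 31
print(pct(cpu_job_hours, cpu_avail2))   # 39
```

Removing hp12 shrinks the denominator only, which is why the CPU percentage rises from 31 to 39 while Job Hours stay the same.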
  
-The logs showing gpu %util confirm the extremely low GPU usage. When concatenating the four gpu %util values into a string, since 01Jan2017, the string '0000' has occurred 10 million times out of 16 million observations. (GPUs are polled every 10 mins). Sad. The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.+The logs showing gpu %util confirm the extremely low GPU usage. When concatenating the four gpu %util values into a string, since 01Jan2017, the string '0000' has occurred 10 million times out of 16 million observations. (GPUs are polled every 10 mins). The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.
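The '0000' count described above could be derived roughly as follows. This is only a sketch: the real log format is not shown on this page, so the input here is assumed to be one tuple of four integer %util values per poll, and the function names are hypothetical.

```python
from collections import Counter


def util_string(gpu_utils):
    """Concatenate the per-GPU %util values of one poll into a single string."""
    return "".join(str(u) for u in gpu_utils)


def idle_fraction(samples, idle="0000"):
    """Fraction of polls in which every GPU on the node reported 0 %util."""
    counts = Counter(util_string(s) for s in samples)
    total = sum(counts.values())
    return counts[idle] / total if total else 0.0


# Hypothetical polls from one four-GPU node (one tuple per 10-minute poll).
samples = [(0, 0, 0, 0), (0, 0, 0, 0), (97, 0, 0, 0), (0, 0, 0, 0)]
print(idle_fraction(samples))  # 0.75
```

Plain concatenation is ambiguous for busy polls (for example ''(1, 0, 0, 0)'' and ''(10, 0, 0)'' both start with "10"), but the all-idle string '0000' can only come from four zeros, so the idle count is unaffected.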
  
-So were these 25 days in June 2018 an oddity? March is Honors' Theses time so let's look at Jul17 so we can compare that to Jul18 in August.+So were these 25 days in June 2018 an oddity? 
  
 ^  Total Monthly CPU+GPU Hours  ^^^^^^^^^^^ ^  Total Monthly CPU+GPU Hours  ^^^^^^^^^^^
 ^Jul17^Aug17^Sep17^Oct17^Nov17^Dec17^Jan18^Feb18^Mar18^Apr18^May18^ ^Jul17^Aug17^Sep17^Oct17^Nov17^Dec17^Jan18^Feb18^Mar18^Apr18^May18^
 |313,303|273,051|128,390|111,224|280,101|51,727|306,453|222,585|437,959|262,227|294,724| |313,303|273,051|128,390|111,224|280,101|51,727|306,453|222,585|437,959|262,227|294,724|
 +
 +March is Honors' Theses time, so let's look at Jul17 (no GTX gpus) so we can compare that to Jul18 in August. The 31 days in July amount to 744 hours.
  
 ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2017 ^ ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2017 ^
Line 44: Line 46:
 | Avail Hours | 886,848  |  60:1  |  14,880 | total cpu cores, total gpus | | Avail Hours | 886,848  |  60:1  |  14,880 | total cpu cores, total gpus |
 | Job Hours |  260,997  |  69:1  |  3,805 | cumulative hours of consumed usage | | Job Hours |  260,997  |  69:1  |  3,805 | cumulative hours of consumed usage |
-| Job Hours % | 30  |  1:1  |  26 | as a percentage | +| Job Hours % | 30  |  1:1  |  26 | as a percentage of available 
-| Avail Hours2 | 696,384  |  47:1  |  14,880 | total cpu cores hp12's 256 cores, total gpus | +| Avail Hours2 | 696,384  |  47:1  |  14,880 | total for cpu cores minus hp12's 256 cores, total for gpus | 
-| Job Hours % | 37  |  1.5:1  |  26 | more realistic...hp12 rarely used in June18|+| Job Hours2 % | 37  |  1.5:1  |  26 | more realistic...hp12 rarely used |
  
   * Some noise in this data with the inability to match start and end of job (~15% of records)   * Some noise in this data with the inability to match start and end of job (~15% of records)
-  * The assumption that ''hp12'' was barely used might not be correct+  * The assumption that ''hp12'' was barely used in July 2017 might not be correct 
 + 
 +Based on Jul17, we process about 60-70 times more CPU Job Hours than GPU Job Hours, which seems consistent with Jun18. For the metric of Job Hours consumed versus Available Hours (in %), the picture is probably more like Jul17: 30-40% of CPU cycles are consumed and 25% of GPU cycles.  
 + 
 +If we take the total hours consumed from the Usage Report (the 313,303 hours for Jul17), we consumed about 45% of available hours (without hp12 in the mix). 
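The 45% figure can be checked directly from the numbers on this page. A minimal sketch, with hypothetical variable names; note that the Usage Report total combines CPU and GPU hours while Avail Hours2 counts CPU cores only, so the result is an approximation.

```python
HOURS_IN_JULY = 31 * 24            # 744 hours

usage_report_hours = 313_303       # Jul17 total from the monthly table
avail_hours2 = 696_384             # Jul17 Avail Hours2 (CPU cores minus hp12)

# Cores implied by Avail Hours2 over a full month.
cores_without_hp12 = avail_hours2 // HOURS_IN_JULY

share = usage_report_hours / avail_hours2
print(HOURS_IN_JULY, cores_without_hp12, round(100 * share))  # 744 936 45
```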
 + 
 +We shall wait for Jul18 metrics. 
 + 
 +==== July 2018 ==== 
 + 
 +^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2018 ^ 
 +| Device Count | 74  |  3:1  |  24 | cpu all intel, gpu all nvidia | 
 +| Core Count | 1,208 |  1:53  |  64,336 | physical only  |  
 +| Memory | 7,516  |  52:1  |  144 | GB | 
 +| Teraflops | 38  |  1.5:1  |  25 | double precision, floating point, theoretical | 
 +| Job Count | 12,798  |  18:1  |  722 | processed jobs regardless of exit status | 
 +| Avail Hours | 898,752  |  50:1  |  17,856 | total cpu cores, total gpus | 
 +| Job Hours |  322,207  |  1732:1  |  186 | cumulative hours of consumed usage | 
 +| Job Hours % | 36  |  36:1  |  1 | as a percentage of available | 
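The Ratio column in the table above appears to be the CPU value divided by the GPU value, rounded to a whole number, and flipped to 1:N when the GPU value is larger. A small sketch under that assumption (the helper ''format_ratio'' is hypothetical, and rows such as Teraflops that keep one decimal are not covered):

```python
def format_ratio(cpu, gpu):
    """Express cpu:gpu as 'N:1', or as '1:N' when the GPU value is larger."""
    if cpu >= gpu:
        return f"{round(cpu / gpu)}:1"
    return f"1:{round(gpu / cpu)}"


print(format_ratio(322_207, 186))      # Job Hours   -> 1732:1
print(format_ratio(898_752, 17_856))   # Avail Hours -> 50:1
print(format_ratio(12_798, 722))       # Job Count   -> 18:1
print(format_ratio(1_208, 64_336))     # Core Count  -> 1:53
```

With only 186 GPU Job Hours against 17,856 available, the GPU share for July 2018 is about 1%, which matches the Job Hours % row.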
  
-Based on Jul17 we process about 60-70 times more CPU jobs than GPU jobs, that seems consistent with Jul18. The metric of Job Hours consumed versus Available Hours in %, the picture is probably more like Jul17...30-40% of CPU cycles are consumed and 25% of GPU cycles. We shall wait for Jul18 metrics. +Based on the utilization string '0000', meaning all gpus are idle on a single node as polled every 10 mins during July 2018, the GTX gpus were 65% completely idle and the K20s were 61% completely idle.
- +
  
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/167.1530194371.txt.gz · Last modified: 2018/06/28 09:59 by hmeij07