The question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions.
Metric | CPU | Ratio (CPU:GPU) | GPU | Notes (June 2018) |
---|---|---|---|---|
Device Count | 72 | 3:1 | 24 | cpu all intel, gpu all nvidia |
Core Count | 1,192 | 1:54 | 64,300 | physical only |
Memory | 7,408 | 51:1 | 144 | GB |
Teraflops | 38 | 1.5:1 | 25 | double precision, floating point, theoretical |
Job Count | 2,834 | 3:1 | 1,045 | processed jobs regardless of exit status |
Avail Hours | 715,200 | 50:1 | 14,400 | total for cpu cores, total for gpus |
Job Hours | 221,136 | 77:1 | 2,872 | cumulative hours of consumed usage |
Job Hours % | 31 | 6:1 | 5 | as a percentage of available |
Avail Hours2 | 561,600 | 39:1 | 14,400 | total cpu cores minus hp12's 256 cores, total gpus |
Job Hours2 % | 39 | 8:1 | 5 | more realistic…hp12 rarely used in June18 |
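
The Avail Hours rows appear to follow from units × hours in the reporting window; the 25 days covered for June 2018 come to 600 hours. A minimal sketch of that arithmetic, assuming this is how the figures were derived (the function name `available_hours` is hypothetical, not part of any accounting tool):

```python
HOURS_PER_DAY = 24

def available_hours(units, days):
    """Capacity in unit-hours, assuming every core (or GPU) is available for the whole window."""
    return units * days * HOURS_PER_DAY

days = 25          # reporting window for June 2018, per the text
cpu_cores = 1_192  # physical cores, from the table
hp12_cores = 256   # excluded in the "Avail Hours2" row
gpus = 24

print(available_hours(cpu_cores, days))               # CPU, all cores
print(available_hours(cpu_cores - hp12_cores, days))  # CPU without hp12
print(available_hours(gpus, days))                    # GPUs
```

Counting capacity per core on the CPU side and per device on the GPU side is the convention the table notes describe; a different unit (for example per node) would change the ratios.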
The logs showing gpu %util confirm the extremely low GPU usage. When the four gpu %util values of each observation are concatenated into a string, the string '0000' has occurred 10 million times out of 16 million observations since 01Jan2017 (GPUs are polled every 10 minutes). Sad. The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.
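
The '0000' count can be sketched as follows. This is an illustration only: the actual log format is not shown here, so the input is assumed to be a sequence of 4-tuples of integer %util readings (one tuple per poll), and `idle_fraction` is a hypothetical helper name.

```python
from collections import Counter

def idle_fraction(samples):
    """Fraction of polls in which all four GPUs report 0 %util.

    samples: iterable of 4-tuples of integer %util readings (assumed format).
    """
    # Concatenate the four readings into one string per poll, e.g. (0, 0, 0, 0) -> "0000".
    keys = Counter("".join(str(u) for u in sample) for sample in samples)
    total = sum(keys.values())
    return keys["0000"] / total if total else 0.0

# Made-up readings for illustration, not real log data.
polls = [(0, 0, 0, 0), (0, 0, 0, 0), (97, 0, 0, 0), (0, 0, 0, 0)]
print(idle_fraction(polls))
```

With four readings per poll, the concatenated string has length four only when every reading is a single digit, so "0000" matches the all-idle case unambiguously. The figures quoted above (10 million of 16 million) correspond to an idle fraction of roughly 60%.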
So were these 25 days in June 2018 an oddity?
Total Monthly CPU+GPU Hours

Jul17 | Aug17 | Sep17 | Oct17 | Nov17 | Dec17 | Jan18 | Feb18 | Mar18 | Apr18 | May18 |
---|---|---|---|---|---|---|---|---|---|---|
313,303 | 273,051 | 128,390 | 111,224 | 280,101 | 51,727 | 306,453 | 222,585 | 437,959 | 262,227 | 294,724 |
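
One way to judge whether a month is an oddity is to look at the spread of the monthly totals. The sketch below is an illustration rather than the method used here; it takes the values from the table above and assumes the first column is July 2017.

```python
from statistics import median

# Total monthly CPU+GPU hours, copied from the table (first label assumed to be Jul17).
monthly_hours = {
    "Jul17": 313_303, "Aug17": 273_051, "Sep17": 128_390, "Oct17": 111_224,
    "Nov17": 280_101, "Dec17": 51_727, "Jan18": 306_453, "Feb18": 222_585,
    "Mar18": 437_959, "Apr18": 262_227, "May18": 294_724,
}

low = min(monthly_hours, key=monthly_hours.get)    # month with the fewest hours
high = max(monthly_hours, key=monthly_hours.get)   # month with the most hours
mid = median(monthly_hours.values())               # middle value of the 11 months

print(f"lowest:  {low} ({monthly_hours[low]:,})")
print(f"highest: {high} ({monthly_hours[high]:,})")
print(f"median:  {mid:,}")
```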
March is Honors' Theses time, so let's look at Jul17 so we can compare it to Jul18 in August. The 31 days in July come to 744 hours.
Metric | CPU | Ratio (CPU:GPU) | GPU | Notes (July 2017) |
---|---|---|---|---|
Device Count | 72 | 4:1 | 20 | cpu all intel, gpu all nvidia |
Core Count | 1,192 | 1:42 | 50,000 | physical only |
Memory | 7,408 | 74:1 | 100 | GB |
Teraflops | 38 | 1.7:1 | 23 | double precision, floating point, theoretical |
Job Count | 12,798 | 18:1 | 722 | processed jobs regardless of exit status |
Avail Hours | 886,848 | 60:1 | 14,880 | total cpu cores, total gpus |
Job Hours | 260,997 | 69:1 | 3,805 | cumulative hours of consumed usage |
Job Hours % | 30 | 1:1 | 26 | as a percentage of available |
Avail Hours2 | 696,384 | 47:1 | 14,880 | total for cpu cores minus hp12's 256 cores, total for gpus |
Job Hours2 % | 37 | 1.5:1 | 26 | more realistic…hp12 rarely used in June18 |
Note on hp12: the assumption that it was barely used might not be correct.

Based on Jul17, we consume about 60-70 times more CPU job hours than GPU job hours, which seems consistent with Jun18. As for Job Hours consumed versus Available Hours in %, the picture is probably more like Jul17: 30-40% of CPU cycles are consumed and 25% of GPU cycles. We shall wait for Jul18 metrics.
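
The Job Hours % metric is consumed hours divided by available hours. A minimal sketch using the July 2017 figures from the table, assuming a plain ratio with no further adjustments (`utilization_pct` is a hypothetical helper name); results may differ slightly from the rounded table entries.

```python
def utilization_pct(job_hours, avail_hours):
    """Consumed hours as a percentage of available hours."""
    return 100.0 * job_hours / avail_hours

# July 2017 figures from the table.
cpu = utilization_pct(260_997, 886_848)          # all CPU cores
cpu_no_hp12 = utilization_pct(260_997, 696_384)  # CPU cores minus hp12's 256 cores
gpu = utilization_pct(3_805, 14_880)             # all GPUs

print(f"CPU:             {cpu:.1f}%")
print(f"CPU (no hp12):   {cpu_no_hp12:.1f}%")
print(f"GPU:             {gpu:.1f}%")
print(f"CPU:GPU job hrs: {260_997 / 3_805:.0f}:1")
```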