So the question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions.
Metric | CPU | Ratio | GPU | Notes |
---|---|---|---|---|
Device Count | 72 | 3:1 | 24 | CPUs all Intel, GPUs all NVIDIA |
Core Count | 1,192 | 1:54 | 64,300 | physical only |
Memory | 7,408 | 51:1 | 144 | GB |
Teraflops | 38 | 1.5:1 | 25 | double precision, floating point, theoretical |
Avail Hours | 715,200 | 50:1 | 14,400 | total cpu cores, total gpus |
Job Count | 2,834 | 3:1 | 1,045 | scheduled jobs regardless of exit status |
Job Hours | 221,136 | 77:1 | 2,872 | cumulative hours of consumed usage |
Avail Hrs Util% | 31 | 6:1 | 5 | weeping… |
Avail Hours2 | 561,600 | 39:1 | 14,400 | total CPU cores minus hp12's 256 cores, total GPUs |
Avail Hrs2 Util% | 39 | 8:1 | 5 | more realistic… |
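
The available-hours rows and the CPU utilization figures can be reproduced from the table's own numbers. A minimal sketch, assuming the window is the 25 days in June 2018 mentioned in the text (25 × 24 = 600 hours per device); the variable names are made up for illustration, and only the CPU percentages are recomputed here.

```python
# Assumption: the observation window is 25 days, i.e. 600 hours per device.
HOURS_IN_WINDOW = 25 * 24

cpu_cores = 1_192        # Core Count, CPU column
gpus = 24                # Device Count, GPU column
hp12_cores = 256         # cores on hp12, excluded in the "Avail Hours2" row
cpu_job_hours = 221_136  # Job Hours, CPU column

# Available hours = number of schedulable units x hours in the window.
avail_cpu = cpu_cores * HOURS_IN_WINDOW                  # Avail Hours (CPU)
avail_gpu = gpus * HOURS_IN_WINDOW                       # Avail Hours (GPU)
avail_cpu2 = (cpu_cores - hp12_cores) * HOURS_IN_WINDOW  # Avail Hours2 (CPU)

# Utilization = consumed job hours as a percentage of available hours.
util_cpu = 100 * cpu_job_hours / avail_cpu
util_cpu2 = 100 * cpu_job_hours / avail_cpu2

print(f"{avail_cpu:,} {avail_gpu:,} {avail_cpu2:,}")
print(f"CPU util: {util_cpu:.0f}%  (without hp12: {util_cpu2:.0f}%)")
```
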
The logs showing GPU %util confirm the extremely low GPU usage. When the four GPU %util values are concatenated into a string, the string '0000' has occurred 10 million times out of 16 million observations since 01Jan2017 (roughly 62%; GPUs are polled every 10 minutes). Sad. The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.
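
The '0000' counting trick can be sketched as follows. This is only an illustration: it assumes each poll yields four integer %util readings, and the sample readings and the `idle_fraction` helper are invented for the example, not taken from our logs or scripts.

```python
from collections import Counter


def idle_fraction(polls):
    """Return the share of polls in which all four GPUs report 0 %util.

    `polls` is an iterable of 4-tuples of integer %util readings
    (hypothetical format; the real log layout may differ).
    """
    # Concatenate the four readings of each poll into one string key,
    # e.g. (0, 0, 0, 0) -> "0000" and (97, 0, 0, 0) -> "97000".
    keys = Counter("".join(str(u) for u in poll) for poll in polls)
    total = sum(keys.values())
    return keys["0000"] / total if total else 0.0


# Made-up sample: five polls, three of them fully idle.
sample = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (97, 0, 0, 0),
    (0, 0, 0, 0),
    (12, 45, 0, 0),
]
print(idle_fraction(sample))
```

With integer readings the key "0000" can only come from four zeros, so the concatenation is unambiguous for the idle case even though other keys (like "97000") are not.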
So were these 25 days in June 2018 an oddity? March is Honors Thesis time, so let's start from Jul17 so we can compare it to Jul18 in August.
Total Monthly CPU+GPU Hours

Jul17 | Aug17 | Sep17 | Oct17 | Nov17 | Dec17 | Jan18 | Feb18 | Mar18 | Apr18 | May18 |
---|---|---|---|---|---|---|---|---|---|---|
313,303 | 273,051 | 128,390 | 111,224 | 280,101 | 51,727 | 306,453 | 222,585 | 437,959 | 262,227 | 294,724 |
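
One rough way to put June next to these months is to scale its 25-day total to a 30-day month and compare it with the mean and median of the monthly totals. A sketch only, assuming the monthly totals are the same kind of consumed job hours as the Job Hours row (CPU plus GPU) and that a simple 30/25 scaling is fair; both are assumptions, not something the logs confirm.

```python
from statistics import mean, median

# Monthly CPU+GPU hours, copied from the table above.
monthly_hours = {
    "Jul17": 313_303, "Aug17": 273_051, "Sep17": 128_390, "Oct17": 111_224,
    "Nov17": 280_101, "Dec17": 51_727, "Jan18": 306_453, "Feb18": 222_585,
    "Mar18": 437_959, "Apr18": 262_227, "May18": 294_724,
}

# June 2018, 25 days: CPU job hours + GPU job hours from the first table.
june_25_days = 221_136 + 2_872

# Assumption: scale linearly to a 30-day month for comparison.
june_scaled = june_25_days * 30 / 25

values = list(monthly_hours.values())
avg = mean(values)
mid = median(values)

print(f"June (25 days): {june_25_days:,}  scaled to 30 days: {june_scaled:,.0f}")
print(f"Jul17-May18 mean: {avg:,.0f}  median: {mid:,}")
```

Under those assumptions the scaled June figure lands close to the median month, well above the quiet months (Sep17, Oct17, Dec17) and well below Mar18.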