
CPU vs GPU

So the question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions:

  • Data covers the period June 1 to June 25, 2018 (job information data ages out)
    • Maybe build monthly script if this turns out to be usable info
  • That period covers 600 hours of time (25 days x 24 hours)
  • Assume 99% utilization of cpu core or gpu device
  • Available time is measured per cpu core but per gpu device
  • There is no good/bad metric
  • Never collated such data before
  • The GPU usage is based on detecting gpu reservations (gpu= flag)

Metric           | CPU     | Ratio | GPU    | Notes
Device Count     | 72      | 3:1   | 24     | cpu all intel, gpu all nvidia
Core Count       | 1,192   | 1:54  | 64,300 | physical only
Memory           | 7,408   | 51:1  | 144    | GB
Teraflops        | 38      | 1.5:1 | 25     | double precision, floating point, theoretical
Avail Hours      | 715,200 | 50:1  | 14,400 | total cpu cores, total gpus
Job Count        | 2,834   | 3:1   | 1,045  | scheduled jobs regardless of exit status
Job Hours        | 221,136 | 77:1  | 2,872  | cumulative hours of consumed usage
Avail Hrs Util%  | 31      | 6:1   | 5      | weeping…
Avail Hours2     | 561,600 | 39:1  | 14,400 | total cpu cores minus hp12's 256 cores, total gpus
Avail Hrs2 Util% | 39      | 8:1   | 5      | more realistic…
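
The available-hours and CPU utilization rows can be rechecked with a few lines of arithmetic. This is only a sketch built on the figures in the table. The variable and function names are made up for illustration, and the GPU utilization figure is not recomputed because the page does not say how it was derived.

```python
# Figures taken from the table above; names are illustrative only.
HOURS = 25 * 24            # June 1 to June 25, 2018 -> 600 hours

cpu_cores = 1192           # physical cpu cores
gpu_devices = 24           # gpu devices
hp12_cores = 256           # cores on hp12, excluded in the "Avail Hours2" row
cpu_job_hours = 221136     # cumulative cpu job hours for the period


def available_hours(units, hours=HOURS):
    """Available time: one unit (cpu core or gpu device) times the hours in the period."""
    return units * hours


def utilization_pct(job_hours, avail_hours):
    """Consumed job hours as a percentage of available hours."""
    return 100.0 * job_hours / avail_hours


cpu_avail = available_hours(cpu_cores)                 # "Avail Hours", cpu column
gpu_avail = available_hours(gpu_devices)               # "Avail Hours", gpu column
cpu_avail2 = available_hours(cpu_cores - hp12_cores)   # "Avail Hours2", cpu column

cpu_util = utilization_pct(cpu_job_hours, cpu_avail)
cpu_util2 = utilization_pct(cpu_job_hours, cpu_avail2)

print(f"cpu avail hours:  {cpu_avail:,}")
print(f"gpu avail hours:  {gpu_avail:,}")
print(f"cpu avail hours2: {cpu_avail2:,}")
print(f"cpu util%:  {cpu_util:.1f}")
print(f"cpu util2%: {cpu_util2:.1f}")
```

If a monthly script does get built, the same two functions could be fed each month's hours and job totals.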

The logs showing gpu %util confirm the extremely low GPU usage. When the four gpu %util values are concatenated into a string, the string '0000' has occurred 10 million times out of 16 million observations since 01Jan2017 (GPUs are polled every 10 mins). Sad. The surprisingly strong GPU job count is due to the Amber group launching lots of small GPU jobs.
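
The '0000' counting idea can be sketched as follows. The real log format is not shown on this page, so the sample tuples and the function names here are hypothetical. The sketch only assumes that each poll yields four integer %util values.

```python
from collections import Counter


def util_key(sample):
    """Concatenate the four per-gpu %util values into one string, e.g. (0, 0, 0, 0) -> '0000'."""
    return "".join(str(value) for value in sample)


def idle_fraction(samples):
    """Fraction of polls in which all four gpus reported 0 %util."""
    counts = Counter(util_key(sample) for sample in samples)
    total = sum(counts.values())
    return counts["0000"] / total if total else 0.0


# Hypothetical polls, one tuple of four %util readings per poll.
samples = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (98, 0, 0, 0),
    (0, 0, 0, 0),
    (100, 100, 0, 57),
]

print(f"idle polls: {idle_fraction(samples):.1%}")
```

With four values per poll, the four-character key '0000' can only come from four zero readings, so the count is unambiguous even without a separator. Applied to the counts quoted above, 10 million out of 16 million works out to 62.5% of polls fully idle.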

So were these 25 days in June 2018 an oddity?

Total Monthly CPU+GPU Hours
Jul17   | Aug17   | Sep17   | Oct17   | Nov17   | Dec17  | Jan18   | Feb18   | Mar18   | Apr18   | May18
313,303 | 273,051 | 128,390 | 111,224 | 280,101 | 51,727 | 306,453 | 222,585 | 437,959 | 262,227 | 294,724
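
One rough way to put the June figure next to the monthly totals is sketched below. It rests on two assumptions. First, the June total is taken to be the sum of the CPU and GPU job hours from the first table, which the page does not state explicitly. Second, the first month label is read as July 2017. Note also that June covers only 25 days while the other months are full months.

```python
# Monthly CPU+GPU hours as listed above (first label read as Jul17, an assumption).
monthly_hours = {
    "Jul17": 313303, "Aug17": 273051, "Sep17": 128390, "Oct17": 111224,
    "Nov17": 280101, "Dec17": 51727, "Jan18": 306453, "Feb18": 222585,
    "Mar18": 437959, "Apr18": 262227, "May18": 294724,
}

# Assumption: June 1-25, 2018 total = cpu job hours + gpu job hours from the first table.
june_partial = 221136 + 2872

total_hours = sum(monthly_hours.values())
mean_monthly = total_hours / len(monthly_hours)
lowest = min(monthly_hours, key=monthly_hours.get)
highest = max(monthly_hours, key=monthly_hours.get)

print(f"mean monthly hours:  {mean_monthly:,.0f}")
print(f"lowest month:        {lowest} ({monthly_hours[lowest]:,})")
print(f"highest month:       {highest} ({monthly_hours[highest]:,})")
print(f"June 1-25 (assumed): {june_partial:,}")
print(f"June vs mean:        {june_partial / mean_monthly:.0%}")
```

Under those assumptions the 25 days of June land close to the average of the preceding eleven months, well inside the range between the lowest and highest month.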


cluster/167.1530120602.txt.gz · Last modified: 2018/06/27 17:30 by hmeij07