This shows you the differences between two versions of the page.
cluster:167 [2018/06/26 16:57] hmeij07 [CPU vs GPU] |
cluster:167 [2018/08/01 14:11] (current) hmeij07 [July 2018] |
||
---|---|---|---|
Line 6: | Line 6: | ||
The question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions | The question was raised: what does our usage look like between CPU and GPU devices? I have no idea what the appropriate metrics would be, but let's start by comparing the hardware deployed. We'll also need to make some assumptions | ||
- | * Data is period | + | * Data is period |
- | * That period | + | * Maybe build monthly script if this turns out to be usable info |
+ | * That period | ||
* Assume 99% utilization of cpu core or gpu device | * Assume 99% utilization of cpu core or gpu device | ||
- | * Available | + | * Available |
- | * There is good/bad metric | + | * There is no good/bad metric |
* Never collated such data before | * Never collated such data before | ||
- | * The GPU usage is based on detecting gpu reservations (gpu= flag) | + | * The GPU jobs are detected |
- | * | + | |
- | ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes ^ | + | |
- | | Device Count | 72 | 3:1 | 24 | cpu all intel, gpu all nvidia | | + | ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes June 2018 ^ |
- | | Core Count | 1,712 | 1:37.6 | 64,300 | physical | + | | Device Count | 72 | 3:1 | 24 | cpu all intel, gpu all nvidia | |
- | | Avail Hours | 7,272,576 | 71.3:1 | 101,952 | total cpu cores, total gpus | | + | | Core Count | 1,192 | 1:54 |
- | | Job Count | 19,043 | 4:1 | 4,765 | scheduled | + | | Memory | 7,408 | 51:1 | 144 | GB | |
+ | | Teraflops | 38 | 1.5:1 | 25 | double precision, floating point, theoretical | | ||
+ | | Job Count | 2,834 | 3:1 | 1,045 | processed jobs regardless of exit status | |
+ | | Avail Hours | 715, | ||
+ | | Job Hours | 221, | ||
+ | | Job Hours % | 31 | 6:1 | 5 | as a percentage of available | ||
+ | | Avail Hours2 | 561, | ||
+ | | Job Hours2 % | 39 | 8:1 | 5 | more realistic...hp12 rarely used | | ||
+ | |||
+ | The logs showing gpu %util confirm the extremely low GPU usage. When concatenating the four gpu %util values into a string, since 01Jan2017, the string ' | ||
+ | |||
+ | So were these 25 days in June 2018 an oddity? | ||
+ | |||
+ | ^ Total Monthly CPU+GPU | ||
+ | ^Jul17^Aug17^Sep17^Oct17^Nov17^Dec17^Jan18^Feb18^Mar18^Apr18^May18^ | ||
+ | |313,303|273,051|128, | ||
+ | |||
+ | March is Honors' | ||
+ | |||
+ | ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2017 ^ | ||
+ | | Device Count | 72 | 4:1 | 20 | cpu all intel, gpu all nvidia | | ||
+ | | Core Count | 1,192 | 1:42 | 50,000 | physical only | | ||
+ | | Memory | 7,408 | 74:1 | 100 | GB | | ||
+ | | Teraflops | 38 | 1.7:1 | 23 | double precision, floating point, theoretical | | ||
+ | | Job Count | 12, | ||
+ | | Avail Hours | 886, | ||
+ | | Job Hours | 260, | ||
+ | | Job Hours % | 30 | 1:1 | 26 | as a percentage of available | | ||
+ | | Avail Hours2 | 696, | ||
+ | | Job Hours2 % | 37 | 1.5:1 | 26 | more realistic...hp12 rarely used | | ||
+ | |||
+ | * This data contains some noise: the start and end of a job could not be matched for ~15% of records | ||
+ | * The assumption that '' | ||
+ | |||
+ | Based on Jul17, we process about 60-70 times more CPU Job Hours than GPU Job Hours, which seems consistent with Jun18. For Job Hours consumed as a percentage of Available Hours, the picture is probably more like Jul17: 30-40% of CPU cycles and about 25% of GPU cycles are consumed. | ||
+ | |||
+ | If we take total hours consumed from the Usage Report (the 313,303 hours for Jul17), we consumed about 45% of available hours (without hp12 in the mix). | ||
+ | |||
+ | We shall wait for Jul18 metrics. | ||
+ | |||
+ | ==== July 2018 ==== | ||
+ | |||
+ | ^ Metric ^ CPU ^ Ratio ^ GPU ^ Notes July 2018 ^ | ||
+ | | Device | ||
+ | | Core Count | 1,208 | 1:53 | 64,336 | physical only | | ||
+ | | Memory | 7,516 | 52:1 | 144 | GB | | ||
+ | | Teraflops | 38 | 1.5:1 | 25 | double precision, floating point, theoretical | ||
+ | | Job Count | 12, | ||
+ | | Avail Hours | 898, | ||
+ | | Job Hours | 322, | ||
+ | | Job Hours % | 36 | 36:1 | 1 | as a percentage of available | | ||
+ | |||
+ | |||
+ | Based on the utilization string ' | ||
**[[cluster: | **[[cluster: |
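
The metrics in the tables above (Avail Hours, Job Hours %, and the CPU:GPU ratio column) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not the script used to build this page. It assumes that available hours are simply device count × 24 × days in the period, which appears consistent with the "total cpu cores, total gpus" note in the earlier revision. The function names and the `decimals` parameter are hypothetical.

```python
HOURS_PER_DAY = 24


def available_hours(device_count: int, days: int) -> int:
    """Hours on offer if every core (or GPU) were usable for the whole period.

    Assumption: available hours = devices * 24 * days.
    """
    return device_count * HOURS_PER_DAY * days


def percent_of_available(job_hours: float, avail_hours: float) -> float:
    """Job hours consumed, expressed as a percentage of available hours."""
    if avail_hours <= 0:
        raise ValueError("avail_hours must be positive")
    return 100.0 * job_hours / avail_hours


def format_ratio(cpu: float, gpu: float, decimals: int = 0) -> str:
    """Render a CPU:GPU ratio the way the tables do, e.g. '51:1' or '1:42'.

    The larger side is divided by the smaller so one side is always 1.
    """
    if cpu <= 0 or gpu <= 0:
        raise ValueError("both values must be positive")
    if cpu >= gpu:
        return f"{cpu / gpu:.{decimals}f}:1"
    return f"1:{gpu / cpu:.{decimals}f}"


if __name__ == "__main__":
    # Core counts, memory and teraflops below are taken from the tables above.
    print(available_hours(1192, 31))           # CPU cores, 31-day month
    print(format_ratio(7408, 144))             # memory in GB, CPU vs GPU
    print(format_ratio(1192, 50000))           # core count, CPU vs GPU
    print(format_ratio(38, 25, decimals=1))    # teraflops, CPU vs GPU
    # Hypothetical numbers, only to show the percentage calculation.
    print(percent_of_available(250.0, 1000.0))
```

Splitting the calculation into three small functions makes it easy to rerun for a new month, should a monthly script turn out to be worth building.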