HPC Power
As part of the reevaluation and overhaul of our data center cooling capacity, we need to get a handle on non-emergency power consumption in the data center. A third-party consultant will do this by clamping power cables in the penthouse of Excley. So I bought myself a metered PDU and have been busy plugging entire racks into it one at a time, measuring “idle” and “peak” amperage. The end picture should reveal how many kW of all non-emergency power (that is, power not on the enterprise UPS) the HPC consumes, and from that how much Physical Plant spends supporting the HPC. Thanks PP!
“Idle” means not very busy; something is usually happening anyway (e.g., backups).
“Peak” means very busy, with most queues entirely full.
Metered amps * 208 V = watts
Watts * (24 * 365) / 1000 = kWh/Y
kWh/Y * $0.125 = $/Y (blended cogen and utility rate, 2013 assumption by Pete of PP)
Double the $/Y to include cooling costs (Pete of PP assumption)
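To make the chain concrete, here is a minimal sketch in Python of the conversion above, using the R4R04 idle reading of 22 amps as input (the 208 V, $0.125/kWh, and double-for-cooling figures are the assumptions stated above):

```python
# Convert a metered amperage reading into annual energy and cost estimates.
# Assumptions from above: 208 V circuits, $0.125/kWh blended cogen/utility
# rate (Pete of PP, 2013), and cooling doubling the power cost.

VOLTS = 208                 # line voltage
HOURS_PER_YEAR = 24 * 365
RATE = 0.125                # $/kWh, blended cogen/utility assumption

def annual_cost(amps):
    watts = amps * VOLTS
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    power_cost = kwh_per_year * RATE
    total_cost = power_cost * 2   # double to include cooling
    return watts, kwh_per_year, power_cost, total_cost

# R4R04 idle: 22 A -> 4,576 W -> ~40,085 kWh/Y -> ~$5,010/Y power, ~$10,021/Y with cooling
print(annual_cost(22))
```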
HPC Rack | L6-30P | Idle Amps | Peak Amps | Idle Watts | Peak Watts | Idle kWh/Y | Peak kWh/Y | Idle $/Y | Peak $/Y | Note |
R4R04 | 3 | 22 | 37 | 4,576 | 7,696 | 40,085 | 67,416 | 5,010 | 8,427 | “sharptail rack” |
R4R05 | 4 | 17 | 34 | 3,536 | 7,072 | 30,975 | 61,950 | 3,872 | 7,743 | “greentail rack” |
R4R06 | 2 | 6 | | 1,204 | | 8,970 | | 1,121 | | empty |
R4R07 | 2 | 14 | 34 | 2,912 | 7,072 | 25,509 | 61,950 | 3,189 | 7,744 | “mw128/tinymem” |
R4R08 | 2 | 4 | | 832 | | 7,288 | | 911 | | mostly empty |
Total | 13 | 63 | 105 | 13,060 | 21,840 | 114,405 | 191,318 | 14,300 | 23,915 | |
In 2013 we found, for our Dell racks purchased in 2006 (full of PE1950 and PE2950 servers; see Replace Dell Racks):
115,000 kWh/Y per rack for power
14,375 $/Y per rack for power
So this is a great improvement in energy efficiency. Over the same period we probably grew from near 5 teraflops to near 60 teraflops (theoretical, double-precision floating point).
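As a rough back-of-the-envelope check on that claim, derived only from figures on this page (so treat it as an estimate, not a measurement): the five current racks idle at 114,405 kWh/Y combined, roughly 22,900 kWh/Y per rack versus 115,000 kWh/Y for a 2006 Dell rack, while theoretical capacity grew roughly twelvefold:

```python
# Back-of-the-envelope efficiency comparison using figures from this page.
old_kwh_per_rack = 115_000        # 2006 Dell rack, as measured in 2013
new_kwh_total = 114_405           # all five current racks combined, idle
new_racks = 5

kwh_per_rack = new_kwh_total / new_racks
print(kwh_per_rack)                      # ~22,881 kWh/Y per rack
print(old_kwh_per_rack / kwh_per_rack)   # ~5x less energy per rack
print(60 / 5)                            # ~12x the theoretical teraflops
```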
Metered
R4R04 rack: three L6-30P supplies
All dual-power-supply nodes
n33-n37, one GPU per node, 94/120 CPU jobs = 5 amps
n33-n37 as above, plus 3 network switches = 8 amps
n33-n37, all GPUs working = 7 amps (need to get more data)
n78 with 0 jobs = 0 amps; with all GPUs working = 2 amps
mindstore0 and ringtail = 1 amp
cottontail and sharptail (no rsyncs) = 1 amp
cottontail and sharptail (rsyncs) = ?? amps ← TODO
Under modest load, 8+2+1+1 = 12 amps estimated; observed 14 amps on the left PDUs (front/back)
Split n78 to the right PDU (right side: 2 amps plus all single-power-supply nodes)
All storage is 50% on the UPS in R4R05; minuscule reduction
All single-power-supply nodes add to the right PDU
n38-n41 idle load = 3 amps, so 6 amps for all nodes
n38-n41 full load = 7 amps, so 14 for all nodes
TOTAL IDLE load (14) + (6+2) = 22 amps
TOTAL PEAK load (14+7) + (14+2) = 37 amps … 37/90 or 41% of capacity
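A minimal sketch of this bookkeeping in Python; the per-PDU sums and the 30 A per L6-30P capacity figure (3 feeds = 90 A) are taken from the notes above:

```python
# Tally per-PDU amperage for a rack and check headroom against supply capacity.
# 30 A per L6-30P feed is the rating used above (3 feeds -> 90 A total).

CAPACITY_PER_L6_30P = 30  # amps

def rack_utilization(pdu_loads_amps, n_supplies):
    total = sum(pdu_loads_amps)
    capacity = n_supplies * CAPACITY_PER_L6_30P
    return total, capacity, total / capacity

# R4R04 peak: left PDUs 14+7 A, right PDU 14+2 A, three L6-30P feeds
total, cap, frac = rack_utilization([14 + 7, 14 + 2], 3)
print(f"{total} A of {cap} A -> {frac:.0%}")   # 37 A of 90 A -> 41%
```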
R4R05 rack: four L6-30P supplies
R4R06 rack: two L6-30P supplies
R4R07 rack: two L6-30P supplies
All single-power-supply nodes
mw128, 10 nodes, idle = 4 amps; 8 amps for all nodes
mw128, 10 nodes, peak = 8 amps; 16 amps for all nodes
11/28/2018: 17 amps observed (with 6/18 nodes at load 36)
03/05/2019: 23 amps observed (…wengang jobs, mw128 nodes only! 23+16 = 39 of 60 amps possible)
two switches = 2 amps
tinymem, 7 nodes, idle = 2 amps; 4 amps for all nodes
tinymem, 7 nodes, peak = 8 amps; 16 amps for all nodes (DOM, no hard disk)
TOTAL IDLE load 8+2+4 = 14 amps
TOTAL PEAK load 16+2+16 = 34 amps … 34/60 or 57% of capacity
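The observed readings can be loosely cross-checked against the idle/peak figures by interpolating on the fraction of busy nodes. A hedged sketch for the 11/28/2018 reading, assuming the “6/18 nodes” in that note are mw128 nodes and that draw scales linearly between idle and peak (both assumptions, not measurements):

```python
# Rough cross-check of the 11/28/2018 reading: interpolate mw128 draw
# between its idle (8 A) and peak (16 A) all-node totals by the fraction
# of busy nodes, then add the switches (2 A) and idle tinymem (4 A).
# Assumes "6/18 nodes" counts mw128 nodes and that draw scales linearly.

def mw128_amps(busy, total, idle=8, peak=16):
    return idle + (busy / total) * (peak - idle)

estimate = mw128_amps(6, 18) + 2 + 4
print(round(estimate, 1))   # ~16.7 A, close to the 17 A observed
```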
R4R08 rack: two L6-30P supplies
Mostly empty rack
All dual-power-supply nodes
TOTAL 4 amps