HPC Power

As part of the reevaluation and overhaul of our data center cooling capacity, we need to get a handle on non-emergency power consumption in the data center. A third-party consultant will do this by clamping power cables in the penthouse of Exley. So I bought a metered PDU and have been plugging entire racks into it one at a time, measuring “idle” and “peak” amperage for each. The end picture should reveal how many kW of all non-emergency power (that is, power not on the enterprise UPS) the HPC consumes, and how much Physical Plant spends supporting it. Thanks, PP!

  • “Idle” means not very busy; most of the time something is happening anyway (e.g. backups)
  • “Peak” means very busy; most queues entirely full
  • Metered amps * 208 V = watts
  • Watts * (24 * 365) / 1000 = kWh/Y
  • kWh/Y * $0.125 = $/Y (mix of cogen and utility; 2013 assumption by Pete of PP)
  • Double the $/Y to include cooling costs (assumption by Pete of PP)
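
A minimal sketch (Python) of the conversion chain above, using the 208 V circuit voltage and the $0.125/kWh blended rate; the function names are mine:

  # Conversion chain: amps -> watts -> kWh/year -> $/year (cooling doubles the cost).
  HOURS_PER_YEAR = 24 * 365      # 8,760 hours
  VOLTS = 208                    # the L6-30P circuits here run at 208 V
  RATE = 0.125                   # $/kWh, 2013 cogen/utility mix assumption

  def amps_to_watts(amps):
      return amps * VOLTS

  def watts_to_kwh_per_year(watts):
      return watts * HOURS_PER_YEAR / 1000

  def dollars_per_year(amps, include_cooling=False):
      dollars = watts_to_kwh_per_year(amps_to_watts(amps)) * RATE
      return dollars * 2 if include_cooling else dollars

  # R4R04 idle row: 22 A -> 4,576 W -> ~40,086 kWh/Y -> ~$5,011/Y before cooling
  print(amps_to_watts(22), watts_to_kwh_per_year(4576), round(dollars_per_year(22)))

For example, feeding in the R4R04 idle reading of 22 amps reproduces the 4,576 W, ~40,085 kWh/Y and ~$5,010/Y figures in the table below, give or take rounding.
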
HPC Rack  L6-30P  Idle A  Peak A  Idle W  Peak W  Idle kWh/Y  Peak kWh/Y  Idle $/Y  Peak $/Y  Note
R4R04     3       22      37      4,576   7,696   40,085      67,416      5,010     8,427     “sharptail rack”
R4R05     4       17      34      3,536   7,072   30,975      61,950      3,872     7,743     “greentail rack”
R4R06     2       6       -       1,204   -       8,970       -           1,121     -         empty
R4R07     2       14      34      2,912   7,072   25,509      61,950      3,189     7,744     “mw128/tinymem”
R4R08     2       4       -       832     -       7,288       -           911       -         mostly empty
Total     13      63      105     13,060  21,840  114,405     191,318     14,300    23,915

In 2013 we found, for our Dell racks purchased in 2006 (full of PE1950 and PE2950 servers; see Replace Dell Racks):

  • 115,000 kWh/Y per rack for power
  • 14,375 $/Y per rack for power

So this is a great improvement in energy efficiency: a single 2006 Dell rack drew roughly as much per year (115,000 kWh) as the entire current cluster does at idle (114,405 kWh), while we probably grew from near 5 teraflops to near 60 teraflops in this time period (double precision, floating point, theoretical).
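
A back-of-the-envelope sketch of energy per theoretical teraflop; the lower-bound framing for the old cluster is mine (its ~5 TFLOPS spanned at least one rack at 115,000 kWh/Y, likely several):

  # Rough kWh/Y per theoretical TFLOP, old vs. new (my framing of the figures above).
  new_kwh_per_year = 114_405       # current cluster, idle, all racks
  new_tflops = 60                  # theoretical double-precision peak

  old_rack_kwh_per_year = 115_000  # one 2006 Dell rack
  old_tflops = 5                   # entire old cluster, so this is a lower bound

  print(new_kwh_per_year / new_tflops)       # ~1,907 kWh/Y per TFLOP
  print(old_rack_kwh_per_year / old_tflops)  # >= 23,000 kWh/Y per TFLOP
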

Metered

R4R04 rack: three L6-30P supplies

  • All dual power supply nodes
    • n33-n37, one GPU/node, 94/120 CPU jobs = 5 amps
    • n33-n37 as above, plus 3 network switches = 8 amps
    • n33-n37, all GPUs working = 7 amps (still need to get this data)
    • n78 with 0 jobs = 0 amps; with all GPUs working = 2 amps
    • mindstore0 and ringtail = 1 amp
    • cottontail and sharptail (no rsyncs) = 1 amp
    • cottontail and sharptail (rsyncs) = ?? amps ← TO DO
    • under modest load 8+2+1+1 = 12 amps; observed 14 amps on the left PDUs (front/back)
      • split n78 off to the right PDU (right side: 2 amps plus all single power supply nodes)
      • all storage is 50% on UPS in R4R05; minuscule reduction
  • All single power supply nodes added to the right PDU
    • n38-n41 idle load = 3 amps, so 6 amps for all nodes
    • n38-n41 full load = 7 amps, so 14 amps for all nodes
  • TOTAL IDLE load (14) + (6+2) = 22 amps
  • TOTAL PEAK load (14+7) + (14+2) = 37 amps … 37/90 or 41% of the three 30-amp circuits (checked in the sketch below)
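
A quick check of the R4R04 totals against circuit capacity (a sketch; the 30 amps per L6-30P circuit is not stated explicitly above, and the contributions are grouped as in the list):

  # R4R04: sum the metered contributions and compare to circuit capacity.
  CIRCUIT_AMPS = 30                # one L6-30P circuit
  circuits = 3                     # three L6-30P feeds in this rack

  idle = 14 + (6 + 2)              # left PDUs + single power supply nodes + n78, per the list above
  peak = (14 + 7) + (14 + 2)       # the same groups under full load

  capacity = circuits * CIRCUIT_AMPS
  print(idle, peak, f"{peak / capacity:.0%}")   # 22 37 41%
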

R4R05 rack: four L6-30P supplies

  • All dual power supplies
    • greentail, sanscratch storage, idle = 2 amps
    • greentail, 1TB write to sanscratch = 2 amps
    • above + 3 switches, petal/swallow/cotton2 tails, hpcmon = 5 amps
    • n1-n32 (25 nodes), idle = 5 amps
    • gaussian jobs on n1-n32 = 12 amps
  • TOTAL IDLE load 2+5 = 7 amps x 2 = 17 amps
  • TOTAL PEAK load 5+12 = 17 amps x 2 = 34 amps … 34/120 or 28% of the four 30-amp circuits (see the sketch below)
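
Since every node in this rack has dual power supplies, one PDU side is metered and then doubled, which assumes the two supplies share the load roughly equally. A sketch of the peak figure (the 30 amps per L6-30P circuit is my addition):

  # R4R05: meter one PDU side of the dual-PSU nodes, then double it.
  CIRCUIT_AMPS = 30
  circuits = 4                      # four L6-30P feeds in this rack

  one_side_peak = 5 + 12            # greentail/storage/switches + gaussian jobs on n1-n32
  peak = one_side_peak * 2          # both PDU sides
  print(peak, f"{peak / (circuits * CIRCUIT_AMPS):.0%}")   # 34 28%
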

R4R06 rack: two L6-30P supplies

  • Mostly empty rack
    • AC unit working = 8 amps (120 V)
    • two switches = 2 amps (208 V)
    • TOTAL 6 amps (the 120 V draw works out to roughly 4.6 amps at 208 V, plus the 2 amps for the switches; see the sketch below)
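
The AC unit is on a 120 V circuit, so its draw has to be converted before it can be added to the 208 V figures. A minimal sketch of that conversion (my assumption about how the 6 amp total was reached):

  # R4R06: express the 120 V AC draw as its 208 V equivalent before adding the switches.
  ac_watts = 8 * 120                  # 8 A at 120 V = 960 W
  ac_amps_at_208 = ac_watts / 208     # ~4.6 A expressed at 208 V
  switches_amps = 2                   # already at 208 V

  print(round(ac_amps_at_208 + switches_amps, 1))   # ~6.6, reported above as 6 amps
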

R4R07 rack: two L6-30P supplies

  • All single power supplies
    • mw128, 10 nodes, idle = 4 amps; for all nodes, 8 amps
    • mw128, 10 nodes, peak = 8 amps; for all nodes, 16 amps
      • 11/28/2018: 17 amps observed (with 6/18 nodes at load 36)
      • 03/05/2019: 23 amps observed (…wengang jobs, mw128 nodes only! 23+16 = 39 of 60 possible)
    • two switches = 2 amps
    • tinymem, 7 nodes, idle = 2 amps; for all nodes, 4 amps
    • tinymem, 7 nodes, peak = 8 amps; for all nodes, 16 amps (DOM, no hard disk)
  • TOTAL IDLE load 8+2+4 = 14 amps
  • TOTAL PEAK load 16+2+16 = 34 amps … 34/60 or about 57% of the two 30-amp circuits
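
The mw128 and tinymem figures above scale a metered subset of nodes up to the whole queue by doubling; a linear per-node estimate gives similar numbers. A sketch (the 18-node mw128 count comes from the 11/28/2018 note; the tinymem total of about 14 nodes is my assumption based on the doubling):

  # R4R07: scale a metered subset of nodes linearly to the full node count.
  def scale(amps_measured, nodes_measured, nodes_total):
      return amps_measured / nodes_measured * nodes_total

  print(scale(8, 10, 18))   # mw128 peak: ~14.4 amps (the list above doubles to 16)
  print(scale(8, 7, 14))    # tinymem peak: 16 amps (doubling is exact if there are 14 nodes)
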

R4R08 rack: two L6-30P supplies

  • Mostly empty rack
  • All dual power supplies
    • ringtail, mindstoresrv1 storage = 2 amps
    • two switches = 2 amps
  • TOTAL 4 amps

