
Replace Dell Racks

A Novella: Replace the Dell Racks with new hardware
Subtitle: A win-win solution proposed by Physical Plant and ITS

Once upon a time, back in 2013, two Dell racks full of compute nodes sat noisily chewing away energy on the 5th floor of Science Tower. They sucked in nicely cooled air from the floor, spewing it out the back of the racks at 105-110 degrees (F). They were giving the three Liebert cooling towers a run for their BTUs. So much so that if one of them failed, the Dell racks needed to be powered down to keep the data center from reaching temperatures beyond 95 degrees (F). The Dell racks had been in a foul mood since the last such event, not too long ago. And so, day after day, they consumed lots of BTUs and, with the ample supply of watts coming from their L6-30 roots, converted it all into heat. Tons of heat, making life lousy for the Liebert family. Oh, and they performed some computational work too, but even if they did not, the energy consumption remained the same. That's a fact. They were 6 years old and determined to make it to 12. So the story goes.

The Dell racks contain 30 compute nodes, two UPS units, two disk arrays and two switches. We measured the power consumption of 19 nodes (pulling out one of the dual power units) with a Kill-A-Watt meter for over 775 hours. The mean power consumption is 418.4 watts per node. That totals 109,956 kWh/year ((watts / 1000) kW * 24 hours * 365 days * 30 servers). This is a low water mark: it only takes into account the compute nodes, but those account for the majority of the energy consumption. We also measured one rack's consumption at the utility panel, and Peter's calculation yields 126,000 kWh/year, which can be considered a high water mark.
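The low water mark can be reproduced in a few lines of Python. This is only a sketch: the function name `annual_kwh` and the constant name are made up for illustration, while the inputs (418.4 watts mean per node, 30 nodes, 24 hours * 365 days) come straight from the paragraph above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored, as in the text


def annual_kwh(mean_watts_per_node: float, node_count: int) -> float:
    """Yearly energy use in kWh for nodes drawing a constant mean wattage."""
    kilowatts_per_node = mean_watts_per_node / 1000.0
    return kilowatts_per_node * HOURS_PER_YEAR * node_count


# Measured mean of 418.4 W per node, extrapolated to all 30 compute nodes.
low_water_mark = annual_kwh(418.4, 30)
print(f"{low_water_mark:,.0f} kWh/year")
```

The calculation assumes every node draws the measured mean around the clock, which fits the observation that consumption stays the same whether or not the nodes are doing computational work.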

Based on 12.5 cents per kWh (an all-inclusive cost covering natural gas, heat recoup, distribution, maintenance, etc.), the hardware burns away $13,744.50 per year. Our best guess is that cooling costs at least as much (another possible low water mark), so the total cost for all power consumption is $27,489 per year. If we run the hardware for another 3 years, that total cost is $82,467.
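The cost figures follow from the same kind of arithmetic. The sketch below uses a hypothetical helper, `energy_costs`, and a hypothetical `cooling_factor` parameter; setting it to 1.0 encodes the best guess from the text that cooling costs at least as much as power.

```python
RATE_PER_KWH = 0.125   # dollars per kWh, the all-inclusive rate from the text
ANNUAL_KWH = 109_956   # low water mark for the 30 compute nodes


def energy_costs(annual_kwh, rate, years, cooling_factor=1.0):
    """Return (power cost per year, power + cooling per year, total over `years`)."""
    power_per_year = annual_kwh * rate
    # Assumption taken from the text: cooling costs at least as much as power.
    cooling_per_year = power_per_year * cooling_factor
    total_per_year = power_per_year + cooling_per_year
    return power_per_year, total_per_year, total_per_year * years


power, per_year, three_years = energy_costs(ANNUAL_KWH, RATE_PER_KWH, years=3)
print(f"power only:      ${power:,.2f} per year")
print(f"power + cooling: ${per_year:,.2f} per year")
print(f"three more years: ${three_years:,.2f}")
```

Because the cooling estimate is itself a low water mark, the real total is likely higher than what this prints.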

If we could replace, or approximate, the 30 compute nodes' computational power (roughly 0.67 teraflops) and job slots (240 cores) with new hardware that consumes 50% less power, our ROI is 6 years based on the low water mark numbers.

So, I've collected quotes and they are rolling in. I've taken the $82K target as a budget so I can scale down from there. Here is a comparison:

old racks: 30 nodes, 240 job slots, 12,552 watts, 670 gigaflops (a measure of computational performance)

vendor 1: 14 nodes, 224 job slots, 5,400 watts, 4,659 gigaflops

vendor 2: 10 nodes, 200 job slots, 4,597 watts, 4,480 gigaflops

So we can do it with about 40% of the energy consumption. And I can do it with half the number of nodes listed, because with hyperthreading the number of job slots can be doubled (we just need more memory). Our conclusion was that “all power consumption is $27,489 per year”. That means $55K (two years of that cost) is enough and our ROI is 2 years.
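For anyone who wants to check the comparison, here is a rough Python sketch. The helper names `power_fraction` and `simple_payback_years` are hypothetical, and the payback calculation is the simplest possible one (purchase cost divided by yearly savings). As I read the numbers, the 6-year figure corresponds to an $82,467 purchase with half the yearly energy bill saved, and the 2-year figure counts the full $27,489 Dell energy bill as the yearly saving.

```python
OLD_WATTS = 12_552            # old racks, compute nodes only
ANNUAL_ENERGY_COST = 27_489   # dollars per year, power + cooling

QUOTES = {"vendor 1": 5_400, "vendor 2": 4_597}  # quoted watts


def power_fraction(new_watts, old_watts=OLD_WATTS):
    """Fraction of the old racks' power draw that the new hardware needs."""
    return new_watts / old_watts


def simple_payback_years(purchase_cost, annual_savings):
    """Years until the yearly savings add up to the purchase cost."""
    return purchase_cost / annual_savings


for name, watts in QUOTES.items():
    print(f"{name}: {power_fraction(watts):.0%} of the old racks' power")

# $82K budget, new hardware saving 50% of the yearly energy bill.
six_year_case = simple_payback_years(82_467, ANNUAL_ENERGY_COST * 0.5)

# $55K budget, framed as two years of the full Dell energy bill.
two_year_case = simple_payback_years(2 * ANNUAL_ENERGY_COST, ANNUAL_ENERGY_COST)

print(f"payback: {six_year_case:.1f} years and {two_year_case:.1f} years")
```

Passing a different `annual_savings` value, for example one that subtracts the new hardware's own energy cost, shows how sensitive the payback period is to that assumption.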

My intention is to convince the faculty of this. Do you agree? Then we can take it to Dave Baird.

-Henk

From: Meij, Henk
Sent: Wednesday, October 02, 2013 12:10 PM
To: Staye, Peter
Subject: RE: Meeting with Peter 01Oct13 Summary

oh, btw the nodes are on 208V utility power, makes no difference in our calculations though.

-Henk

From: Meij, Henk
Sent: Wednesday, October 02, 2013 11:42 AM
To: Staye, Peter
Subject: Meeting with Peter 01Oct13 Summary

My attempt at summarizing our conversation. Make corrections if I got things wrong.

-Henk

Will save y'all the details, but after many measurements and meetings with Peter Staye (Physical Plant) we are both in agreement that for the 30 Dell compute nodes the “total cost for all energy consumption is $27,489 per year” (power + cooling).

Next step was to collect quotes for a target budget of $82K, 3 years of Dell energy consumption. That's just so that I can scale down from there, because the new racks of course still consume energy. But the details are striking. (I'm getting 4 quotes, but the first 3 in are pretty close.)

Dell: 30 nodes, 2.66 GHz, 4 MB L-cache (for CPU), 240 cores, 80 GB local drive, 340 GB total RAM, 12,552 watts (power, no cooling), 670 gigaflops (actual measure)

Microway: 14 nodes, 2.60 GHz, 20 MB L-cache (for CPU), 224 cores, 1 TB local drive, 1,792 GB total RAM, 5,400 watts (power, no cooling), 4,659 gigaflops (theoretical)

That means the Microway hardware burns only about 40% of the Dell's power (not counting cooling).

If we take half the Microway nodes, turn hyperthreading on (like on queue mw256, doubling the core count), and raise the memory to 256 GB/node (8 GB/core), we would be very close to Dell's job slot offering (224 vs 240), have 5x more memory, and 7x more gigaflops.
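The job slot and memory numbers for the half-size configuration can be checked with a short Python sketch. The function `half_config` and its parameter names are hypothetical; it assumes the quoted cores are spread evenly over the nodes and that hyperthreading yields exactly two job slots per physical core.

```python
def half_config(quoted_nodes, quoted_cores, gb_per_node):
    """Job slots and memory when using half the quoted nodes with hyperthreading."""
    cores_per_node = quoted_cores // quoted_nodes   # assumes an even spread
    nodes = quoted_nodes // 2
    slots_per_node = cores_per_node * 2             # hyperthreading on
    return {
        "nodes": nodes,
        "job_slots": nodes * slots_per_node,
        "total_ram_gb": nodes * gb_per_node,
        "gb_per_slot": gb_per_node / slots_per_node,
    }


DELL_TOTAL_RAM_GB = 340   # from the Dell line above

microway_half = half_config(quoted_nodes=14, quoted_cores=224, gb_per_node=256)
ram_ratio = microway_half["total_ram_gb"] / DELL_TOTAL_RAM_GB

print(microway_half)
print(f"memory vs Dell: {ram_ratio:.1f}x")
```

Note that the 8 GB figure here is per hyperthreaded job slot, since each node then offers twice as many slots as physical cores.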

That will easily fit within $55K (two years of Dell power costs) and obtain an ROI of 2 years. Actually, within 2 years, but 2 years at most.

Peter Staye agrees.

The end.



cluster/123.1381414087.txt.gz · Last modified: 2013/10/10 14:08 by hmeij