cluster:123

Differences

This shows you the differences between two versions of the page.

cluster:123 [2013/10/10 14:08]
hmeij [Replace Dell Racks]
cluster:123 [2013/10/23 13:43]
hmeij
Line 7: Line 7:
 Subtitle: A win-win solution proposed by Physical Plant and ITS Subtitle: A win-win solution proposed by Physical Plant and ITS
  
-Once upon a time, back in 2013,  two Dell racks full of compute nodes, sat noisily chewing away energy on the 5th floor of Science Tower.  They sucked in nicely cooled air from the floor spewing it out the back of the racks at 105-110 degrees (F).  They were giving the three Liebert cooling towers a run for their BTUs.  So much so that if one failed the Dell racks needed to be powered down to avoid the data center reaching temperatures beyond 95 degrees (F). The Dell racks were in a foul mood since that last event not too long ago. And so, day after day, they consumed lots of BTUs, and with the ample supply of Watts coming from their L6-30 roots, converted it all into heat. Tons of heat, making life lousy for the Liebert family. Oh, and they performed some computational work too, but if even they did not, the energy consumption remained the same. That's a fact. They were 6 years old and determined to make it to 12. So the story goes.+Once upon a time, back in 2013,  two Dell racks full of compute nodes, sat noisily chewing away energy on the 5th floor of Science Tower.  They drew in nicely cooled air from the floor spewing it out the back of the racks at 105-110 degrees (F).  They were giving the three Liebert cooling towers a run for their BTUs.  So much so that if one failed the Dell racks needed to be powered down to avoid the data center reaching temperatures beyond 95 degrees (F). The Dell racks were in a foul mood ever since that last event, not too long ago. And so, day after day, they consumed lots of BTUs, and with the ample supply of Watts coming from their L6-30 roots, converted it all into heat. Tons of heat, making life lousy for the Liebert family. Oh, and they performed some computational work too, but if even they did not, the energy consumption remained the same. That's a fact. They were 6 years old and determined to make it to 12. So the story goes.
  
-The Dell racks contain 30 compute nodes, two UPS units, two disks arrays and two switches. We have measured 19 nodes power consumption (pulling one the dual power units out) with a Kill-A-Watt meter for over 775+ hours. The mean power consumption rate is 418.4 watts. That totals to 109,956 KwH/year in power consumption ((watts/1000 Kw per hour) * 24 hours * 365 days * 30 servers). This is a low water mark, it only takes into account thwe compute nodes but that will be the majorioty of energy consumption. We also measured one rack's consumption at the utility panel and Peter's caculation yields 126,000 KwH/year which can be considered a hig water mark+The Dell racks contain 30 compute nodes, two UPS units, two disks arrays and two switches. We have measured 19 nodes power consumption (pulling one of the dual power units out) with a Kill-A-Watt meter for over 775+ total hours. The mean power consumption rate is 418.4 watts. That totals to 109,956 KwH/year in power consumption ((watts/1000 Kw per hour) * 24 hours * 365 days * 30 servers). This is a low water mark, it only takes into account the compute nodes but that will be the majority of heat producers. We also measured one rack's consumption at the utility panel and Peter's calculation yields 126,000 KwH/year which can be considered a high water mark.
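
The yearly total quoted above can be rechecked with a short Python sketch. It only restates the formula given in the text ((watts/1000) * 24 hours * 365 days * 30 servers); the function name `annual_kwh` is made up for illustration and is not part of any tool used for the measurements.

```python
HOURS_PER_YEAR = 24 * 365  # the text assumes the nodes run all year


def annual_kwh(mean_watts, node_count):
    """Convert a mean per-node draw in watts into KwH per year for all nodes."""
    return mean_watts / 1000 * HOURS_PER_YEAR * node_count


# 418.4 watts is the measured mean, 30 is the number of Dell compute nodes
low_water_mark = annual_kwh(418.4, 30)
print(f"{low_water_mark:,.0f} KwH/year")
```

With the measured mean of 418.4 watts this should land on the low water mark of roughly 109,956 KwH/year.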
- +
  
-Based on 12.5 cents per Kwh (this is an all inclusive cost including natural gas cost, heat recoup costs, distribution, maintenance etc) the hardware burns away $13,744.50 per year. Best guess is cooling costs are at least that (another possible low water mark) so the total cost for all power consumption is $27,489 per year. If we run the hardware for another 3 years that total cost is $82,467.+Next we need to convert to a dollar value. A residential electric bill contains a KwH cost as well as generation, distribution, transmission and other cost factors. Typically the total KwH cost is 50% of the total bill.  Utility power in the data center comes from our cogen plant as well as CP&L.  The model for cogen attempts to balance both; 15% from CP&L and 85% cogen generated. If the cogen plant is down, like for maintenance, this jumps to 100% CP&L. One also needs to factor in natural gas costs, heat reclamation costs, etc. If we set the overall cost at 12.5 cents "per KwH" we're assuming 6.25 cents per cogen generated KwH consumed, which seems reasonable. This value is set by Peter.
  
-  +Based on 12.5 cents the Dell compute nodes consume $13,744.50 per year in power. Best guess is cooling costs are at least that (another possible low water mark). So the total cost for both  power and cooling consumption is $27,489 per year.
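
As a rough check of the dollar figures, here is a minimal Python sketch. The constant names are made up for illustration; the doubling step encodes the text's best guess that cooling costs are at least equal to power costs, which is an assumption and not a measurement.

```python
RATE_PER_KWH = 0.125          # dollars per KwH, the all-inclusive rate set by Peter
DELL_KWH_PER_YEAR = 109_956   # low water mark for the 30 compute nodes

# Power only
power_cost = DELL_KWH_PER_YEAR * RATE_PER_KWH

# Assumption from the text: cooling costs at least as much as power
total_cost = power_cost * 2

print(f"power: ${power_cost:,.2f}/year, power plus cooling: ${total_cost:,.0f}/year")
```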
- +
-If we could replace, or approximate, the 30 compute nodes' computational power (0.6 teraflops) and job slots (240 cores) with new hardware that consumes 50% less in power, our ROI is 6 years based on the low water mark numbers +
- +
- +
- +
-So, I've collected quotes and they are rolling in.  I've taken $82K target as a budget so I can downscale from there. Here is a comparison+
  
 +Next step was to collect vendor quotes for a target budget of $82K, 3 years of Dell energy consumption, an arbitrary length of time. That's so we can downscale from there because the new racks of course still consume energy. Four quotes were obtained and they show a similar pattern. Here is the comparison given key features.
    
- +Old hardware: 109,956 KwH/year for power\\ 
-old racks: 30 nodes, 240 job slots, 12,552 watts, 670 gigaflops (a measure of computational performance) +30 nodes, 2.66 ghz, 4 mb L-cache (for cpu), 240 cores (job slots),\\ 
- +80 gb local drive, 340 gb total ram, 12,555 watts (power no cooling), 670 gigaflops (actual measure)
-vendor 1: 14 nodes, 224 job slots, 5,400 watts, 4,659 gigaflops +
- +
-vendor 2: 10 nodes, 200 job slots, 4,597 watts, 4,480 gigaflops +
    
 +New hardware v1: 47,304 KwH/year for power or 43% of old hardware\\
 +14 nodes, 2.60 ghz, 20 mb L-cache (for cpu), 224 cores (job slots),\\
 +1TB local drive, 1,792 gb  total ram, 5,400 watts (power no cooling), 4,659 gigaflops (theoretical)
  
-So we can do it with about 40% of the energy consumption. And I can do it with half the number of nodes listed because with hyperthreading the number of job slots can be doubled (just need more memory). Our conclusion was "all power consumption is $27,489 per year". That means $55K is enough and our ROI is 2 years.+New hardware v2 (half of v1): 23,652 KwH/year for power or 22% of Old hardware\\ 
 +7 nodes, 2.60 ghz, 20 mb L-cache (for cpu), 112 cores (job slots),\\ 
 +1TB local drive, 1,792 gb  total ram, 2,700 watts (power no cooling), 2,329 gigaflops (theoretical)
  
- +If we reduced the node count to 7 (the minimum configuration that approximates the job slot count of the Dell hardware), the total energy consumption (power plus cooling) would be 5,400 watts.   The total cost of running the new hardware (v2) would be $5,913 per year. That would imply savings of $21,576 per year. And that's using the low water mark. The job slot count would be 112 but with hyperthreading technology that can be doubled. We'd still want the 1,792 gb memory footprint (8 gb/core) and the gigaflops (2,329) still far exceed Dell's performance.
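
The v2 running cost and savings can be sketched the same way. This is illustrative only: `annual_cost` is a hypothetical helper, the 12.5 cents rate and the $27,489 Dell figure come from the text above, and the 5,400 watts already includes the assumed cooling share.

```python
def annual_cost(total_watts, rate_per_kwh=0.125):
    """Yearly dollar cost of a constant draw in watts at a flat KwH rate."""
    kwh_per_year = total_watts / 1000 * 24 * 365
    return kwh_per_year * rate_per_kwh


DELL_TOTAL_PER_YEAR = 27_489   # power plus cooling, low water mark

v2_total = annual_cost(5_400)  # 2,700 watts power plus an assumed equal amount for cooling
savings = DELL_TOTAL_PER_YEAR - v2_total

print(f"v2 cost: ${v2_total:,.0f}/year, savings: ${savings:,.0f}/year")
```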
  
-My intention is to convince the faculty of this. Do you agree?  Then we can take it to Dave Baird.+In two years, the new hardware would have saved $43,152 on energy costs based on the low water mark (Dell's costs would equal $55K). We still need to address some minor issues:
  
- +  * There are enough Infiniband ports available to put all new hardware nodes on such a switch (add cards and cables cost for each node) 
 +  * The internal disks on each node need to be of a high speed (10K or better) and of a certain size (300 GB or larger) mimicking the Dell disk arrays (adds costs) 
 +  * We may be able to add two more nodes by switching to a more expensive, lower wattage CPU (and remain within budget as well as below the 50% energy consumption threshold as compared with Dell's consumption). 
 +    * accomplished by switching from 8 core 2650v2 (130 watt) 2.6 ghz CPU to 10 core 2660v2 (95 watt) 2.2 ghz CPU
  
--Henk+But it is all very doable within a budget of $45-$50K. And it can be the solution for:
  
-  +  * replace Dell's racks functions and match or exceed its performance 
-From: Meij, Henk +  * seriously reduce energy consumption benefiting Physical Plant's bottom line 
-Sent: Wednesday, October 02, 2013 12:10 PM +  * allow ITS to treat the third Liebert cooling tower as backup/standby generating more energy savings 
-To: Staye, Peter +  * being way green
-Subject: RE: Meeting with Peter 01Oct13 Summary+
  
-oh, btw the nodes are on 208V utility power, makes no difference in our calculations though.+The Liebert family rejoices.  The Dell family moves out. The end.
  
--Henk+==== Update ====
  
-  +The table below contains data for a cluster whose nodes are all on the Infiniband switch (and also ethernet switch for provision and data).  Each node also contains a 15K 300 GB SAS drive for access to fast local disk (Gaussian users). It still deploys the 8-core CPUs, thus 16 physical cores per node, 32 hyperthreaded cores per node, and in both cases 256 GB of memory.
-From: Meij, Henk +
-Sent: Wednesday, October 02, 2013 11:42 AM +
-To: Staye, Peter +
-Subject: Meeting with Peter 01Oct13 Summary+
  
-My attempt at summarizing our conversation. Make corrections if I got things wrong. 
  
 +^  Tnodes^  Tcores^  THcores^  Tmem gb^  Watts^  %of Dell^  TEnergy^  TEnergy $/Y^  TEsavings $/Y^  Quote $^   ROI Y^  Gflops^ 
 +|  10|  160|  320|  2,560|  3,650|  29|  7,300|  7,994|  19,495|  76,866|  3.9|  3,328| 
 +|  9|  144|  288|  2,304|  3,285|  26|  6,570|  7,194|  20,295|  69,290|  3.4|  2,995| 
 +|  8|  128|  256|  2,048|  2,920|  23|  5,840|  6,395|  21,094|  61,714|  2.9|  2,662| 
 +|  7|  112|  224|  1,792|  2,555|  20|  5,110|  5,596|  21,893|  54,138|  2.4|  2,329|
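
The energy, savings and ROI columns of the table can be approximated with the sketch below. It rests on assumptions read off the table and the text, not on anything the vendors stated: 365 watts per node (3,650 watts / 10 nodes), cooling equal to power, 12.5 cents per KwH, and $27,489 per year for the Dell racks. The function `table_row` is hypothetical, and individual cells may differ from the table by a dollar because of rounding.

```python
def table_row(nodes, quote_dollars, watts_per_node=365,
              rate_per_kwh=0.125, dell_cost_per_year=27_489):
    """Recompute the energy, savings and ROI columns for one node count."""
    power_watts = nodes * watts_per_node
    total_watts = power_watts * 2          # assumption: cooling equals power
    energy_cost = total_watts / 1000 * 24 * 365 * rate_per_kwh
    savings = dell_cost_per_year - energy_cost
    return {
        "power_watts": power_watts,
        "total_watts": total_watts,
        "energy_cost": energy_cost,          # dollars per year
        "savings": savings,                  # dollars per year versus Dell
        "roi_years": quote_dollars / savings,
    }


# The 8-node row, using the quote from the table
row = table_row(8, 61_714)
print(f"{row['total_watts']} watts, ${row['energy_cost']:,.0f}/year, "
      f"${row['savings']:,.0f} saved/year, ROI {row['roi_years']:.1f} years")
```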
  
--Henk 
  
-  +==== Summary ====
- +
- +
-I will save y'all the details but after many measurements and meeting with Peter Staye (physical plant) we are both in agreement that the dell 30 compute nodes "total cost for all energy consumption is $27,489 per year" (power+cooling). +
-  +
-Next step was to collect quotes for a target budget of $82K, 3 years of dell energy consumption.  That's just so that I can downscale from there because the new racks ofcourse still consume energy.  But the details are striking. (I'm getting 4 quotes but the first 3 in are pretty close). +
-  +
-dell: +
-30 nodes, 2.66 ghz, 4 mb L-cache (for cpu), 240 cores, +
-80 gb local drive, 340 gb total ram, 12,555 watts (power no cooling), 670 gigaflops (actual measure) +
-  +
-microway: +
-14 nodes, 2.60 ghz, 20 mb L-cache (for cpu), 224 cores, +
-1TB local drive, 1,792 gb  total ram, 5,400 watts (power no cooling), 4,659 gigaflops (theoretical) +
-  +
-That means microway hardware burns but 40% of the dell's power (not cooling). +
-  +
-If we take half the microway nodes, turn hyperthreading on (like on queue mw256, doubling the core count), raise the memory to 256gb/node (8gb/core) we would be very close to dell's job slot offering (224 vs 240), have 5x more memory, and 7x more gigaflops. +
-  +
-That will easily fit within $55K (two year of Dell power costs) and obtain a ROI of 2 years.  Actually, within 2 years but a max of 2 years. +
-  +
-Peter Staye agrees.  +
  
 +The Dell racks were bought in 2006. They contain 30 compute nodes, two UPS units, two disk arrays and two switches. Measurements of 2/3rds of the compute nodes with a Kill-A-Watt meter yield an average consumption of 418.4 watts (whether or not the nodes are computing).  That totals to 109,956 KwH/year for power, a low water mark. Measurements at the utility panel for one of the racks yield a consumption of 126,000 KwH/year. Cooling requirements (not measured) are assumed to be equal to the power consumption.
  
 +Using the low water mark, and a cost per KwH (inclusive of cogen generation costs, maintenance, CP&L power imports, heat reclamation costs, etc), the total cost for both  power and cooling consumption is estimated at $27,489 per year for the Dell racks.
  
-The end.+New hardware could replace the Dell racks' functionality and reduce power/cooling needs in the data center while yielding significant savings.  An 8-node cluster with each node comprised of dual 8 core CPUs, 256 GB of memory and a 15K 300 GB hard disk, all nodes connected to a fast throughput/low latency switch, would match or exceed key parameters such as gigaflops of computational power and number of job slots provided (with hyperthreading enabled).
  
 +Such a cluster would consume 77% less energy, generating $21,094 in savings per year (after accounting for energy needs of that 8-node cluster).  This implies that in 2.9 years the cost of acquiring that 8-node cluster ($61,714) will be recouped, based on the low water mark.
  
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/123.txt · Last modified: 2013/10/23 18:52 by hmeij