Cluster Support

So, the Dell cluster (petaltail/swallowtail) ran out of support on January 25th of this year, as I found out when I called on 03/01/2010 with a hardware problem. So the question is: what to do next?

The Dell hardware is now 3 years old, but still in good condition. The failures during those 3 years have included: 4 replaced disks, 2 system boards and power fans, and perhaps 8-10 memory sticks. Let's assume that rate stays the same for now.

Contacted Dell for renewal and received a quote for $60,000 for 3 years. Well, I'd rather have the monies.

Contacted SMS for renewal and received a quote of $28,250 for 3 years. Well, I'd still rather have the monies. They are not willing to come down.

What would it take to do-it-yourself? What is the expectation for hardware support?

In addition, if the NSF proposal ("recommended for funding") actually gets funded, would we even consider support on the Dell cluster?

Costs

Support for OCS, Platform's Open Cluster Manager - Dell Edition, for 3 years is $10,434. I think we can do without; we could always fall back to CentOS/Lava. Plus we can peruse their knowledge base, which is rather good. LSF licenses are perpetual, so we're good there.

The Infiniband, Force10 gigE and Dell Powerconnect gigE switches we'll leave unsupported too. If we need switches we can steal them from the sharptail cluster (HP ProCurve 2748 gigE). Probably cheaper to just buy a new switch if they break.

Nodes & DIY

So, based on the price information below:

  • $2500 buys a spare pe2950
  • $2000 buys a spare pe1950

We probably should have one of each on hand. If a server dies, for whatever reason, the spare can be put in place, and more spare parts are generated (minus the failed part).

We probably do not need to be prepared to replace a failed MD1000 enclosure and can decide on that when it happens. We do need some spare disks, say 3 of them ($750).

Power fans are the most likely to go, so purchase 10 of them ($500)? (That includes 4 fans for the servers in the ehwfd queue, which currently have a single power source, leaving 6 spares.)

The DIY proposal is then to spend roughly $6,000 now as a starting point. Or spend it as things go awry; depends.
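To make the comparison concrete, here is a minimal sketch of the arithmetic behind the ~$6,000 figure, using only the spare-parts prices and vendor quotes listed on this page (item names in the dict are just labels for this sketch):

```python
# Rough DIY cost tally, using the figures quoted above.
# Vendor quotes (Dell, SMS) are for 3 years of support.
diy_spares = {
    "spare PE2950":  2500,
    "spare PE1950":  2000,
    "3 spare disks":  750,
    "10 power fans":  500,
}

diy_total = sum(diy_spares.values())  # up-front DIY spend

print(f"DIY spares up front: ${diy_total:,}")                        # $5,750
print(f"Dell quote: $60,000 -> saves ${60000 - diy_total:,}")
print(f"SMS quote:  $28,250 -> saves ${28250 - diy_total:,}")
```

So the up-front DIY spend comes to $5,750, roughly the $6,000 starting point, against $60,000 (Dell) or $28,250 (SMS) for 3 years of vendor support.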

Or downsize the cluster when things go awry, anticipating the new cluster. Petaltail and Swallowtail could take over each other's services, but that would take time, and would be disruptive if it is Petaltail: rebuilding Petaltail would imply wiping Swallowtail, and afterwards reinstalling/relicensing the commercial software.

Price Info

Collected on web circa early April 2010.

  • Dell 146GB 15K SAS Hard Drive (MD1000): $250 each, or $3,500 for 14
  • Ebay Dell PowerEdge 73GB 15K SAS Hard Drive (2900/MD1000): $125 each, or $1,750 for 14
  • Ebay brand new MD1000 (assume no PERC controller, no disks, no cables): $1,600
  • Dell MD1000 (PERC6 (backup), 2×146GB 15K RPM SAS disks, SAS cable): $3,498 with 2 disks, or $7,500 loaded with 14 146GB drives
  • Ebay Dell PowerEdge 2950, 2× quad core 2.66GHz, 16GB (hard disks?): $1,800
  • Ebay Dell PowerEdge 2950, two quad core X5355, 8GB, 4×146GB 15K: $2,350
  • Ebay Dell PowerEdge 2950, 2× quad core 2.66GHz, 16GB, 2×72GB SAS: $1,900
  • Ebay Dell PowerEdge 1950, 2× quad core 2.66GHz, 8GB, 2×146GB 15K: $1,795
  • Ebay Dell PowerEdge 1950, 2× quad core 2.66GHz, 8GB, 2×72GB SAS: $1,650
  • Ebay Dell PowerEdge 1950, 2× quad core 2.66GHz, 16GB, 2×72GB SAS: $1,950

Dell

Turns out Dell does sell refurbished hardware, so let's see what the price tag is for:

Q=1 PE2950 Stag=GTC2CC1 Ecode=366602301169
perc5/6 - raid 1 with dual 80 gb 7.2KRPM SATA, 8x1gb dimms, DVD-rom, extra dual-port NIC card, rails if you have, dual power supplies

Q=1 PE1950 Stag=B1VQBC1 Ecode=24058367713
perc5/6 - no raid, dual 80 gb 7.2KRPM SATA, 8x1gb dimms, rails if you have, dual power supplies

Q=1 PE1950 Stag=BYGPBC1 Ecode=26028510625
perc5/6 - no raid, dual 80 gb 7.2KRPM SATA, 8x2gb dimms, rails if you have, dual power supplies

Q=4 MD1000 disks Stag=F5TZBC1 Ecode=33004422433
currently have 36gb drives but may substitute 73gb or whatever, 15K RPM SAS

Q=1 Power Connect 2748 Stag=9730SB1
48 ports, gigE switch, dual power (not sure it is?)

… the rest are parts if you have those, otherwise I might just get more servers as defined above …

Q=2 Infiniband card + cable (if you have)
CISCO 4XIB CBL 3M-SUPERFLEX (A0636919) + CISCO STD PCI-e HCA 2PT-TALL BRKT 128MB ROHS (A0664392) (Infiniband card & cable)

Q=2 Ethernet card (if you have)
Dual Embedded Broadcom NetXtreme II 5708 Gigabit Ethernet NIC

Q=8 1gb DIMM for PE1950
Q=8 2gb DIMM for PE1950
Q=2 Redundant Power Supply with Dual Cords for PowerEdge 2950
Q=10 Redundant Power Supply with Y-Cord for PowerEdge 1950



cluster/84.1272893818.txt.gz · Last modified: 2010/05/03 09:36 by hmeij