
The basic configuration of the cluster is detailed below. This information was requested for inclusion in proposals and the like. I do not regularly update this information, so email me if you need this page to be updated.

Meij, Henk 2007/12/03 13:39

Citations

Follow this link for Publications & References.

Hardware

A schematic overview of the cluster's network design is presented at this Link. The details are as follows:

  • 1 PowerEdge 2950 server: The head node, the access point for our users. It also runs the LSF/HPC scheduler, which allocates and executes jobs on the compute nodes on behalf of the users. It is only accessible from the internet via VPN.
  • 1 PowerEdge 2950 server: The ionode, connected to our NetApp file storage device via dual Fibre Channel connections. The ionode provides all compute nodes and the head node with 4 TB of file system storage via NFS. In addition, it provides them with a shared 1 TB scratch file system, also via NFS.
  • 36 PowerEdge 1950 servers: The compute nodes. Each compute node contains two physical processors (at 2.3 GHz), and each processor contains 4 cores. The compute nodes have different memory footprints, listed below. All compute nodes contain two network interface cards and two local disks: the first disk holds the operating system, and the second provides a limited amount of local scratch space.
  • 1 low-end gigabit ethernet switch: Used by the scheduler software to obtain information from each node and to dispatch jobs. It is located on a private network (192.168.x.x). All compute nodes, the ionode and the head node are connected to this switch.
  • 1 high-end gigabit ethernet switch: Serves exclusively for NFS traffic. It is also located on a private network (10.3.1.x). All compute nodes and the head node are connected to it via their second network interface card.
  • 1 Infiniband high-speed switch: A high-speed switch specifically designed for parallel jobs. 16 of the compute nodes are connected to this switch. Its performance is roughly 4-6x that of the gigabit ethernet switches.
  • 2 disk storage arrays: These arrays (MD1000 arrays) contain fast disks (15,000 RPM) that give 4 compute nodes dedicated, local, fast scratch file systems, particularly for users of the software Gaussian.
  • 2 UPS devices: Provide the head node and ionode with power in case of a supply interruption, allowing a clean shutdown.
  • Cables, cables, cables, cables, cables, cables … blue, black, red, orange, …

Statistics

  • Early bird access period started: April 15th, 2007.
  • Production status started: October 15th, 2007.
  • 16 compute nodes, each with 8 GB of memory, are connected to the Infiniband switch. They comprise the 'imw' queue (an example job submission follows this list).
  • 08 compute nodes, each with 4 GB of memory. They comprise the 'elw' queue.
  • 04 compute nodes, each with 8 GB of memory. They comprise the 'emw' queue.
  • 04 compute nodes, each with 16 GB of memory. They comprise the 'ehw' queue.
  • 04 compute nodes, each with 16 GB of memory and local fast disks. They comprise the 'ehwfd' queue.
  • 320 GB of total memory across all compute nodes, up from 192 GB when the cluster was delivered.
  • 4 TB of file system storage for home directories.
  • 1 TB of file system storage for shared scratch access by the compute nodes.
  • 42 user accounts, of which 17 belong to faculty (excluding members of dept ITS).
  • 5 external (guest) user accounts.
  • Job throughput rates observed: 0.8 to 65 jobs/hour.
  • Job run times observed: from less than an hour to over 3 months.
  • Total number of jobs processed since power-up: 120,000+.
  • Typical job processing load: about 80-150 jobs across all queues; during summer we run at capacity.
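
To show how work reaches the queues listed above, here is a minimal sketch of a job submission through the LSF/HPC scheduler running on the head node. It assumes only that LSF's standard bsub command is on the PATH; the queue name and core count come from the list above, while the job name, output file and program being run are hypothetical placeholders, not names used on this cluster.

    import subprocess

    # Minimal sketch: submit a parallel job to the 'imw' (Infiniband) queue
    # through LSF's bsub command. Assumes bsub is available on the head node's
    # PATH; "example_job", "example.out" and "./my_program" are hypothetical
    # placeholders.
    def submit_job(queue="imw", cores=8, command="./my_program"):
        bsub_cmd = [
            "bsub",
            "-q", queue,          # target queue: imw, elw, emw, ehw or ehwfd
            "-n", str(cores),     # number of cores requested
            "-J", "example_job",  # job name (placeholder)
            "-o", "example.out",  # file receiving the job's output (placeholder)
            command,
        ]
        # bsub reports the assigned job ID on standard output,
        # e.g. "Job <1234> is submitted to queue <imw>."
        result = subprocess.run(bsub_cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()

    if __name__ == "__main__":
        print(submit_job())

In practice the same submission is simply typed at the shell prompt on the head node (bsub -q imw -n 8 ./my_program); the Python wrapper is used here only to keep the example self-contained.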

Cluster Usage Graphs

