\\
**[[cluster:0|Back]]**
  
The basic configuration of the cluster is detailed below.  This information was requested for inclusion in proposals and the like.  I am not regularly updating this information, so email me if you need this page updated.

 --- //[[hmeij@wesleyan.edu|Meij, Henk]] 2007/12/03 13:39//

====== Citations ======

Follow this link for **[[cluster:53|Publications & References]]**.

====== Hardware ======

A schematic overview of the cluster's network design is presented at this **[[Cluster:28|link]]**.  The details are:

^Hardware^Description^
| 1 PowerEdge 2950 server | The head node, the access point for our users.  It also runs the LSF/HPC scheduler, which allocates and executes jobs on the compute nodes on behalf of the users.  From the internet it is accessible only via VPN. |
| 1 PowerEdge 2950 server | The ionode, which is connected to our NetApp file storage device via dual Fibre Channel connections.  The ionode provides the head node and all compute nodes with 4 TB of file system storage via NFS.  In addition, it provides a shared 1 TB scratch file system, also via NFS. |
| 36 PowerEdge 1950 servers | These are the compute nodes.  Each compute node contains two physical processors (at 2.3 GHz), and each processor contains 4 cores.  These compute nodes have different memory footprints; see below.  All compute nodes contain two network interface cards and two local disks.  The first disk holds the operating system; the second disk provides a limited amount of local scratch space. |
| 1 low-end gigabit ethernet switch | This switch is used by the scheduler software to obtain information from each node and to dispatch jobs.  It is located on a private network (192.168.x.x).  All compute nodes, the ionode, and the head node are connected to this switch. |
| 1 high-end gigabit ethernet switch | This switch serves exclusively for NFS traffic.  It is also located on a private network (10.3.1.x).  All compute nodes and the head node are connected to it via their second network interface card. |
| 1 InfiniBand high-speed switch | This high-speed switch is specifically designed for parallel jobs.  16 of the compute nodes are connected to this switch.  Its performance is roughly 4-6x faster than the gigabit ethernet switches. |
| 2 disk storage arrays | These arrays (also known as MD1000 arrays) contain fast disks (15,000 RPM) that are dedicated to 4 compute nodes as local, fast scratch file systems, particularly for users of the Gaussian software. |
| 2 UPS devices | To provide the head node and ionode with power in case of a supply interruption, allowing a clean shutdown. |
| cables, cables, cables, cables, cables, cables ... blue, black, red, orange, ... ||
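
The head node is where users interact with the LSF/HPC scheduler described above.  As a hedged illustration only, the Python sketch below builds and runs a standard ''bsub'' submission; the script path, slot count, and output file names are hypothetical examples rather than site defaults (only the queue names come from this page).

<code python>
#!/usr/bin/env python
"""Minimal sketch: submit a batch job to the LSF/HPC scheduler from the head node.

The queue names are taken from this page; the script path, slot count, and
output file names below are hypothetical examples.
"""
import subprocess

def submit(script_path, queue="imw", slots=8):
    """Build a bsub command, run it, and return the scheduler's reply text."""
    cmd = [
        "bsub",
        "-q", queue,        # target queue: imw, elw, emw, ehw, or ehwfd
        "-n", str(slots),   # number of job slots requested
        "-o", "%J.out",     # LSF writes stdout to <jobid>.out
        "-e", "%J.err",     # LSF writes stderr to <jobid>.err
        script_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Example usage: submit a hypothetical job script from the home directory.
    print(submit("./my_job.sh"))
</code>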
====== Statistics ======

  * Early bird access period started: April 15th, 2007.
  * Production status started: October 15th, 2007.

  * 16 compute nodes, each with 8 GB of memory, are connected to the InfiniBand switch.  They comprise the 'imw' queue.
  * 08 compute nodes, each with 4 GB of memory.  They comprise the 'elw' queue.
  * 04 compute nodes, each with 8 GB of memory.  They comprise the 'emw' queue.
  * 04 compute nodes, each with 16 GB of memory.  They comprise the 'ehw' queue.
  * 04 compute nodes, each with 16 GB of memory and local fast disks.  They comprise the 'ehwfd' queue.

  * 320 GB of total memory across all compute nodes (cross-checked in the sketch below).
  * Up from 192 GB when the cluster was delivered.
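
As a quick cross-check of the totals above, the following Python sketch tallies nodes, cores, and memory from the per-queue breakdown (queue layout from this page; two quad-core processors per node from the Hardware table).

<code python>
# Cross-check of the cluster totals listed on this page.
# Per-queue node counts and per-node memory come from the Statistics section;
# the per-node core count comes from the Hardware table.
queues = {
    # queue: (nodes, GB of memory per node)
    "imw":   (16, 8),
    "elw":   (8, 4),
    "emw":   (4, 8),
    "ehw":   (4, 16),
    "ehwfd": (4, 16),
}

cores_per_node = 2 * 4  # two physical processors, 4 cores each

total_nodes  = sum(nodes for nodes, _ in queues.values())
total_cores  = total_nodes * cores_per_node
total_memory = sum(nodes * gb for nodes, gb in queues.values())

print(total_nodes, "nodes,", total_cores, "cores,", total_memory, "GB of memory")
# -> 36 nodes, 288 cores, 320 GB of memory
</code>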

  * 4 TB of file system storage for home directories.
  * 1 TB of file system storage for shared scratch access by the compute nodes.

  * 42 user accounts, of which 17 are faculty (excluding members of dept ITS).
  * 5 external user (guest) accounts.

  * Job throughput rates observed: 0.8 to 65 jobs/hour.
  * Job run times observed: from less than an hour to over 3 months.
  * Total number of jobs processed since power-up: 120,000+.
  * Typical workload: about 80-150 jobs across all queues; during summer we run at capacity.

[[cluster:65|Cluster Usage Graphs]]

\\
**[[cluster:0|Back]]**