Differences

This shows you the differences between two versions of the page.


cluster:95 [2011/02/14 16:22]
hmeij
cluster:95 [2013/07/24 11:00] (current)
hmeij [Newest Configuration]
Line 1: Line 1:
 \\
 **[[cluster:0|Back]]**
 +
  +  * see the [[cluster:108|Queue Update]] page, 03/01/2013
 +
 +==== Newest Configuration ====
 +
  +The Academic High Performance Compute Cluster comprises two login nodes (greentail and swallowtail, both Dell PowerEdge 2050s).  The old login node petaltail (Dell PowerEdge 2950) can be used for testing code (it does not matter if it crashes; its primary duty is backup to the physical tape library).
 +
 +Three types of compute nodes are available via the Lava scheduler: 
 +
  +  * 36 nodes with dual quad core (Xeon 5620, 2.4 GHz) sockets in HP blades (SL2x170z G6) with memory footprints of 12 GB each, all on infiniband (QDR) interconnects.  288 job slots. Total memory footprint of these nodes is 384 GB. This cluster has been measured at 1.5 teraflops (using Linpack).
  +  * 32 nodes with dual quad core (Xeon 5345, 2.3 GHz) sockets in Dell PowerEdge 1950 rack servers with memory footprints ranging from 8 GB to 16 GB.  256 job slots. Total memory footprint of these nodes is 340 GB. Only 16 nodes are on infiniband (SDR) interconnects; the rest are on gigabit ethernet switches. This cluster has been measured at 665 gigaflops (using Linpack).
  +  * 45 nodes with dual single core AMD Opteron Model 250 (2.4 GHz) processors and a memory footprint of 24 GB each.  90 job slots. Total memory footprint of the cluster is 1.1 TB. This cluster has an estimated capacity of 500-700 gigaflops.
 +
  +All queues are available for job submissions via the login nodes greentail and swallowtail; both nodes service all queues. The total number of job slots is now 634, of which 380 are on infiniband switches for parallel computational jobs.  In addition, queue "bss24" consists of 90 job slots (45 nodes) and can provide access to 1.1 TB of memory; it is powered on by request (the nodes are power inefficient).
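  +
  +A minimal sketch of interacting with the scheduler from one of the login nodes is shown below. It assumes the usual Lava/LSF command line tools (bqueues, bsub, bjobs) and uses ./my_program as a placeholder for your own executable.
  +
  +<code bash>
  +# list the queues the scheduler currently advertises
  +bqueues
  +
  +# submit a one-slot job to the hp12 queue; %J expands to the job id
  +bsub -q hp12 -n 1 -J test_run -o test_run.%J.out ./my_program
  +
  +# check on your pending and running jobs
  +bjobs
  +</code>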
 +
  +The home directory file system is provided (via NFS or IPoIB) by the login node "greentail" from a direct attached disk array. In total, 10 TB of /home disk space is accessible to users, plus 5 TB of scratch space at /sanscratch.  In addition, each node provides a small /localscratch disk space (about 50 GB) on its local internal disk if file locking is needed. Backup services are provided via disk-to-disk snapshot copies on the same array.
 +
 +==== New Configuration ====
 +
 +(May 2011)
 +
  +Barring some minor work (involving 4 imw nodes and queue elw), all queues are now available for job submissions via the login nodes greentail and swallowtail; both nodes service all queues. Login node sharptail has been decommissioned, and login node petaltail will only perform administrative functions.
 + 
  +The total number of job slots is now 638, of which 384 are on infiniband (queues hp12 and imw, the former much faster).  These queues should be the target for parallel computations.  Queue ehwfd should be the target for jobs needing fast local scratch space.  Queue bss24 should be the target for large ethernet parallel jobs (the queue offers a 1 TB memory footprint). Matlab and Stata jobs should be submitted to their respective queues on greentail (license restrictions).
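  +
  +As an illustrative sketch only (the exact MPI launch command depends on which MPI stack your code was built against), a parallel job aimed at an infiniband queue and a scratch-heavy job aimed at ehwfd might be submitted like this:
  +
  +<code bash>
  +# parallel MPI job: request 16 slots on the infiniband queue hp12
  +# (mpirun and ./my_mpi_program are placeholders for your MPI launcher and binary)
  +bsub -q hp12 -n 16 -o mpi.%J.out mpirun ./my_mpi_program
  +
  +# job that needs fast local scratch: send it to queue ehwfd
  +bsub -q ehwfd -n 1 -o io.%J.out ./my_io_heavy_program
  +</code>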
 + 
  +Also, I suggest staging your data and programs in /sanscratch/JOBPID and copying the results back to your home directory as the last step in your job (the scheduler will erase /sanscratch/JOBPID).  This file system resides on a different set of disks than those serving /home.
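  +
  +Below is a minimal sketch of such a staging job script. It assumes JOBPID corresponds to the scheduler's job id (available inside the job as $LSB_JOBID) and uses placeholder file and program names; submit it with ''bsub < myjob.sh''.
  +
  +<code bash>
  +#!/bin/bash
  +#BSUB -q hp12
  +#BSUB -n 1
  +#BSUB -J stage_example
  +#BSUB -o stage_example.%J.out
  +
  +# scratch directory created for this job (assumption: JOBPID is the LSF/Lava job id)
  +SCRATCH=/sanscratch/$LSB_JOBID
  +
  +# stage input and program from /home onto the scratch file system
  +cp ~/project/input.dat ~/project/my_program $SCRATCH/
  +cd $SCRATCH
  +
  +# run against the staged copies
  +./my_program input.dat > output.dat
  +
  +# copy results home as the last step; the scheduler erases the scratch directory when the job ends
  +cp output.dat ~/project/
  +</code>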
 + 
  
  
Line 15: Line 42:
    
  
-The Dell cluster consists of two login nodes ("petaltail"/"swallowtail"), the Load Sharing Facility (LSF) job scheduler, and 36 compute nodes.  "petaltail" is the installer/administrative server while "swallowtail" manages commercial software licenses.  Both function as login access points.  Each compute node is a Dell PowerEdge 1950 with dual quad core (Xeon 5345, 2.3 GHz) sockets and a memory footprint ranging from 8 GB to 16 GB.  Total memory footprint of the cluster is 340 GB.  A high speed Cisco interconnect (Infiniband) connects 16 of these compute nodes for parallel computational jobs.  The scheduler manages access to 288 job slots across 7 queues. The cluster operating system is Red Hat Enterprise Linux 5.1. The hardware is 3 years old. This cluster has been measured at 665 gigaflops (using Linpack).
+The Dell cluster consists of two login nodes ("petaltail"/"swallowtail"), the Load Sharing Facility (LSF) job scheduler, and 36 compute nodes.  "petaltail" is the installer/administrative server while "swallowtail" manages commercial software licenses.  Both function as login access points.  Each compute node is a Dell PowerEdge 1950 with dual quad core (Xeon 5345, 2.3 GHz) sockets and a memory footprint ranging from 8 GB to 16 GB.  Total memory footprint of the cluster is 340 GB.  A high speed Cisco interconnect (Infiniband) connects 16 of these compute nodes for parallel computational jobs.  The scheduler manages access to 288 job slots across 7 queues. The cluster operating system is Red Hat Enterprise Linux 5.1. The hardware is 3 years old. This cluster has been measured at 665 gigaflops (using Linpack). THE LOGIN NODE petaltail HAS BEEN DECOMMISSIONED AS A SCHEDULER; IT WILL ONLY PERFORM ADMINISTRATIVE TASKS.  JOBS CAN NOW BE SUBMITTED VIA swallowtail/greentail, BOTH RUNNING LAVA (May 2011).
  
    
  
-The Blue Sky Studios cluster (Angstrom hardware) consists of one login node ("sharptail"), the Lava job scheduler, and 46 compute nodes.  Each compute node holds dual single core AMD Opteron Model 250 (2.4 GHz) processors with a memory footprint of 24 GB.  Total memory footprint of the cluster is 1.1 TB.  The scheduler manages access to 92 job slots within a single queue.  The cluster operating system is CentOS 5.3. The hardware is 7 years old. Of note: because of its energy inefficiency, only the login node and one compute node are powered on; when jobs start pending in the queue, admins are notified automatically and more nodes will be powered on to handle the load. This cluster has an estimated capacity of 500-700 gigaflops.
+The Blue Sky Studios cluster (Angstrom hardware) consists of one login node ("sharptail"), the Lava job scheduler, and 46 compute nodes.  Each compute node holds dual single core AMD Opteron Model 250 (2.4 GHz) processors with a memory footprint of 24 GB.  Total memory footprint of the cluster is 1.1 TB.  The scheduler manages access to 92 job slots within a single queue.  The cluster operating system is CentOS 5.3. The hardware is 7 years old. Of note: because of its energy inefficiency, only the login node and one compute node are powered on; when jobs start pending in the queue, admins are notified automatically and more nodes will be powered on to handle the load. This cluster has an estimated capacity of 500-700 gigaflops. CLUSTER LOGIN NODE sharptail IS NO MORE.  THE QUEUE bss24 HAS BEEN MOVED TO CLUSTER greentail (45 nodes, May 2011).
  
    
cluster/95.1297718577.txt.gz · Last modified: 2011/02/14 16:22 by hmeij