HPCC 36 Node Design Conference with Dell


After the conference, with the erratic behavior of our freight elevator enlightening everybody, I think we have but a few questions to work on:

| Shall we use the 2nd disk in the compute nodes as /localscratch? | yes |
| How large should the shared SAN scratch space /sanscratch be? | 1 TB with SAN thin provisioning |
| Shall we NFS mount directly from filer? (read about it) | no, configure the io node |
| Shall we add a second gigabit ethernet switch? (read about it) | this is a go! adding a Dell 2748 switch. also, add NIC3 for head node |
| Shall we license & install the Intel Software Tools Roll? | no, evaluate portland first, then evaluate intel, then decide |
| What should the user naming convention be (similar or dissimilar than AD)? | uid/gid from AD, set up some guest accounts in AD (this does not impact on-site install) |
These topics are discussed in detail on another page, click to go there. — Henk Meij 2007/01/31 11:31
contact person: Carolyn Arredondo, engineer: Tony Walker

Head Node

| NIC1 | 192.168 | private network |
| NIC3 | 10.3 | private network |
local disks
| root | / | ext3 | 10 GB |
| swap | none | none | 4 GB |
| export | /state/partition1 | | rest of disk |
disks are striped & mirrored, make a /localscratch directory if needed
| NFS | io node:/sanscratch | scratch space from SAN |
| NFS | io node:/home/users | home directories for general users |
| NFS | io node:/home/username | home directory for power user, repeat … |
user logins permitted (ssh only, not VAS enabled)
what user naming convention? same or different as AD?
backups: nightly incremental backup via Tivoli
firewall shield: allow ssh (scp/sftp), bbftp, and http/https from
NFS traffic on Cisco switch, management/interconnect traffic on Dell switch

IO Node

| NIC1 | 192.168 | private network |
| NIC2 | 10.3 | private network |
local disks
| root | / | ext3 | 80 GB |
| swap | none | none | 4 GB |
disks are striped & mirrored, no /localscratch needed
| fiber | from host:volume | local mount | size | backup |
| Vol0 | filer2:/vol/cluster_scratch | /sanscratch | 1 TB thin provisioned | no snapshots |
| Vol1 | filer2:/vol/cluster_home | /home | 10 TB thin provisioned | one weekly snapshot? |
two 30 MB volumes were made to work with until NetApp disks arrive
Vol0/LUN0 exported as /sanscratch
Vol1/LUN0 exported as /home/users … for regular users, so user jdoe's home dir is /home/users/jdoe
Vol1/LUN1 exported as /home/username … for power user home dir, repeat …
no user logins permitted, administrative users only
backups: not sure how many snapshots we can support on 10 TB
NFS traffic on Cisco switch, management/interconnect traffic on Dell switch

Compute Nodes

| NIC1 | 192.168 | private network (16+16+4) |
| NIC2 | 10.3 | private network (16+16+4) |
| HCA1 | na | infiniband (16) |
| local disks | same as head node |
| root | / | ext3 | 10 GB |
| swap | none | none | 4 GB |
| export | /state/partition1 | | rest of disk |
LWN: only first disk is used, mount second disk (80 GB) as /localscratch
HWN: 7*36 GB MD1000 15K RPM disks, raid 0, dedicated storage to each node via scsi, on /localscratch
if heavy weight nodes have a second hard disk … treat as spares
| NFS | io node:/sanscratch | scratch space from SAN |
| NFS | io node:/home/users | users home directories |
| NFS | io node:/home/username | power user home directory, … repeat |
| NFS | head node:/export/share | application/data space |
no user logins permitted, administrative users only
backups: none
NFS traffic on Cisco switch, management/interconnect traffic on Dell switch
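
As a quick sanity check on the numbers in the tables above, the following Python sketch adds up the node groups from the "(16+16+4)" annotation and the raw capacity of the heavy weight node (HWN) scratch array. The split of the 36 nodes into infiniband LWN, ethernet-only LWN, and HWN is an assumed reading of the tables, and the helper name `raid0_capacity_gb` is purely illustrative.

```python
# Node groups as read from the "(16+16+4)" annotation (assumed interpretation).
NODE_GROUPS = {
    "LWN with infiniband": 16,
    "LWN ethernet only": 16,
    "HWN": 4,
}


def raid0_capacity_gb(disks: int, disk_gb: int) -> int:
    """RAID 0 stripes data with no redundancy, so raw capacity is the sum of all members."""
    return disks * disk_gb


total_nodes = sum(NODE_GROUPS.values())
hwn_scratch_gb = raid0_capacity_gb(disks=7, disk_gb=36)

print(f"compute nodes: {total_nodes}")
print(f"raw /localscratch per HWN: {hwn_scratch_gb} GB")
```

The RAID 0 figure is raw capacity before filesystem overhead, and with no redundancy a single failed disk loses the whole /localscratch on that node.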


The question is whether we install & license the Intel Software Tools Roll or go with the Portland compilers.

A set of ‘base’ components are always installed on a Platform OCS cluster while a Roll can be installed at any time. Available Platform Open Cluster Stack Rolls include:
* Available at an additional cost
# Free to non-commercial customers
| Platform Lava Roll | Entry-level workload management provides commercial grade job execution, management and accounting. Based on Platform LSF |
| Clumon Roll | Cluster interface for viewing ‘whole’ cluster status. It is also a great cluster dashboard tool |
| Ganglia Roll | High level view of the entire cluster load and individual node load |
| Ntop Roll | Monitor traffic on ethernet interfaces. Useful for debugging network traffic problems and passively collects network traffic on interfaces |
| Cisco Infiniband™ Roll | IB drivers and MPI libraries from Cisco |
| Modules Roll | Customize your environment settings including libraries, compilers, and environment variables |
| Intel® Software Tools Roll* | Delivers Intel Compilers and Tools |
| PVFS2 Roll | Provides high speed access to data for parallel applications |
| IBRIX Roll | IBRIX Fusion™ parallel file system |
| MatTool Roll# | Application for basic system management. Manage disks, DNS, users and temporary files |
| Myricom Myrinet® Roll | MPI libraries and Myrinet Drivers from Myricom |
| Platform LSF HPC Roll* | Intelligently schedule parallel and serial workloads to solve large, grand challenge problems while utilizing your available computing resources at maximum capacity |
| Intel MPI Runtime Roll | Libraries for running Applications compiled with Intel MPI |
| Volcano Roll | Simple cluster portal for Platform Open Cluster Stack. Enables job submission, Linpack as a single user account |
| Extra Tools Roll | Benchmark and Debugging Tools |
| Dell™ Roll | Dell drivers and scripts to configure compute nodes for IPMI support |


Have not thought about this in real depth yet, but a grab bag collection of queues would be:

| HWN | 04 | 08 | 032 | large memory footprint (16 GB), dedicated, fast, local scratch space |
| LWN | 16 | 32 | 128 | small memory footprint (04 GB), shared, slower, SAN mounted scratch space |
| LWNi | 16 | 32 | 128 | small memory footprint (04 GB), shared, slower, SAN mounted scratch space, infiniband |
| debug | 01 | 02 | 008 | on head node? |
| debugi | 01 | 02 | 008 | on head node? is this even useful? |
  • Set up a “routing queue” with “esub” (aka “job submission filter”) in Lava? If a user does not specify a queue, “esub” figures it out. Could also submit a low priority job on a high priority queue if that queue is idle. (This routing of jobs is a 'todo' for later).
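
The routing idea could be prototyped outside the scheduler first. The sketch below is plain Python and does not use Lava's actual esub interface; the `JobRequest` fields, the 4 GB threshold (taken from the LWN memory footprint in the table), and the decision order are all assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

LWN_MEMORY_GB = 4  # light weight node memory, from the queue table


@dataclass
class JobRequest:
    """Hypothetical summary of what a user asked for at submission time."""
    queue: Optional[str] = None      # queue named explicitly by the user, if any
    memory_gb: float = 1.0           # requested memory per node
    needs_infiniband: bool = False   # job wants the infiniband interconnect
    is_debug: bool = False           # short test run


def route_queue(job: JobRequest) -> str:
    """Pick a queue name; an explicit user choice always wins."""
    if job.queue:
        return job.queue
    if job.is_debug:
        return "debugi" if job.needs_infiniband else "debug"
    # Anything that does not fit in a light weight node goes to the heavy weight queue.
    if job.memory_gb > LWN_MEMORY_GB:
        return "HWN"
    return "LWNi" if job.needs_infiniband else "LWN"


if __name__ == "__main__":
    print(route_queue(JobRequest(memory_gb=12)))
    print(route_queue(JobRequest(needs_infiniband=True)))
```

A real submission filter would have to read these values from whatever the scheduler passes in, and the low-priority-job-on-an-idle-queue case would need live queue state, which this sketch leaves out.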

Design Diagram

Just one more switch, and an additional NIC for the head node, would do so much good! The suggestion was to route the NFS traffic through the Cisco gigabit ethernet switch and buy an additional 48 port gigabit ethernet switch like a Dell 2748 switch. The administrative traffic and interconnect traffic would then use this second switch.

If this is an option by the time of the configuration, some additional steps would present themselves.

  • activate NIC2 on io node and connect to second private network 10.3
  • activate NIC2 on all compute nodes and connect to 10.3
  • insert NIC3 into head node and connect to 10.3
  • make sure system /etc/hosts, or Lava's 'hosts' file, is properly set up
  • make sure /etc/fstab is correct mounting io node's /home and /sanscratch via 10.3
  • (I think that's it)
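
To make the /etc/fstab step concrete, here is a minimal Python sketch that prints NFS entries for the io node's exports over the 10.3 network. The hostname `ionode-10-3` and the mount options are hypothetical; the real values depend on how /etc/hosts (or Lava's 'hosts' file) names the io node's NIC2 address.

```python
# Hypothetical name that must resolve to the io node's NIC2 address on 10.3.
IO_NODE_10_3 = "ionode-10-3"

# (export on io node, local mount point), taken from the tables on this page.
MOUNTS = [
    ("/sanscratch", "/sanscratch"),
    ("/home/users", "/home/users"),
]


def fstab_line(server: str, export: str, mount_point: str, options: str = "defaults") -> str:
    """Format one fstab entry: device, mount point, fs type, options, dump, pass."""
    return f"{server}:{export}\t{mount_point}\tnfs\t{options}\t0 0"


lines = [fstab_line(IO_NODE_10_3, export, mount) for export, mount in MOUNTS]

for line in lines:
    print(line)
```

Naming the server by its 10.3 hostname rather than its 192.168 one is what keeps the NFS traffic on the intended switch, so the name resolution check in the previous step matters as much as the fstab entries themselves.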

One alternative is to NFS mount directly to SAN via a private network (means a new interface is needed for the SAN). This would essentially bypass the io node and certainly provide better performance if on a new private network from the SAN. This would work with one or two gigabit ethernet switches. The “idle io node” could then be a node for the debug queues and/or serve as a “backup head node” if needed.

[Diagram: Cluster Design]

Please note the Summary Update comments regarding the connection of the head node to the infiniband switch
Henk Meij 2007/04/10 09:52


cluster/19.txt · Last modified: 2007/04/10 09:54 (external edit)