DokuWiki

Usage Survey (circa early Nov 2006)

Brief synopsis of emerging themes

some commercial software will have to be bought outside of the grant: Matlab, Linda and Portland compilers
most current code is “coarse grain” parallel (meaning a big problem is split into tiny independent pieces) rather than “fine grain” parallel (meaning the code itself is actually parallel, implying internode communication)
work node requirements in general top out at 32 nodes per job, each node with 1 gb ram per cpu (2 gb / node)
some work nodes need a larger memory footprint: perhaps up to 4 nodes with dual quad-core cpus (16-32 gb ram per node)
64-bit compilations will be required with 32-bit backward compatibility for the selected operating system
few jobs will be data intensive, most will be cpu intensive
similarly, few jobs will be integer intensive, most will be floating point intensive
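
The “coarse grain” pattern noted above can be sketched as follows. This is only an illustration, not code from any surveyed application: `split_work` and the 100-run parameter sweep are hypothetical names and numbers, and the default cap of 32 pieces simply reuses the 32-nodes-per-job ceiling from the synopsis.

```python
def split_work(items, max_pieces=32):
    """Split independent work items into at most max_pieces contiguous
    chunks of near-equal size. Each chunk could run as its own job,
    with no communication between chunks (coarse-grain parallelism)."""
    items = list(items)
    n_pieces = min(max_pieces, len(items))
    if n_pieces == 0:
        return []
    base, extra = divmod(len(items), n_pieces)
    chunks = []
    start = 0
    for i in range(n_pieces):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical parameter sweep of 100 independent runs.
sweep = list(range(100))
chunks = split_work(sweep)
```

A fine-grain code, by contrast, would need the pieces to exchange data while running, which is where the low-latency interconnect matters.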

Some configuration scenarios

one head node for ssh access (also runs the scheduler)
one I/O node, mounts home and scratch filesystems from SAN via fiber (no user access)
work nodes NFS mount filesystems from I/O node
up to 60 work nodes, lightweight, little on-board disk space
* 16 of 60 work nodes connected via infiniband switch for low latency
* all other work nodes connected via gigabit ethernet
* 4 of those other work nodes with large memory footprint
Red Hat Enterprise Linux AS 4 (2.6 kernel for large memory access), perhaps CentOS?
ROCKS model
PBS scheduler
OpenMPI
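
The node counts in this scenario can be tallied with a short sketch. It assumes the full 60 work nodes are purchased, that the infiniband nodes carry the standard 2 GB per node from the synopsis, and that the large-memory nodes fall in the 16-32 GB range; the names below are illustrative only.

```python
TOTAL_WORK_NODES = 60          # "up to 60 work nodes" (upper bound assumed)
INFINIBAND_NODES = 16          # on the low-latency switch
LARGE_MEMORY_NODES = 4         # among the gigabit ethernet nodes
STANDARD_RAM_GB = 2            # 1 GB per cpu, 2 GB per node
LARGE_RAM_GB_RANGE = (16, 32)  # per large-memory node


def tally_nodes():
    """Return node counts per class and the aggregate RAM range in GB."""
    gige_nodes = TOTAL_WORK_NODES - INFINIBAND_NODES
    standard_gige_nodes = gige_nodes - LARGE_MEMORY_NODES
    # Assumption: infiniband nodes have the standard memory footprint.
    standard_nodes = INFINIBAND_NODES + standard_gige_nodes
    ram_low = standard_nodes * STANDARD_RAM_GB + LARGE_MEMORY_NODES * LARGE_RAM_GB_RANGE[0]
    ram_high = standard_nodes * STANDARD_RAM_GB + LARGE_MEMORY_NODES * LARGE_RAM_GB_RANGE[1]
    return {
        "infiniband": INFINIBAND_NODES,
        "gigabit_standard": standard_gige_nodes,
        "gigabit_large_memory": LARGE_MEMORY_NODES,
        "total_ram_gb": (ram_low, ram_high),
    }


summary = tally_nodes()
```

Under these assumptions the scenario works out to 16 infiniband nodes, 40 standard gigabit nodes and 4 large-memory nodes, with 176 to 240 GB of RAM across the cluster.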

QUESTION: what about the speed and size of the scratch filesystem? NFS ok? global/parallel/clustered filesystem? current thinking is to have one massive scratch space and another for home directories, both served from the SAN underneath all nodes. basically we fiber mount on one node, the I/O node, and NFS mount from there to all others.
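
A rough way to frame the NFS question is to estimate how much bandwidth each work node would see. The sketch below assumes, purely for illustration, that the I/O node serves NFS over a single gigabit ethernet link at its theoretical line rate with no protocol overhead; the real link count and achievable throughput are not stated in the survey.

```python
def link_mb_per_s(gbit_per_s):
    """Theoretical line rate in MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8


def per_client_mb_per_s(gbit_per_s, clients):
    """Even share of the link when all clients do I/O at the same time."""
    return link_mb_per_s(gbit_per_s) / clients


def transfer_hours(size_gb, mb_per_s):
    """Hours needed to move size_gb at a sustained rate of mb_per_s."""
    return size_gb * 1000 / mb_per_s / 3600


# Assumed: one 1 Gbit/s link on the I/O node, 60 work nodes as clients.
full_link = link_mb_per_s(1)
share = per_client_mb_per_s(1, 60)
# 1,000 GB is the scratch file size quoted in the survey responses.
hours_for_scratch = transfer_hours(1000, full_link)
```

Under these assumptions the link tops out at 125 MB/s, each of 60 simultaneously active nodes gets about 2 MB/s, and writing 1,000 GB of scratch data takes over two hours even with the link to itself. These are best-case figures and real throughput would be lower.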

Generic responses of similar users.

	response#1	response#2	response#3	response#4	response#5	response#6
What applications do you plan to run on the cluster? Are they home grown or third party? Please list some typical application names.	Accelrys InsightII	Matlab $$$	Matlab and homegrown applications.	AMBER, NAMD	C code. Also Gromacs, CHARMM, AMBER	“Gaussian”
For each application, is the application written in C, C++, or f90? Any particular compiler or packages you would need to use?		mex,mcc		C, C++ or Fortran $$$	C	Fortran + Linda $$$
Is your code parallel?		no		yes	10%	yes
Does the application require high interprocessor communication (i.e. Does your code benefit from using low latency interconnect or are these independent calculations that need to run on many processors)?				yes, low latency required	low latency, yes	4 8-core nodes + SMP
Does the application use MPI ?				yes	yes	no (Linda)
If MPI, how many nodes can the code take advantage of, and with what interconnects (IB, myrinet or GE), and what is the memory requirement per core?				typical 16 to 32 nodes (scales to 100's), myrinet, 1 GB per node	<200MB	4 nodes with 2 quad-core processors and 16 - 32 GB ram per node
Does the application use OpenMP? If yes, how many CPUs does the application scale to? Does it use large amounts of shared memory? How large?				no		up to 32 cpus (6 nodes), up to 128 gb ram
Is the application data intensive or CPU intensive?		both	some data, some cpu	cpu	cpu	cpu, some arrays > 100 gb
Is the application 64-bit?		yes	no	32 or 64	32 or 64	64
Is the application integer or FP intensive?			some, FP yes	FP	FP	FP
RAM needed per processor? Amount of storage space required?		1GB (storage or ram?)		512 MB/cpu, 2 GB storage/run	Usually <200 MB, some 4 GB (ram?)	min 2 gb/core (?), scratch files 1,000 GB
Which scheduling software does the user prefer - LSF, PBS or SGE?				any	lsf or pbs, no pref
