==== Cloud Or Not? ====
| + | |||
| + | There is a lot of buzz about cloud computing. | ||
| + | |||
| + | ==== About Us ==== | ||
| + | |||
| + | First we need to assess our usage of our clusters and if the cloud can support that. We also need to stress that we are a small liberal arts college, primarily undergraduate, | ||
| + | |||
| + | As I like to tell vendors whom quote me impressive statistics like "1,500 HPC users across 17 buildings mostly doing the same thing, all tied together with their HPC solution" | ||
| + | |||
| + | ==== Usage #1 ==== | ||
| + | |||
| + | * Some users run hundreds and hundreds of serial jobs, which require a month or so to run, but only need 100 MB of memory or so and do everything in memory with their own checkpointing. | ||
| + | * Some users run hundreds and hundreds of serial jobs, which require modest amount of memory, run overnight, and whose output becomes input for a single matlab job that runs for weeks. | ||
| + | * Some users run tens and tens of jobs in serial fashion with modest IO requirements. | ||
| + | * Some users run LAMMPS parallel over the ethernet switches. | ||
| + | * Some users run Amber parallel jobs (n=32-48) which run for weeks to a month using the Infiniband interconnect. | ||
| + | * Some users run Gaussian with large IO activity and need fast (local) disk space. | ||
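
To make the serial workload above concrete, here is a minimal sketch of submitting such jobs in bulk, assuming an LSF-style scheduler with ''bsub''; the queue name, memory limit units, and program/input names are illustrative assumptions, not our actual configuration.

<code python>
#!/usr/bin/env python
# Hypothetical sketch: bulk-submit small serial jobs to an LSF-style
# scheduler via 'bsub'. Queue name, memory units, and the executable
# are assumptions for illustration only.
import subprocess

QUEUE = "hp12"  # assumed queue name
MEM = 100       # the jobs above need ~100 MB (units depend on scheduler config)

for i in range(1, 201):  # 200 jobs, for example
    subprocess.check_call([
        "bsub",
        "-q", QUEUE,                  # target queue
        "-J", "serial%d" % i,         # job name
        "-M", str(MEM),               # per-job memory limit
        "-o", "out.%d" % i,           # stdout file per job
        "./mymodel", "input.%d" % i,  # hypothetical program and input
    ])
</code>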
| + | |||
| + | |||
| + | ==== Usage #2 ==== | ||
| + | |||
| + | If you follow this [[cluster: | ||
| + | |||
| + | ==== Clusters ==== | ||
| + | |||
| + | The current problems we encounter are: | ||
| + | |||
| + | * Home directory disk space requirement, | ||
| + | * Fast scratch space, we have none (10 TB Lustre filesystem, carve out of SataBeast) | ||
| + | * Establish a data archive for users rather than have multiple copies (10 TB, carve out of SataBeast) | ||
| + | * Only 16 out of 36 nodes on Infiniband | ||
| + | * Need more nodes with small memory footprint, or more medium (12gb) nodes so small jobs can be spread wide. | ||
| + | * Need more moderate memory footprint nodes (actually we need to get gaussian/ | ||
| + | * We need a database server | ||
| + | * Perhaps we need a better filesystem, but for now NFS is ok | ||
| + | * Heating/ | ||
| + | |||
| + | Our expectations are that if we buy new hardware we expect to obtain somewhere between 300-512 job slots with say $250K, with 3 year support build in and then we do-it-ourselves during next 3 years, and at the end of 6 years consider the hardware "used up". | ||
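
As a back-of-the-envelope check on those numbers (simple arithmetic from the figures above, nothing more):

<code python>
# Rough cost per job slot over the hardware's 6-year life, using only
# the figures quoted above: $250K for 300-512 slots, used for 6 years.
price = 250000  # purchase price in dollars
years = 6       # 3 years vendor support + 3 years do-it-ourselves

for slots in (300, 512):
    per_slot = price / float(slots)
    print("%d slots: $%.0f per slot, about $%.0f per slot-year"
          % (slots, per_slot, per_slot / years))
</code>

That works out to roughly $81-$139 per slot-year, which is the figure any cloud quote would have to beat.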
| + | |||
| + | ==== Cloud ==== | ||
| + | |||
| + | My understanding of a private cloud at another, remote facility and the [dis]advantages of it are: | ||
| + | |||
| + | * Is it affordable? We do not need it to scale up for instance | ||
| + | * Cooling/ | ||
| + | * New/ | ||
| + | * New/ | ||
| + | * The ability to design our private cloud based on our needs | ||
| + | * The ability to change our design based on project needs | ||
| + | |||
| + | ==== Qs ==== | ||
| + | |||
| + | * Web based front end - how to batch submit 100's of jobs? | ||
| + | * Input/ | ||
| + | * Software - how is this provided, ie Amber(MPI)/ | ||
| + | * Debugging/ | ||
| + | * VMs - apart from specifying OS type, do we specify nrs of CPUs, memory, local disk space, etc? Does one job have exclusive use of requested VMs? | ||
| + | * Scratch - is local high-speed scratch available in our cloud, allocated by job? | ||
| + | * Support - what is, and what is not supported? | ||
| + | * Pricing - pay as you go or a predefined based on the set of resources in your cloud? | ||
| + | * Accounts - who manages accounts? | ||
| + | * Security. | ||
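
On the first question, here is a purely hypothetical sketch of what batch submission through a web front end might look like if the provider exposed a REST API; the endpoint URL, token, and JSON fields are all assumptions, and no specific vendor API is implied.

<code python>
# Hypothetical sketch only: submit 100 jobs to an assumed REST endpoint.
# Nothing here corresponds to a real provider's API.
import json
import urllib.request

API = "https://cloud.example.edu/api/jobs"  # assumed endpoint
TOKEN = "..."                               # credential (who manages these?)

for i in range(1, 101):
    job = {
        "image": "centos-hpc",                # assumed OS/VM image name
        "cpus": 1,
        "memory_mb": 100,
        "command": "./mymodel input.%d" % i,  # hypothetical program/input
    }
    req = urllib.request.Request(
        API,
        data=json.dumps(job).encode(),
        headers={"Authorization": "Bearer " + TOKEN,
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
</code>

Even a sketch like this raises the input/output question: every input file still has to get into the cloud somehow.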
| \\ | \\ | ||
| **[[cluster: | **[[cluster: | ||