**[[cluster:
==== Cloud Or Not? ====

There is a lot of buzz about cloud computing. On this page we try to assess whether a cloud could support the workloads we currently run on our own clusters.

==== About Us ====

First we need to assess our usage of our clusters and whether the cloud can support that. We also need to stress that we are a small, primarily undergraduate, liberal arts college.

==== Users ====

  * Some users run hundreds and hundreds of serial jobs which take a month or so to finish but need only about 100 MB of memory each; they do everything in memory with their own checkpointing (see the submission sketch after this list).
  * Some users run hundreds and hundreds of serial jobs which need a modest amount of memory and run overnight; their output becomes the input for a single Matlab job that then runs for weeks.
  * Some users run tens and tens of jobs in serial fashion with modest IO requirements.
  * Some users run LAMMPS in parallel over the ethernet switches.
  * Some users run Amber parallel jobs, which run for weeks to months, over the Infiniband interconnect.
  * Some users run Gaussian jobs with heavy IO activity that need fast local disk space.
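
The serial workloads above are typically driven by a simple submission loop against the scheduler. Below is a minimal sketch of what that looks like, assuming an LSF-style ''bsub'' command; the queue name, memory request and ''run_model'' program are placeholders, not our actual setup.

<code python>
#!/usr/bin/env python
# Minimal sketch of batch-submitting many serial jobs.
# Assumes an LSF-style "bsub" on the PATH; the queue name,
# memory request and "run_model" program are hypothetical.
import subprocess

QUEUE = "serial"          # placeholder queue name
NJOBS = 500               # "hundreds and hundreds" of jobs

for i in range(NJOBS):
    cmd = [
        "bsub",
        "-q", QUEUE,                 # target queue
        "-o", "job%d.out" % i,       # per-job stdout/stderr file
        "-R", "rusage[mem=100]",     # roughly 100 MB of memory per job
        "./run_model", "input%d.dat" % i,
    ]
    subprocess.check_call(cmd)       # submit; raises if bsub fails
</code>

Whatever front end a cloud offers would need to support this kind of scripted, bulk submission rather than one job at a time through a web form.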

==== Usage ====

If you follow this [[cluster:

==== Clusters ====

The current problems we encounter are:

  * Home directory disk space requirements
  * Fast scratch space, of which we have none (a 10 TB Lustre filesystem, carved out of the SataBeast)
  * Establish a data archive for users rather than keeping multiple copies (10 TB, carved out of the SataBeast)
  * Only 16 out of 36 nodes are on Infiniband
  * Need more nodes with a small memory footprint
  * Need more nodes with a moderate memory footprint (actually we need to get gaussian/
  * We need a database server node
  * Perhaps we need a better filesystem, but for now NFS is ok
  * Heating/cooling

Our expectation is that if we buy new hardware we obtain somewhere between 300 and 512 job slots, with 3 years of vendor support built in and do-it-ourselves support for the following 3 years; at the end of 6 years we consider the hardware "used up".
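
For comparison against pay-as-you-go cloud pricing, those numbers translate into a rough pool of slot-hours over the hardware's lifetime. A back-of-the-envelope sketch; the utilization figure is an assumed placeholder, not a measured value.

<code python>
# Back-of-the-envelope slot-hours over the 6 year hardware lifetime.
# Slot counts and lifetime come from the expectations above; the
# utilization figure is an assumption, not a measurement.
HOURS_PER_YEAR = 24 * 365
LIFETIME_YEARS = 6
UTILIZATION = 0.60            # assumed average load

for slots in (300, 512):
    raw = slots * HOURS_PER_YEAR * LIFETIME_YEARS
    used = raw * UTILIZATION
    print("%d slots: %.1f million raw slot-hours, %.1f million at %.0f%% load"
          % (slots, raw / 1e6, used / 1e6, UTILIZATION * 100))
</code>

Multiplying a pool like that by a per-slot-hour cloud rate gives a first-order cost comparison against buying, powering and supporting our own hardware.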

==== Cloud ====

My understanding of a private cloud at another, remote facility, and the [dis]advantages of it, is as follows:

  * Is it affordable? We do not need it to scale up, for instance
  * Cooling/
  * New/
  * The ability to design our private cloud based on our needs
  * The ability to change our design based on project needs

==== Qs ====

  * Web based front end - how do we batch submit 100's of jobs? (see the staging sketch after this list)
  * Input/
  * Software - how is this provided, i.e. Amber(MPI)/
  * Debugging/
  * VMs - apart from specifying OS type, do we specify numbers of CPUs, memory, etc.? Does one job have exclusive use of the requested VMs?
  * Scratch - is local high-speed scratch available in our cloud, allocated per job?
  * Support - what is, and what is not, supported?
  * Pricing - how is this organized, pay as you go or a predefined set of resources (use them or lose them)?
  * Accounts - can accounts be tied to our organization's single sign-on?
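
The front end and input/output questions both come down to whether the remote facility can be scripted end to end. A minimal data staging sketch, assuming ssh key access and ''rsync'' on both ends; the hostname and paths are placeholders only.

<code python>
# Minimal sketch of staging data to and from a remote cloud facility.
# The endpoint and directories are placeholders; assumes passwordless
# ssh and rsync are available on both ends.
import subprocess

REMOTE = "user@cloud.example.org"      # hypothetical remote endpoint
REMOTE_DIR = "/scratch/project1/"      # hypothetical remote work area

def push_inputs(local_dir):
    """Copy input files up to the remote facility."""
    subprocess.check_call(["rsync", "-a", local_dir, REMOTE + ":" + REMOTE_DIR])

def pull_results(local_dir):
    """Copy finished results back to local storage."""
    subprocess.check_call(["rsync", "-a", REMOTE + ":" + REMOTE_DIR, local_dir])

push_inputs("inputs/")
# ... jobs run at the remote facility ...
pull_results("results/")
</code>

Unattended transfers like this would have to work for the serial and Gaussian workloads described above, on top of whatever web interface the provider offers.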
\\
**[[cluster: