cluster:86 [2010/05/13 18:29] (current) hmeij
==== Cloud Or Not? ====
There is a lot of buzz about cloud computing.
==== About Us ====
First we need to assess our usage of our clusters and whether the cloud can support it. We also need to stress that we are a small liberal arts college, primarily undergraduate.
As I like to tell vendors who quote me impressive statistics like "1,500 HPC users across 17 buildings mostly doing the same thing, all tied together with their HPC solution"
==== Usage #1 ====
* Some users run hundreds and hundreds of serial jobs, which require a month or so to run, but only need 100 MB of memory or so and do everything in memory with their own checkpointing.
* Some users run tens and tens of jobs in serial fashion with modest IO requirements.
* Some users run LAMMPS parallel jobs over the Ethernet switches.
* Some users run Amber parallel jobs (n=32-48) which run for weeks to a month using the Infiniband interconnect.
* Some users run Gaussian with large IO activity and need fast (local) disk space.
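The mix above spans many small serial jobs, tightly coupled MPI jobs, and IO-heavy jobs. As a minimal sketch of how the serial and MPI extremes get submitted in bulk (the `bsub` syntax, queue names, and program arguments below are hypothetical placeholders, not our actual configuration):

```shell
#!/bin/sh
# Sketch only: queue names (serial, ib), the -M memory flag, and the
# input/output file names are hypothetical placeholders.
: > submit.list

# Hundreds of small serial jobs, ~100 MB of memory each, generated in
# a loop rather than submitted one at a time by hand.
for i in 1 2 3; do
    echo "bsub -q serial -M 100 ./model < input.$i" >> submit.list
done

# One Amber parallel job (n=32-48) kept on the Infiniband nodes.
echo "bsub -q ib -n 32 mpirun sander.MPI -i mdin -o mdout" >> submit.list
```

Feeding `submit.list` to the scheduler is then a one-line loop; any cloud front end would have to support this kind of bulk workflow.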
==== Usage #2 ====
If you follow this [[cluster:
==== Clusters ====
The current problems we encounter are:
* Home directory disk space requirements
* Fast scratch space, we have none (10 TB Lustre filesystem, carved out of SataBeast)
* Establish a data archive for users rather than have multiple copies (10 TB, carved out of SataBeast)
* Only 16 out of 36 nodes on Infiniband
* Need more nodes with a small memory footprint, or more medium (12 GB) nodes so small jobs can be spread wide.
* Need more moderate memory footprint nodes (actually we need to get gaussian/
* We need a database server
* Perhaps we need a better filesystem, but for now NFS is ok
* Heating/
If we buy new hardware, we expect to obtain somewhere between 300-512 job slots for, say, $250K, with 3-year support built in, and after that we do-it-ourselves
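As a back-of-the-envelope check on those numbers (our own figures, integer math rounded down to whole dollars):

```shell
#!/bin/sh
# Hardware dollars per job slot at both ends of the 300-512 slot
# range, for a $250K purchase (integer division, rounded down).
budget=250000
best=$((budget / 512))    # 512 slots: cheapest per-slot cost
worst=$((budget / 300))   # 300 slots: most expensive per-slot cost
echo "\$$best-\$$worst per job slot"
```

That works out to roughly $488-$833 of hardware per job slot, a useful yardstick when comparing against per-CPU-hour cloud pricing.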
==== Cloud ====
* Cooling/
* New/
* New/
* The ability to design our private cloud based on our needs
* The ability to change our design based on project needs
* Web based front end - how to batch submit 100's of jobs?
* Input/
* Software - how is this provided, i.e. Amber(MPI)/
* Debugging/
* VMs - apart from specifying OS type, do we specify numbers of CPUs, memory, local disk space,
* Scratch - is local high-speed scratch available in our cloud, allocated by job?
* Support - what is, and what is not supported?
* Pricing - pay as you go or a predefined
* Accounts - who manages
* Security
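Several of these questions (batch submission, input/output staging) come down to whether the front end can be scripted at all. A minimal dry-run sketch, assuming a hypothetical HTTP job endpoint — the URL, form fields, and file names are all placeholders, not any real vendor API:

```shell
#!/bin/sh
# Dry run: print one job-submission command per input file.
# cloud.example.com and the -F form fields are hypothetical;
# substitute whatever interface the vendor actually exposes.
API="https://cloud.example.com/api/jobs"
: > submit.log
for input in input.1 input.2 input.3; do
    # 'echo' keeps this a dry run; drop it to actually submit
    echo "curl -s -F script=@run.sh -F input=@$input $API" >> submit.log
done
```

Without some such scriptable interface, "batch submit 100's of jobs" degenerates into hundreds of clicks in a web form.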
\\
**[[cluster: