This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
cluster:133 [2014/08/08 17:51] hmeij created |
cluster:133 [2015/03/18 18:26] (current) hmeij [High Core Count - Low Memory Footprint] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | http:// | + | \\ |
+ | **[[cluster: | ||
+ | |||
+ | ==== High Core Count - Low Memory Footprint ==== | ||
+ | |||
+ | I described the problem below to some folks and asked them for possible solutions. Then ... | ||
+ | |||
+ | [[http:// | ||
+ | |||
+ | We're on the cusp of a new era! | ||
+ | |||
+ | |||
+ | Solutions other than the one described below: | ||
+ | |||
+ | * Amax 4U/288 cores [[http:// | ||
+ | * Microway 2U/144 cores [[http:// | ||
+ | ==== Ideas ==== | ||
+ | |||
+ | One idea I received back was to look at the Intel Atom line of chips. From Andrew | ||
+ | "We can definitely quote rackmounted Atom servers in fairly dense configurations. One example of what we could quote would be : Within each 3U enclosure :12x Sleds, each with TWO C2750 Atom systems on it. So per 3U box :: 24x C2750 Atom systems, each can have 2x 2.5" HDD, Up To 64GB Memory, 2x 10/100/1000 NIC, VGA Port". | ||
+ | |||
+ | That's a 4-core chip (as quoted), so 24 systems x 4 cores = 96 cores per 3U. That could double soon with an 8-core chip. | ||
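The density arithmetic above can be checked with a small sketch. The sled and system counts come from the vendor quote; the cores-per-chip value is the one stated in the text (4), not an independently verified spec, and the variable names are only illustrative.

```python
# Figures from the vendor quote above; cores_per_chip is the value stated
# in the text, not an independently verified chip specification.
sleds_per_enclosure = 12
systems_per_sled = 2
cores_per_chip = 4

systems = sleds_per_enclosure * systems_per_sled   # Atom systems per 3U box
cores_per_3u = systems * cores_per_chip            # cores per 3U enclosure
cores_per_3u_doubled = cores_per_3u * 2            # if the core count per chip doubles

print(systems, cores_per_3u, cores_per_3u_doubled)
```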
+ | |||
+ | * Intel calls this design " | ||
+ | * Details at [[http:// | ||
+ | * [[http:// | ||
+ | |||
+ | So I went looking at my favorite vendor' | ||
+ | {{: | ||
+ | |||
+ | [[http:// | ||
+ | * 28 blades, 112 nodes, 4 nodes per blade, each node with | ||
+ | * 1x Atom C2750 8-core 2.4 GHz chip | ||
+ | * up to 32 GB RAM (4 GB per core, way above what's needed) | ||
+ | * 1x 2.5" disk | ||
+ | * Virtual Media Over LAN (Virtual USB Floppy / CD and Drive Redirection) | ||
+ | * Do these PXE boot? How to get OS on drives? | ||
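As a rough check on the chassis figures listed above, the totals can be worked out as follows. This is a minimal sketch; `chassis_totals` is a hypothetical helper, and the inputs are simply the numbers from the list (28 blades, 4 nodes per blade, 8 cores and up to 32 GB per node).

```python
def chassis_totals(blades, nodes_per_blade, cores_per_node, max_ram_gb_per_node):
    """Return (nodes, cores, GB of RAM per core) for one fully populated chassis."""
    nodes = blades * nodes_per_blade
    cores = nodes * cores_per_node
    ram_per_core_gb = max_ram_gb_per_node / cores_per_node
    return nodes, cores, ram_per_core_gb

# Numbers taken from the list above.
nodes, cores, ram_per_core_gb = chassis_totals(28, 4, 8, 32)
print(nodes, cores, ram_per_core_gb)
```

With these inputs the node count matches the 112 nodes stated in the list, and the RAM per core matches the 4 GB figure.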
+ | |||
+ | * Other thoughts | ||
+ | * With that many nodes, /home would probably not be mounted | ||
+ | * So users would have to stage job data in / | ||
+ | * ... via scp from a target host | ||
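The staging step could look something like the sketch below. Everything named here is an assumption for illustration: the host name, the source path, and the scratch directory are hypothetical, and `build_stage_command` is not an existing tool. It only builds the `scp -r` command line; running it is left to the caller.

```python
import shlex

def build_stage_command(source_host, source_path, scratch_dir):
    """Build an scp command that copies job data from a remote host
    into a node-local scratch directory (all arguments are hypothetical)."""
    return ["scp", "-r", f"{source_host}:{source_path}", scratch_dir]

# Hypothetical host and paths, for illustration only.
cmd = build_stage_command("datahost.example.edu", "/data/project1/input", "/scratch/job_1234")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` at the start of a job script, with a matching copy back to the source host when the job finishes.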
+ | |||
+ | |||
+ | |||
+ | ==== Slurm ==== | ||
+ | |||
+ | If we acquire such a dense core platform, we also need a scheduler that can handle tens of thousands of jobs. | ||
+ | |||
+ | Enter [[https:// | ||
+ | |||
+ | Now we're talking. | ||
+ | |||
+ | Notes on Slurm are [[cluster: | ||
+ | |||
+ | ==== Problem ==== | ||
+ | |||
+ | I've been asked to investigate the following: Our HPCC has slowly grown towards bigger servers with more memory and more cores per node. In the future, several faculty members will run very CPU-intensive jobs, tens of thousands of them, but with small memory footprints (I'm told on the order of 64-128 MB, so I'm assuming 192-256 MB per single core/node). So I need either a huge number of cores with little memory each or lots of tiny, low-core-count blade servers. | ||
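To put the memory assumption in perspective, here is a minimal sketch of the per-node RAM needed if every core runs one such job. The 192-256 MB range is the assumption stated above; the 8-core, 32 GB node is the Atom C2750 configuration discussed on this page, and `node_ram_needed_mb` is a hypothetical helper.

```python
def node_ram_needed_mb(cores_per_node, mb_per_core):
    """RAM a node needs if each core runs one job of the given size."""
    return cores_per_node * mb_per_core

# Assumed range of 192-256 MB per core, on an 8-core node.
low_mb = node_ram_needed_mb(8, 192)
high_mb = node_ram_needed_mb(8, 256)

# How many times over a 32 GB node covers the high estimate.
headroom = (32 * 1024) / high_mb

print(low_mb, high_mb, headroom)
```

Under these assumptions an 8-core node needs roughly 1.5 to 2 GB, so a 32 GB node would be oversized by about a factor of 16.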
+ | |||
+ | I'm going to investigate LXC Linux containers (https:// | ||
+ | |||
+ |