Differences

This shows you the differences between two versions of the page.

--- cluster:133 [2014/08/08 17:51] – created hmeij
+++ cluster:133 [2015/03/18 18:26] (current) – [High Core Count - Low Memory Footprint] hmeij
@@ Line 1: / Line 1: @@
-http://www.nytimes.com/2014/08/08/science/new-computer-chip-is-designed-to-work-like-the-brain.html
+\\
+**[[cluster:0|Back]]**
+==== High Core Count - Low Memory Footprint ====
+I polled some folks with the problem described below to find a solution. Then ...
+[[http://www.nytimes.com/2014/08/08/science/new-computer-chip-is-designed-to-work-like-the-brain.html]]
+We're on the cusp of a new era!
+Solutions other than the one described below:
+  * Amax 4U/288 cores [[http://www.amax.com/hpc/product.asp?value=High%20Density%20/%20Performance]]
+  * Microway 2U/144 cores [[http://www.microway.com/products/hpc-clusters/high-performance-computing-with-intel-xeon-hpc-clusters/]]
+==== Ideas ====
+One idea I received back was to look at the Intel Atom line of chips. From Andrew:
+"We can definitely quote rackmounted Atom servers in fairly dense configurations. One example of what we could quote would be : Within each 3U enclosure :12x Sleds, each with TWO C2750 Atom systems on it. So per 3U box :: 24x C2750 Atom systems, each can have 2x 2.5" HDD, Up To 64GB Memory, 2x 10/100/1000 NIC, VGA Port".
+That's a 4-core chip (as quoted), so 24 systems x 4 cores = 96 cores per 3U. That could double soon with an 8-core chip.
+  * Intel calls this design "microservers". From tower, to rack, to blade, to microserver.
+  * Details at [[http://www.intel.com/content/www/us/en/servers/microservers.html]]
+  * [[http://newsroom.intel.com/community/intel_newsroom/blog/2013/09/04/intel-unveils-new-technologies-for-efficient-cloud-datacenters|Intel Unveils New Technologies for Efficient Cloud Datacenters]]
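+The density arithmetic for the quoted Atom enclosure can be checked with a small sketch. The sled, system and core counts come from the quote above; the function name is made up for illustration, and the 8-core case is only the "could double" scenario, not a quoted configuration.

```python
def cores_per_enclosure(sleds: int, systems_per_sled: int, cores_per_system: int) -> int:
    """Total cores in one enclosure: sleds x systems per sled x cores per system."""
    return sleds * systems_per_sled * cores_per_system


RACK_UNITS = 3  # the quoted enclosure is 3U

# As quoted: 12 sleds, 2 Atom systems per sled, counted at 4 cores each.
quoted = cores_per_enclosure(sleds=12, systems_per_sled=2, cores_per_system=4)

# Hypothetical follow-up: same enclosure with 8-core chips.
doubled = cores_per_enclosure(sleds=12, systems_per_sled=2, cores_per_system=8)

print(f"quoted:  {quoted} cores, {quoted / RACK_UNITS:.0f} cores per rack unit")
print(f"doubled: {doubled} cores, {doubled / RACK_UNITS:.0f} cores per rack unit")
```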
+So I went looking at my favorite vendor's hardware platform and found:
+{{:cluster:microbade.jpg?200|}}
+[[http://www.supermicro.com/products/MicroBlade/|MicroBlade!]] 896 cores in 6U. Ok then.
+  * 28 blades, 112 nodes, 4 nodes per blade, each node with
+    * 1x Atom C2750 8-core 2.4 GHz chip
+    * up to 32 GB RAM (4 GB per core, way above what's needed)
+    * 1x 2.5" disk
+  * Virtual Media Over LAN (Virtual USB Floppy / CD and Drive Redirection)
+  * Do these PXE boot? How to get OS on drives?
+  * Other thoughts
+    * With that many nodes, /home would probably not be mounted
+    * So users would have to stage job data in /localscratch/JOBPID probably
+    * ... via scp from a target host
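+The staging idea in the last bullets might look roughly like the sketch below. It only builds the ''scp'' command line; nothing is copied. The host name, source path and job PID are placeholders, and the helper name is hypothetical. Only the /localscratch/JOBPID layout comes from the notes above.

```python
import shlex
from typing import List


def build_stage_command(job_pid: int, source_host: str, source_path: str,
                        scratch_root: str = "/localscratch") -> List[str]:
    """Return the scp argument list that would pull job data into the
    per-job scratch directory /localscratch/JOBPID on the compute node."""
    destination = f"{scratch_root}/{job_pid}/"
    # -r copies directories recursively; the destination directory
    # must already exist on the node before scp is run.
    return ["scp", "-r", f"{source_host}:{source_path}", destination]


# Placeholder values, for illustration only.
cmd = build_stage_command(12345, "target-host", "/data/job_input")
print(shlex.join(cmd))
```

+A job script would create the scratch directory first, run this command (for example via ''subprocess.run(cmd, check=True)''), and copy results back the same way when the job finishes.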
+==== Slurm ====
+And then we need something that can handle tens of thousands of jobs if we acquire such a dense core platform.
+Enter [[https://computing.llnl.gov/linux/slurm/|Slurm]], which according to their web site, "can sustain a throughput rate of over 120,000 jobs per hour".
+Now we're talking.
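+To put the quoted throughput figure in perspective, here is a back-of-the-envelope sketch. It assumes the scheduler actually sustains the advertised 120,000 jobs per hour, and the job counts are made-up examples. It gives a lower bound on scheduling time only and says nothing about how long the jobs themselves run.

```python
SLURM_JOBS_PER_HOUR = 120_000  # throughput figure quoted from the Slurm web site


def minutes_to_schedule(job_count: int, jobs_per_hour: int = SLURM_JOBS_PER_HOUR) -> float:
    """Minutes needed to push job_count jobs through the scheduler
    at a sustained rate of jobs_per_hour."""
    return job_count * 60 / jobs_per_hour


# Hypothetical workload sizes in the "tens of thousands" range.
for job_count in (10_000, 50_000):
    print(f"{job_count} jobs: {minutes_to_schedule(job_count):.1f} minutes")
```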
+Notes on Slurm are at [[cluster:134|High Core Count - Low Memory Footprint]]
+==== Problem ====
+I've been asked to investigate the following: Our HPCC has slowly grown towards bigger servers with more memory and cores per node. In the future, several faculty members will run very CPU-intensive jobs, tens of thousands of them, but with small memory footprints (I'm told on the order of 64-128 MB, so I'm assuming 192-256 MB per single core/node). So I need either huge numbers of cores with little memory or lots of tiny, low-core-count blade servers.
+I'm going to investigate LXC Linux containers (https://linuxcontainers.org/) but I'm wary of performance when generating tiny VMs that will mostly be CPU bound. Any ideas on hardware/software solutions would be greatly appreciated. There is no budget yet so I'm unsure how large this project will be. Thanks,
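+The memory estimate can be turned into a per-node requirement with a quick sketch. The 192-256 MB per core range is the assumption stated above; the 8-core, 32 GB node is the MicroBlade node from the Ideas section, used here only as an example, and the function name is made up.

```python
MB_PER_GB = 1024


def memory_needed_gb(cores: int, mb_per_core: int) -> float:
    """Memory (GB) a node needs if every core runs one job of mb_per_core MB."""
    return cores * mb_per_core / MB_PER_GB


NODE_CORES = 8    # Atom C2750 node from the MicroBlade list
NODE_RAM_GB = 32  # maximum RAM per node from the same list

low = memory_needed_gb(NODE_CORES, 192)
high = memory_needed_gb(NODE_CORES, 256)
headroom = NODE_RAM_GB / high  # how many times over the node is provisioned

print(f"needed per node: {low} to {high} GB")
print(f"a {NODE_RAM_GB} GB node is {headroom:.0f}x the upper estimate")
```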
+\\
+**[[cluster:0|Back]]**