==== Virtual HPCC services ====
  
Thoughts on how to create virtual compute nodes in the HPCC stack. Specifically, trying to solve the need for tiny, but many, compute nodes for the nano physics applications: think virtual compute nodes with a single-core CPU and 128 MB or less of memory running a lean Scientific Linux operating system. Here is a good introduction: [[http://www.slideshare.net/gpaterno1/comparing-iaas-vmware-vs-openstack-vs-googles-ganeti-28016375]]
  
==== Building Ganeti ====
I'll select Ganeti to start with as it appears the simplest to set up. I've no need for services like fail-over or migration, just the ability to rapidly deploy many tiny nodes from a template. It also appears that Ganeti clusters can be embedded in Openstack later.
  
Update: On hold, these Xen tools (virt-manager, virt-clone, virsh) are very nice, no need for Ganeti up front.
  
==== Building Xen ====
# disable selinux /etc/selinux/config
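# for example (assuming the stock CentOS file layout), something like:
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config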
  
# add some stuff to the xen kernel grub line
title CentOS (2.6.18-371.6.1.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-371.6.1.el5 dom0_mem=2048M,max:2048M dom0_max_vcpus=1 dom0_vcpus_pin allow_unsafe
        module /vmlinuz-2.6.18-371.6.1.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        module /initrd-2.6.18-371.6.1.el5xen.img
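# after rebooting into the xen kernel, a quick sanity check with the xen 3.x
# tools shipped with CentOS 5, for example:
xm list     # Domain-0 should be listed and running
xm info     # confirm the dom0 memory and vcpu limits took effect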
# hostname fully qualified (in case we go to Ganeti later)
  
# edit some settings in /etc/xen/xend-config, consult links below
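# for example, the stock file is /etc/xen/xend-config.sxp; the kind of settings
# meant here (values are illustrative, consult the links below):
(dom0-min-mem 256)
(dom0-cpus 0)
(network-script network-bridge)
(vif-script vif-bridge)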
  
# turn off some services with chkconfig
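# for example (adjust to the services actually installed):
chkconfig cups off
chkconfig bluetooth off
chkconfig avahi-daemon off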
Then launch ''virt-manager'' to look at your Dom0.
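''virt-manager'' is graphical, so assuming you sit at a workstation, something like:

<code>
ssh -X root@dom0host    # dom0host is a placeholder for the Dom0 hostname
virt-manager &
</code>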
  
Build a base clone and install what you need (like the Lava scheduler files). Give it a static IP via /etc/rc.local so that when you clone it you can send it ssh commands. Then shut it down. The Xen block device turned out to be 5 GB in size, a full CentOS install with X11 booting to runlevel 3.
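A rough sketch of those /etc/rc.local lines in the base clone; the address, netmask and gateway are made-up examples:

<code>
# static IP for the base clone so it can be reached over ssh after boot
ifconfig eth0 192.168.100.50 netmask 255.255.255.0 up
route add default gw 192.168.100.1
</code>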
  
Now you build a script to clone and post-prep new clones built from it.
  
<code>
ssh vmdemo "scp /tmp/ifcfg-eth0 /etc/sysconfig/network-scripts/"

# this is the clone
ssh vmdemo reboot
# wait 2 mins
</code>
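For the clone step itself (which the fragment above does not show), a hedged sketch using the standard ''virt-clone'' and ''virsh'' tools; the vmbase/vmdemo names and the image path are assumptions, not the actual script:

<code>
# clone the shut-down base guest and boot the copy
virt-clone --original vmbase --name vmdemo --file /var/lib/xen/images/vmdemo.img
virsh start vmdemo
# then push the per-clone network config and reboot, as above
</code>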
==== Testing Xen ====
  
^  3d Lennard-Jones melt: 10,000 steps with 32,000 atoms  ^^^
^  Queue, node, HT  ^  Jobs/Node, Loop Time (s)  ^  Comment  ^
|  hp12, n15, no-HT  |  01 jobs, 481  |  |
|  hp12, n15, no-HT  |  07 jobs, 482  |  |
|  hp12, n2, yes-HT  |  01 jobs, 470  |  |
|  hp12, n2, yes-HT  |  16 jobs, 804  |  known penalty  |
|  bss24, many, no-HT  |  01 jobs, 844  |  equivalent to hp12, yes-HT  |
|  bss24vm, bvm1, VM  |  01 jobs, 776  |  1 vcpu, 100 MB ram  |
|  bss24vm, bvm1, VM  |  02 jobs, 850  |  2 vcpus, 100 MB ram  |
|  bss24vm, bvm1, VM  |  04 jobs, 1735  |  4 vcpus, 512 MB ram  |
|  bss24vm, bvm2, VM  |  08 jobs, 3582  |  8 vcpus, 1024 MB ram  |
|  bss24vm, bvm1, VM  |  32 jobs, 13661  |  32 vcpus, 1024 MB ram  |
|  bss24vm, bvm1, VM  |  03 jobs, 1273  |  3 vcpus, 256 MB ram, make it 128  |
|  bss24vm, bvm1, VM  |  05 jobs, 2108  |  5 vcpus, 256 MB ram  |
|  bss24vm, bvm1, VM  |  31 jobs, 13844  |  31 vcpus, 21504 MB ram  |
^  multiple vms  ^^^
|  bss24vm, bvm1-4, VM  |  4x01 jobs, 1818  |  4x1 vcpu, 4x128 MB ram  |
|  bss24vm, bvm1-8, VM  |  8x01 jobs, 3745 (1 hour)  |  8x1 vcpu, 8x128 MB ram  |
|  bss24vm, bvm1-6, VM  |  6x32 jobs, 82457 (22 hrs)  |  6x32 vcpus, 6x1024 MB ram  |
^  optimal physical to virtual cpu ratio for best performance according to Xen  ^^^
|  bss24, many, no-HT  |  02 jobs, 826  |  equivalent to hp12, yes-HT  |
|  bss24vm, bvm2-3, VM  |  2x02 jobs, 1708  |  2x2 vcpus, 2x128 MB ram  |
|  bss24vm, bvm2-5, VM  |  4x02 jobs, 3497  |  4x2 vcpus, 4x128 MB ram, optimal physical to virtual cpu ratio  |
|  bss24vm, bvm1-8, VM  |  8x02 jobs, XXXX  |  8x2 vcpus, 8x128 MB ram, optimal physical to virtual cpu ratio  |
  
  
==== Hmmm... ====
  
We have some very old hardware here that does not support virtualization in the BIOS (hardware level). So we use paravirtualization on nodes that contain 2 AMD Opteron Model 250 CPUs and 24 GB of memory. The latter was the reasoning for virtualizing. But it seems not to work. I can start multiple VMs (with either 1 vcpu or up to 32 vcpus) but the results are the same: when you run 2 VMs the run time is roughly twice that of 1 VM, and so on. Odd. I can observe the processes running and being actively busy, but the penalty for running more than one is staggering. Must be an underlying hardware limitation. Too bad.
  
The 6 VMs with 32 vcpus each finished their work after 22 hours. That would imply one of these nodes could run, say, 20 VMs with 20x32=640 job slots. And we have 20 servers (12,800 job slots) that could be virtualized, but there is no performance gain: those servers would finish the 12,800 jobs in the same time without being virtualized. Too bad really.

Following that thought, we could virtualize (KVM at the hardware level) one of the new Asus servers; at 256 GB of memory this would yield 8,192 job slots for one node (128 MB per vcpu). That might be an option.
  
\\