
Lava/LSF works through a set of daemon processes that communicate with each other.

  • Load Information Manager (LIM, master & slaves)
    • gathers built-in resource load information directly from /dev/kmem
    • forwards information to master LIM
    • gathers information on site-defined resources
  • Process Information Manager (PIM)
    • runs on every host
    • started by LIM
    • gathers information on every process running on the host
  • Remote Execution Server (RES)
    • runs on every host
    • provides secure, remote execution of tasks
  • Slave Batch Daemon (SBD)
    • runs on every host
    • receives job requests from MBD
    • enforces load thresholds
    • maintains the state of jobs
  • Master Batch Daemon (MBD)
    • one per cluster
    • responds to user queries (bjobs, bqueues, bsub etc)
    • receives scheduling information from MBSCHD
    • dispatches jobs to SBDs
    • manages the queues
  • Scheduling Daemon (MBSCHD)
    • one per cluster
    • makes scheduling decisions
    • launched by MBD
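
A quick way to check which daemons are up on a given host is to look for their processes. This is a sketch, assuming the default Lava/LSF process names (lim, pim, res and sbatchd on every host; mbatchd and mbschd on the master host only):

# on any host in the cluster
ps -C lim,pim,res,sbatchd -o pid=,comm=
# on the master host, additionally
ps -C mbatchd,mbschd -o pid=,comm=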

eLIM

An eLIM is a custom-defined resource monitor, an "external LIM", hence the name. To make one work, the configuration files below need to be edited to define the resource. Here is my example, in which I set up resource monitors for the available disk space (in MB) of the filesystems /sanscratch and /localscratch. Note that /sanscratch is shared by all nodes, while /localscratch is local to each host.

Once defined, users may use the output of these monitors in the resource request string of bsub. For example: run my job on a host in the heavyweight queue if more than 300 GB of scratch space in /sanscratch is available and 8 GB of memory can be allocated. The monitor "mem" is internal (provided by LSF); the monitor "sanscratch" is external.

bsub -q 04-hwnodes -R "sanscratch>300000 & mem>8000" …
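
For reference, the same request can also be written with an explicit select[] section, the formal form of an LSF resource requirement string (sketched from standard LSF syntax; note that inside select[] the logical and is written &&):

bsub -q 04-hwnodes -R "select[sanscratch>300000 && mem>8000]" …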

If you need a custom monitor, describe it in plain English and email the request to hpcadmin@wesleyan.edu.

Conf

lsf.cluster.lava

Begin ResourceMap
RESOURCENAME  LOCATION
...
# one shared instance for all hosts -hmeij
sanscratch          [all]
# one instance local for each host -hmeij
localscratch        [default]
End ResourceMap

lsf.shared

Begin Resource
...
# below are custom resources -hmeij
# columns: RESOURCENAME  TYPE  INTERVAL(sec)  INCREASING  DESCRIPTION
   sanscratch   Numeric 30       N           (Available Disk Space in M)
   localscratch Numeric 30       N           (Available Disk Space in M)
End Resource

Program

Now we write a simple Perl or Bash program that reports the values we are interested in to standard output.

/share/apps/scripts/elim

#!/usr/bin/perl
# elim to report available disk space -hmeij
# note: on these hosts the df line for /sanscratch wraps (long device
# name), so its numbers land on line index 2 and /localscratch on index 3

$| = 1; # autoflush STDOUT so LIM receives each report immediately

while (1) {

        $tmp = `df -B M /sanscratch /localscratch`;

        @tmp = split(/\n/,$tmp);
        $tmp[2] =~ s/\s+/ /g;
        $tmp[2] =~ s/^\s+//g;
        @f = split(/ /,$tmp[2]);
        chop($f[2]);            # strip the trailing M
        $sanscratch = $f[2];

        $tmp[3] =~ s/\s+/ /g;
        $tmp[3] =~ s/^\s+//g;
        @f = split(/ /,$tmp[3]);
        chop($f[3]);            # strip the trailing M
        $localscratch = $f[3];

        # report format: nr_of_args name1 value1 name2 value2 ...
        $string = "2 sanscratch $sanscratch localscratch $localscratch";

        # with $| set above, each print is flushed immediately -hmeij
        print "$string\n";
        # or bypass stdio buffering entirely:
        #syswrite(STDOUT,"$string\n");

        # reporting interval as specified in lsf.shared
        sleep 30;

}
  • elim is owned by root:root
  • with -rwxr-xr-x permissions, and
  • copied to /opt/lava/6.1/linux2.6-glibc2.3-ia32e/etc/elim on each node that needs to report these values.
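
Since an elim can be written in Perl or Bash, here is what a Bash version might look like. This is a minimal sketch, not what runs on the cluster; it assumes GNU df, whose -P flag keeps each filesystem on a single line so the Available column is always field 4:

#!/bin/bash
# bash sketch of the same elim; assumes GNU df with -P (POSIX output)
while true; do
    san=$(df -P -B M /sanscratch   | awk 'NR==2 { sub(/M$/,"",$4); print $4 }')
    loc=$(df -P -B M /localscratch | awk 'NR==2 { sub(/M$/,"",$4); print $4 }')
    # nr_of_args name1 value1 name2 value2 ...
    echo "2 sanscratch $san localscratch $loc"
    sleep 30   # reporting interval, as specified in lsf.shared
done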

Test to make sure it works …

[root@nfs-2-2 ~]# /opt/lava/6.1/linux2.6-glibc2.3-ia32e/etc/elim
2 sanscratch 979605 localscratch 232989
2 sanscratch 979605 localscratch 232989
2 sanscratch 979605 localscratch 232989
...

Restarts

Restart the LIMs and the MBD so the new configuration takes effect …

[root@swallowtail ~]# lsadmin reconfig
...
[root@swallowtail ~]# badmin mbdrestart
...
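
To confirm the new resource definitions were picked up before querying load values, lsinfo should list every resource defined in lsf.shared; something like:

[root@swallowtail ~]# lsinfo | grep scratch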

Results

You can now query the monitors. Use lsload to view the collected information by host, either brute force with the -l option (see below) or filtered with the -R option. Once your -R resource string works with lsload, you can use the same string with bsub.

Unfortunately the display cannot show the full value of every monitor (you get 7e+04 etc.), but that is only a display issue; the underlying values are exact.

For example:

[root@swallowtail conf]# lsload -R "sanscratch>300000 & mem>8000"
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
nfs-2-2             ok   0.0   0.0   0.1   0%   3.9   0  1382 7140M 4000M   16G
nfs-2-4             ok   1.0   2.4   7.7  36% 484.2   0 10080 7128M 3998M 9944M

To expand the localscratch column, use the -I option (that's a capital 'I'):

[root@swallowtail ~]# lsload -w -Ilocalscratch
HOST_NAME       status localscratch
swallowtail         ok            -
compute-1-6         ok      71248.0
compute-1-2         ok      71248.0
compute-1-9         ok      71248.0
compute-1-13        ok      71248.0
compute-1-14        ok      71248.0
compute-1-8         ok      71248.0
...

…brute force…

[root@swallowtail conf]# lsload -l
HOST_NAME               status  r15s   r1m  r15m   ut    pg    io  ls    it   tmp   swp   mem localscratch  sanscratch
swallowtail                 ok   0.0   0.0   0.0   0%  57.4   961   7     2 4424M 3996M  865M            -      979606
compute-1-6                 ok   0.0   0.0   0.0   0%   3.1    58   0  4608 7136M 4000M 3770M        7e+04      979606
compute-1-12                ok   0.0   0.0   0.0   0%   3.5    64   0 17616 7136M 4000M 3772M        7e+04      979606
compute-1-15                ok   0.0   0.0   0.0   0%   3.8    69   0  4192 7136M 4000M 3770M        7e+04      979606
nfs-2-2                     ok   0.0   0.0   0.2   0%   4.2    82   0  1380 7140M 4000M   16G        2e+05      979606
compute-1-2                 ok   0.0   0.0   0.0   0%   4.4    82   0  4608 7136M 3782M 3604M        7e+04      979606
compute-1-14                ok   0.0   0.0   0.0   0%   4.4    77   0 17616 7136M 4000M 3768M        7e+04      979606
compute-1-11                ok   0.0   0.0   0.0   0%   4.7    89   0  4608 7136M 3866M 3684M        7e+04      979606
compute-1-10                ok   0.0   0.0   0.0   0%   7.0   133   0 17616 7136M 3968M 3824M        7e+04      979606
ionode-1                    ok   0.0   0.6   0.2   0% 3e+03 5e+04   1  1757 6932M 4000M 1709M            -      979606
compute-1-9                 ok   0.0   0.0   0.0   0%   3.8    70   0    81 7136M 4000M 3770M        7e+04      979606
compute-1-13                ok   0.0   0.0   0.0   0%   3.6    67   0 17616 7148M 4000M 3766M        7e+04      979606
compute-1-7                 ok   0.0   0.2   0.9   3% 2e+03 3e+04   0  4608 7136M 4000M 3768M        7e+04      979606
compute-1-3                 ok   0.0   0.0   0.0   0%   4.0    72   0  4608 7136M 4000M 3774M        7e+04      979606
compute-1-8                 ok   0.0   0.2   0.0   0%   4.4    84   0  2940 6416M 4000M 3812M        7e+04      979606
compute-1-4                 ok   0.0   0.0   0.0   0%   4.9    91   0 17616 7136M 4000M 3770M        7e+04      979606
compute-1-1                 ok   0.3   0.0   0.0   0%   4.8    84   0  1731 7136M 3822M 3640M        7e+04      979606
compute-2-32                ok   1.0   1.0   1.0  13%   5.4    97   0  1447 7144M 4000M 3738M        7e+04      979606
nfs-2-1                     ok   1.0   8.7   7.3 100%   6.7   127   1    30 7140M 3958M 3614M        2e+05      979606
nfs-2-3                     ok   1.0   5.6   8.0  60%1014.0 2e+04   0 11704 7136M 3958M 3548M        2e+05      979606
compute-1-16                ok   1.0   1.0   1.0  13%   5.8   108   0  4604 7136M 4000M 3734M        7e+04      979606
nfs-2-4                     ok   1.2   8.5   8.6  94% 105.4  1974   0 10072 7128M 3998M 3544M        2e+05      979606
compute-1-23                ok   2.0   2.0   2.0  25%   5.5   103   0 17616 7140M 4000M 3658M        7e+04      979606
compute-1-27                ok   2.0   2.0   2.0  25%   5.4   104   0 17616 7140M 4000M 3550M        7e+04      979606
compute-2-30                ok   2.0   2.0   2.0  25%   5.5   103   0 17616 7148M 4000M 3644M        7e+04      979606
compute-1-26                ok   2.0   2.0   2.0  25%   5.4    99   0 17616 7140M 4000M 3644M        7e+04      979606
compute-1-17                ok   2.0   2.0   2.0  25%   5.9   106   0 17616 7140M 4000M 3636M        7e+04      979606
compute-2-29                ok   2.0   2.0   2.0  25%   6.2   114   0 17616 7148M 4000M 3634M        7e+04      979606
compute-1-25                ok   2.0   2.1   2.0  25%   5.3    99   0  4604 7140M 4000M 3642M        7e+04      979606
compute-1-19                ok   2.0   2.0   2.0  25%   4.8    88   0  4604 7140M 4000M 3636M        7e+04      979606
compute-2-31                ok   3.0   3.0   3.0  38%   5.5   109   0  4196 7144M 4000M 3488M        7e+04      979606
compute-1-20                ok   3.0   3.0   3.0  38%   4.8    93   0  4604 7140M 4000M 3562M        7e+04      979606
compute-1-21                ok   3.0   3.0   3.0  38%   5.8   110   0  4604 7140M 4000M 3638M        7e+04      979606
compute-1-22                ok   3.0   3.0   3.0  38%   6.0   110   0  4604 7140M 4000M 3636M        7e+04      979606
compute-1-18                ok   4.0   4.0   4.0  50%   5.6   100   0  4604 7140M 4000M 3552M        7e+04      979606
compute-1-24                ok   4.0   4.0   4.0  50%   5.8   105   0  4604 7140M 4000M 3546M        7e+04      979606
compute-2-28                ok   5.0   5.0   5.0  62%   4.3    83   0 17616 7140M 4000M 3460M        7e+04      979606
compute-1-5                 ok   7.0   7.0   7.0  88%   4.4    79   0  4608 7136M 4000M 2838M        7e+04      979606

