
Since I went to a workshop on Basic LSF 6.2 Configuration and Administration held in Boston by Platform Computing…

⇒ consider me dangerous

Our cluster is driven by Platform/Lava as the scheduler, which in essence is LSF 6.1 … so I've staged all the documentation at the link below. There is a ton of it, all very good, covering all aspects of LSF: how to use it and how to administer it.

I shall be using this wiki page to document for myself what changes I'm going to apply to our environment. For example, one big plus would be the ability to “tile” parallel job submissions and control how many cores one job uses per host (i.e., request 8 cores but distribute them 2 per available host). Another example would be changing the default queue behavior from First-Come-First-Served to FairShare.

Ohlala! Watch out, here we go.

<hi #ff0000> Ouch, forget this. Lava can't handle these advanced settings. Total crash.
</hi><hi #98fb98> So I'll just make notes of interesting items found in the manual for later reference.</hi>
<hi #dda0dd>We need LSF for HPC.</hi>
<hi #ffff00>Items below without a timestamp are not applied.</hi>

LSF 6.1 Docs

Platform LSF HTML Documentation

⇒ the format of the entries below is …

time stamp
name of config file changed
parameters changed
description of what is expected

Scheduler Changes

— Henk Meij 2007/09/13 10:32
lsb.hosts
# overload -hmeij
compute-1-17.local     12    ()         ()      ()   ()   ()
compute-1-18.local     12    ()         ()      ()   ()   ()
...
compute-2-32.local     12    ()         ()      ()   ()   ()
...
Noticed that a ton of jobs run with extremely small memory footprints. However, each takes up one job slot, so 8 of them fill a host. But after the host's status is “Full” we still have tons of memory idle. So I'm first experimenting with raising the number of job slots on the hosts in queue 16-lwnodes (12 instead of the default 8).

— Henk Meij 2007/09/06 10:01
lsf.shared
Begin Resource
...
# below are custom resources -hmeij
   sanscratch   Numeric 30       N           (Available Disk Space in M)
   localscratch Numeric 30       N           (Available Disk Space in M)
End Resource
lsf.cluster.lava
RESOURCENAME  LOCATION
...
# one shared instance for all hosts -hmeij
sanscratch          [all]
# one instance local for each host -hmeij
localscratch        [default]
End ResourceMap
The custom perl program /opt/lava/6.1/linux2.6-glibc2.3-ia32e/etc/elim was copied to each node via cluster-fork. The configuration files above were then edited, followed by lsadmin reconfig, badmin mbdrestart, and badmin reconfig. This should load the custom resources into LSF. The elim program reports available disk space in megabytes to the local lim, which then forwards the information to the master lim. The values are then available to end users via the resource requirement string, for example to request more than 300G of free space in /sanscratch … bsub -R "sanscratch>300000" …
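For reference, here is a minimal Python sketch of what an elim along these lines might look like. It is not the perl program mentioned above. It assumes the usual ELIM reporting convention (one line on stdout holding the number of resources followed by name/value pairs), and the /localscratch mount point and all function names are hypothetical.

```python
import shutil


def free_megabytes(path):
    """Return the free space under `path` in whole megabytes."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def elim_line(values):
    """Format one report line: the resource count, then name/value pairs."""
    fields = [str(len(values))]
    for name, value in values.items():
        fields += [name, str(value)]
    return " ".join(fields)


def report_once(paths):
    """Build one report line for a {resource_name: directory} mapping."""
    return elim_line({name: free_megabytes(p) for name, p in paths.items()})


# A real elim would print such a line at a regular interval and flush
# stdout after each write. The mount points below are assumptions:
# print(report_once({"sanscratch": "/sanscratch",
#                    "localscratch": "/localscratch"}), flush=True)
```

With 300000 MB free on the shared scratch and 70000 MB free locally, the line handed to lim would read `2 sanscratch 300000 localscratch 70000`.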

— Henk Meij 2007/08/30 11:17
lsf.conf
LSB_SHORT_HOSTLIST=1
This parameter takes no blank spaces around the equal sign! Deh. What it does is present a more condensed list of hosts for parallel jobs. For example (before and after the change):
[hmeij@swallowtail gaussian]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
13914   hmeij   RUN   gaussian   swallowtail nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3 test       Aug 30 11:13
[hmeij@swallowtail gaussian]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
13914   hmeij   RUN   gaussian   swallowtail 8*nfs-2-3   test       Aug 30 11:13
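The condensing seen in the second bjobs listing can be mimicked with a few lines of Python. This is only an illustration of the output format, not how LSF implements it; the function name is made up, and how a host that appears only once is displayed is an assumption.

```python
from itertools import groupby


def condense_hosts(exec_host):
    """Collapse consecutive repeats in a colon-separated host list.

    "a:a:a" becomes "3*a". A host that appears once is left as-is
    (an assumption about the display format).
    """
    parts = []
    for host, run in groupby(exec_host.split(":")):
        count = len(list(run))
        parts.append(f"{count}*{host}" if count > 1 else host)
    return ":".join(parts)


long_form = ":".join(["nfs-2-3"] * 8)
print(condense_hosts(long_form))
```

Run against the EXEC_HOST value from the first listing, this prints `8*nfs-2-3`, matching the second listing.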

— Henk Meij 2007/08/27 16:16
lsf.conf
ACCT_ARCHIVE_AGE = 30
Rotate the accounting file every 30 days. A monthly cron job removes any rotated files numbered higher than 1, because bacct would otherwise read all of them. For sample output, view the cluster home page, where you can find statistics on job processing calculated by bacct for the last 7 days (updated early each morning).
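The selection logic of such a cleanup job could look like the sketch below. It assumes the rotated files are named lsb.acct.1, lsb.acct.2, and so on, next to the live lsb.acct file; the function name is hypothetical and the actual cron job is not shown here.

```python
import re

# Rotated accounting files are assumed to be named lsb.acct.<number>.
ARCHIVE_PATTERN = re.compile(r"^lsb\.acct\.(\d+)$")


def stale_archives(filenames, keep_up_to=1):
    """Return the rotated files whose number is greater than keep_up_to."""
    numbered = []
    for name in filenames:
        match = ARCHIVE_PATTERN.match(name)
        if match and int(match.group(1)) > keep_up_to:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so lsb.acct.10 comes after lsb.acct.2.
    return [name for _, name in sorted(numbered)]
```

A wrapper would list the log directory, pass the names to `stale_archives`, and delete whatever comes back; the live file and lsb.acct.1 are never selected.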

NA
none
[hmeij@swallowtail ~]# bsub -n 8 -R "span[ptile=2]" ...
This option requests 8 cores (job slots) for processing, and instructs the scheduler to assign no more than 2 cores per host. This should yield better performance. Does not work in Lava.
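To make the tiling arithmetic concrete, here is a small sketch of how a slot request spreads over hosts under a per-host cap. It only models the counting, not the scheduler's host selection; the function name is made up, and the handling of a remainder (a last host with fewer slots) is an assumption.

```python
def tile_slots(total_slots, ptile):
    """Return the slot count per host when each host takes at most ptile."""
    if total_slots <= 0 or ptile <= 0:
        raise ValueError("total_slots and ptile must be positive")
    full_hosts, remainder = divmod(total_slots, ptile)
    layout = [ptile] * full_hosts
    if remainder:
        layout.append(remainder)  # last host gets what is left over
    return layout


print(tile_slots(8, 2))
```

For the request above this prints `[2, 2, 2, 2]`, i.e. 4 hosts with 2 slots each instead of one host with all 8.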

NA
lsb.params
PARALLEL_SCHED_BY_SLOT = Y 
From the manual … “To submit jobs based on the number of available job slots instead of the number of processors.” If this is defined, you can use the span string in the resource requirements (the argument to the -R option of bsub, see the item above): span[hosts=1] indicates that all the processors allocated to this job must be on the same host; span[ptile=value] indicates the number of processors (value) on each host that should be allocated to the job. Does not work in Lava.

NA
lsb.hosts
HOST_NAME              MXJ   r1m     pg    ls   tmp  DISPATCH_WINDOW
...
# overload infiniband enabled nodes 2:1 job slots/node 
compute-1-1.local      16    ()      ()    ()   ()   ()
...[all 16 nodes]...
default                 !    ()      ()    ()   ()   ()
The exclamation mark implies that the scheduler assigns as many job slots to a node as there are cores in the processors. Thus for all our nodes there will be 8 job slots. With the MXJ setting we can “overload” that default. Since I noticed that the chemistry parallel jobs using Amber come nowhere near maxing out the memory, we can experiment with this setting. It may be that we have to toggle back to a ratio of 1.5:1 versus 2:1, but we'll see. The output of bhosts will display the current settings per host. Works in Lava, but we cannot take advantage of it since the parameter above does not.
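The overload ratios translate into MXJ values as in the sketch below. The helper names are hypothetical, and the column widths in the generated line are only meant to resemble the lsb.hosts layout shown above.

```python
def overloaded_slots(cores, ratio):
    """Job slots for a node with `cores` cores at a slots:cores ratio."""
    return int(cores * ratio)


def lsb_hosts_line(host, mxj):
    """Format one Host section line with all load thresholds left empty."""
    return f"{host:<22} {mxj:<5} ()      ()    ()   ()   ()"


for ratio in (2, 1.5):
    print(lsb_hosts_line("compute-1-1.local", overloaded_slots(8, ratio)))
```

With 8 cores per node, 2:1 gives MXJ 16 and 1.5:1 gives MXJ 12.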

— Henk Meij 2007/08/28 16:07
lsb.params
MBD_SLEEP_TIME = 30       #mbatchd scheduling interval (60 secs is default)
SBD_SLEEP_TIME = 15       #sbatchd scheduling interval (30 secs is default)
JOB_ACCEPT_INTERVAL = 1   #interval for any host to accept a job 
                          # (default is 1 (one-fold of MBD_SLEEP_TIME))
The job accept interval tells LSF to wait a short time between dispatching jobs to the same host; the wait is a multiple of the sleep time defined for the Master Batch Daemon (MBD). The Slave Batch Daemon (SBD) interval defines the length of time between recalculations of the resource parameters, which are forwarded to MBD. Since we're not busy, we lower them a bit.
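As a quick check of what these values mean in seconds, going by the comment above that JOB_ACCEPT_INTERVAL counts in multiples of MBD_SLEEP_TIME, a hypothetical helper:

```python
def job_accept_delay(job_accept_interval, mbd_sleep_time):
    """Seconds a host waits before it accepts another job."""
    return job_accept_interval * mbd_sleep_time


print(job_accept_delay(1, 60))  # defaults quoted in the comments above
print(job_accept_delay(1, 30))  # settings applied here
```

So the change halves the per-host wait from 60 to 30 seconds while JOB_ACCEPT_INTERVAL itself stays at 1.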

NA
lsf.conf
LSB_LOCALDIR=/share/apps/logs/lsb.events
Enables duplicate logging of lsb.events. Does not work in Lava.
