Since i went to a workshop on Basic LSF 6.2 Configuration and Administration held in Boston by Platform Computing…
⇒ consider me dangerous
Our cluster is driven by Platform/Lava as the scheduler, which in essence is LSF 6.1 … so i've staged all the documentation at the link below. There is a ton of it, all very good, covering all aspects of LSF: how to use it and how to administer it.
I shall be using this wiki page to document for myself what changes i'm going to apply to our environment. For example, one big plus would be to be able to “tile” parallel job submissions and control how many cores are used per host by one job (i.e., request 8 cores but distribute them 2 per available host). Or, for example, change the default queue behavior from First-Come-First-Served to FairShare.
Ohlala! Watch out, here we go.
<hi #ff0000> Ouch, forget this. Lava can't handle these advanced settings. Total crash.
</hi><hi #98fb98> So i'll just make notes of interesting items found in the manual for later reference.</hi>
<hi #dda0dd>We need LSF for HPC.</hi>
<hi #ffff00>Items below without a timestamp are not applied.</hi>
Platform LSF HTML Documentation
⇒ format of entries below is …
time stamp | name of config file changed | parameters changed | description of what is expected
— Henk Meij 2007/09/13 10:32 | lsb.hosts

  # overload -hmeij
  compute-1-17.local 12 () () () () ()
  compute-1-18.local 12 () () () () ()
  ...
  compute-2-32.local 12 () () () () ()
  ...

Noticed that a ton of jobs run with extremely small memory footprints. However, each takes up one job slot, so 8 of them fill a host. But after the host status is “Full” we still have tons of memory idle. So i'm first experimenting with raising the number of job slots on the hosts in queue 16-lwnodes (12 instead of the default 8).
— Henk Meij 2007/09/06 10:01 | lsf.shared

  Begin Resource
  ...
  # below are custom resources -hmeij
  sanscratch   Numeric 30 N (Available Disk Space in M)
  localscratch Numeric 30 N (Available Disk Space in M)
  End Resource

lsf.cluster.lava

  RESOURCENAME LOCATION
  ...
  # one shared instance for all hosts -hmeij
  sanscratch [all]
  # one instance local for each host -hmeij
  localscratch [default]
  End ResourceMap

The custom perl program /opt/lava/6.1/linux2.6-glibc2.3-ia32e/etc/elim was copied to each node via cluster-fork. The above configuration files were edited, followed by lsadmin reconfig, badmin mbdrestart, and badmin reconfig. This should load the custom resource into LSF. The elim program will report available disk space in megabytes to the local lim, which then forwards the information to the master lim. Then it is available to end users via the resource string parameter, for example to request more than 300G of free space in /sanscratch …

  bsub -R "sanscratch>300000" …

— Henk Meij 2007/08/30 11:17 | lsf.conf

  LSB_SHORT_HOSTLIST=1

This parameter takes no blank spaces around the equal sign! Deh. What it does is present a more condensed list of hosts when submitting parallel jobs. For example (before and after the change):

  [hmeij@swallowtail gaussian]$ bjobs
  JOBID USER  STAT QUEUE    FROM_HOST   EXEC_HOST JOB_NAME SUBMIT_TIME
  13914 hmeij RUN  gaussian swallowtail nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3 test Aug 30 11:13

  [hmeij@swallowtail gaussian]$ bjobs
  JOBID USER  STAT QUEUE    FROM_HOST   EXEC_HOST JOB_NAME SUBMIT_TIME
  13914 hmeij RUN  gaussian swallowtail 8*nfs-2-3 test     Aug 30 11:13

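To make the before/after difference concrete, the sketch below mimics the condensing in Python. It is only an illustration of the output format seen above, not how LSF implements it. The function name is hypothetical, and leaving a host that appears once without a `1*` prefix is an assumption.

```python
from itertools import groupby


def condense_hostlist(exec_host):
    """Collapse 'h:h:h' style host lists into 'N*h' form, keeping host order."""
    parts = []
    for host, run in groupby(exec_host.split(":")):
        count = len(list(run))
        # Assumption: a host that appears only once is printed without a count.
        parts.append(f"{count}*{host}" if count > 1 else host)
    return ":".join(parts)


long_form = ":".join(["nfs-2-3"] * 8)
print(condense_hostlist(long_form))  # prints 8*nfs-2-3
```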
— Henk Meij 2007/08/27 16:16 | lsf.conf

  ACCT_ARCHIVE_AGE = 30

Rotate the accounting file every 30 days. A monthly cron job removes any files where the value is >1 … this because bacct will read all of them. For sample output view the cluster home page, where you can find statistics on job processing calculated by bacct for the last 7 days (updated early AM each day).
NA | none

  [hmeij@swallowtail ~]# bsub -n 8 -R "span[ptile=2]" ...

This option requests 8 cores (job slots) for processing, and instructs the scheduler to assign no more than 2 cores per host. This should yield better performance. Does not work in Lava.
NA | lsb.params

  PARALLEL_SCHED_BY_SLOT = Y

From the manual … “To submit jobs based on the number of available job slots instead of the number of processors.” If this is defined you can use the resource requirements (arguments to the -R option of bsub, see item above) … string span[hosts=1] indicates that all the processors allocated to this job must be on the same host; string span[ptile=value] indicates the number of processors (value) on each host that should be allocated to the job. Does not work in Lava.
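As a rough picture of what span[ptile=value] asks for, the Python sketch below spreads a slot request over hosts with at most `ptile` slots per host. It is a simplification for illustration and not the scheduler's actual placement algorithm. The function name and the free-slot counts in the example are hypothetical.

```python
def tile_slots(total_slots, ptile, free_slots):
    """Assign up to ptile slots per host, in order, until total_slots are placed.

    free_slots maps host name -> free job slots on that host.
    Returns a dict of host name -> slots assigned.
    Raises ValueError if the hosts cannot hold the whole request.
    """
    allocation = {}
    remaining = total_slots
    for host, free in free_slots.items():
        if remaining == 0:
            break
        take = min(ptile, free, remaining)
        if take > 0:
            allocation[host] = take
            remaining -= take
    if remaining:
        raise ValueError(f"{remaining} slots could not be placed")
    return allocation


# Equivalent in spirit to: bsub -n 8 -R "span[ptile=2]" ...
hosts = {f"compute-1-{i}.local": 8 for i in range(1, 7)}  # hypothetical free slots
print(tile_slots(8, 2, hosts))
```

With 8 requested slots and ptile=2, the request lands on four hosts with two slots each instead of filling one 8-slot host.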
NA | lsb.hosts

  HOST_NAME MXJ r1m pg ls tmp DISPATCH_WINDOW
  ...
  # overload infiniband enabled nodes 2:1 job slots/node
  compute-1-1.local 16 () () () () ()
  ...[all 16 nodes]...
  default ! () () () () ()

The exclamation mark implies that the scheduler assigns as many job slots to a node as there are cores in the processors. Thus for all our nodes there will be 8 job slots. With the MXJ setting we can “overload” that setting. Since i noticed that the chemistry parallel jobs using Amber come nowhere near maxing out the nodes' memory, we can experiment with this setting. It may be that we have to toggle back to a ratio of 1.5:1 versus 2:1, but we'll see. The output of bhosts will display the current settings per host. Works in Lava, but we cannot take advantage of it with the above parameter.
— Henk Meij 2007/08/28 16:07 | lsb.params

  MBD_SLEEP_TIME = 30      #mbatchd scheduling interval (60 secs is default)
  SBD_SLEEP_TIME = 15      #sbatchd scheduling interval (30 secs is default)
  JOB_ACCEPT_INTERVAL = 1  #interval for any host to accept a job
                           # (default is 1 (one-fold of MBD_SLEEP_TIME))

The job accept interval tells LSF to wait a short time between dispatching jobs to the same host; the wait is a multiple of the sleep time defined for the Master Batch Daemon (MBD). The Slave Batch Daemon (SBD) interval defines the length of time between recalculations of the resource parameters, which are forwarded to MBD. Since we're not busy, we lower them a bit.
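The comment in the config describes JOB_ACCEPT_INTERVAL as a multiple of MBD_SLEEP_TIME. Under that reading, the per-host wait between dispatches works out as in this small Python sketch (the function name is hypothetical):

```python
def dispatch_wait_seconds(job_accept_interval, mbd_sleep_time):
    """Seconds to wait before dispatching a second job to the same host."""
    return job_accept_interval * mbd_sleep_time


print(dispatch_wait_seconds(1, 60))  # defaults: prints 60
print(dispatch_wait_seconds(1, 30))  # values from this entry: prints 30
```

So halving MBD_SLEEP_TIME while leaving JOB_ACCEPT_INTERVAL at 1 also halves the wait between jobs sent to one host.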
NA | lsf.conf

  LSB_LOCALDIR=/share/apps/logs/lsb.events

Enables duplicate logging of lsb.events. Does not work in Lava.