\\
**[[cluster:0|Back]]**
Since I went to a workshop on **Basic LSF 6.2 Configuration and Administration** held in Boston by [[http://platform.com/|Platform Computing]]...
=> consider me dangerous :-P
Our cluster is driven by Platform/Lava as the scheduler, which in essence is LSF 6.1 ... so I've staged all the documentation at the link below. There is a ton of it, all very good, covering all aspects of LSF: how to use it and how to administer it.
I shall be using this wiki page to document for myself what changes I'm going to apply to our environment. For example, one big plus would be the ability to "tile" parallel job submissions and control how many cores one job uses per host (i.e., request 8 cores but distribute them 2 per available host). Or, for example, change the default queue behavior from First-Come-First-Served to FairShare.
Ohlala! Watch out, here we go.
**Ouch, forget this. Lava can't handle these advanced settings. Total crash.** \\
**So I'll just make notes of interesting items found in the manual for later reference.**\\
We need LSF for HPC.\\
Items below without a timestamp are not applied.
===== LSF 6.1 Docs =====
**[[http://lsfdocs.wesleyan.edu/lsf6.1_complete_doc_set_html/|Platform LSF HTML Documentation]]**
=> //format of entries below are ...//
> //time stamp//
>> //name of config file changed//
>>> //parameters changed//
>>>> //description of what is expected//
===== Scheduler Changes =====
> --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/09/13 10:32//
>> ''lsb.hosts''
>>>
# overload -hmeij
compute-1-17.local 12 () () () () ()
compute-1-18.local 12 () () () () ()
...
compute-2-32.local 12 () () () () ()
...
>>>> Noticed that a ton of jobs run with extremely small memory footprints. However, each takes up one job slot, so 8 of them fill a host. But after the host status goes to "Full" we still have tons of memory sitting idle. So I'm first experimenting with raising the number of job slots on the hosts in queue 16-lwnodes (12 instead of the default 8).
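>>>> A quick sketch of how the change gets applied and checked (only one of the overloaded nodes shown; commands as I understand them, verify against the docs):
# re-read the edited lsb.hosts into the running batch system
badmin reconfig
# the MAX column of bhosts should now show 12 job slots on this node
bhosts compute-1-17.local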
> --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/09/06 10:01//
>> ''lsf.shared''
>>>
Begin Resource
...
# below are custom resources -hmeij
sanscratch Numeric 30 N (Available Disk Space in M)
localscratch Numeric 30 N (Available Disk Space in M)
End Resource
>> ''lsf.cluster.lava''
>>>
RESOURCENAME LOCATION
...
# one shared instance for all hosts -hmeij
sanscratch [all]
# one instance local for each host -hmeij
localscratch [default]
End ResourceMap
>>>> The custom Perl program **''/opt/lava/6.1/linux2.6-glibc2.3-ia32e/etc/elim''** was copied to each node via ''cluster-fork''. The above configuration files were edited, followed by ''lsadmin reconfig'', ''badmin mbdrestart'', and ''badmin reconfig''. This should load the custom resources into LSF. The elim program reports available disk space in megabytes to the local ''lim'', which then forwards the information to the master ''lim''. From there it is available to end users via the resource requirement string, for example to request more than 300G of free space in /sanscratch ... ''bsub -R "sanscratch>300000" ...''
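>>>> For my own notes, the elim interface itself is simple: the program loops forever and writes to stdout the number of custom resources followed by name/value pairs. A minimal bash sketch of that idea (purely illustrative; the production elim is the Perl program named above, and the df flags should be double-checked):
#!/bin/bash
# illustrative elim sketch -- report free MB in the two scratch areas
while true; do
  san_mb=`df -Pm /sanscratch | awk 'NR==2 {print $4}'`
  local_mb=`df -Pm /localscratch | awk 'NR==2 {print $4}'`
  # elim protocol: <number_of_resources> <name> <value> <name> <value> ...
  echo "2 sanscratch $san_mb localscratch $local_mb"
  sleep 30
done
>>>> Once the lims pick the values up, something like ''lsload -s'' should list them per host (if I recall that flag correctly).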
> --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/08/30 11:17//
>> ''lsf.conf''
>>>
LSB_SHORT_HOSTLIST=1
>>>>This parameter takes no blank spaces around the equal sign! Deh. What it does is present a more condensed list of hosts for parallel jobs. For example (before and after the change):
>>>>
[hmeij@swallowtail gaussian]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
13914 hmeij RUN gaussian swallowtail nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3:nfs-2-3 test Aug 30 11:13
>>>>
[hmeij@swallowtail gaussian]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
13914 hmeij RUN gaussian swallowtail 8*nfs-2-3 test Aug 30 11:13
> --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/08/27 16:16//
>> ''lsf.conf''
>>>
ACCT_ARCHIVE_AGE = 30
>>>> Rotate the accounting file every 30 days. A monthly cron job removes any rotated files with a suffix greater than 1 ... this because ''bacct'' will read all of them. For sample output, view the cluster home page, where you can find statistics on job processing calculated by ''bacct'' for the last 7 days (updated early AM each day).
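>>>> The cleanup itself is nothing fancy; an /etc/cron.d style entry along these lines would do it (the logdir path is a guess from memory, adjust to wherever lsb.acct actually lives on the master):
# 1st of each month at 04:00: keep lsb.acct and lsb.acct.1, remove older rotations
0 4 1 * * root find /opt/lava/work/lava/logdir -name 'lsb.acct.*' ! -name 'lsb.acct.1' -exec rm -f {} \;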
> NA
>> none
>>>
[hmeij@swallowtail ~]# bsub -n 8 -R "span[ptile=2]" ...
>>>> This option requests 8 cores (job slots) for processing, and instructs the scheduler to assign no more than 2 cores per host. This should yield better performance. Does not work in Lava.
> NA
>> ''lsb.params''
>>>
PARALLEL_SCHED_BY_SLOT = Y
>>>> From the manual ... "To submit jobs based on the number of available job slots instead of the number of processors." If this is defined you can use the resource requirement strings (arguments to the -R option of ''bsub'', see the item above): the string //span[hosts=1]// indicates that all the processors allocated to this job must be on the same host; the string //span[ptile=value]// indicates the number of processors (value) on each host that should be allocated to the job. Does not work in Lava.
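>>>> For future reference (once we run a full LSF), the two strings would be used roughly like this; the job script name is just a placeholder:
# all 8 job slots on one and the same host
bsub -n 8 -R "span[hosts=1]" ./myjob
# 8 job slots "tiled" across hosts, at most 2 per host
bsub -n 8 -R "span[ptile=2]" ./myjob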
> NA
>> ''lsb.hosts''
>>>
HOST_NAME MXJ r1m pg ls tmp DISPATCH_WINDOW
...
# overload infiniband enabled nodes 2:1 job slots/node
compute-1-1.local 16 () () () () ()
...[all 16 nodes]...
default ! () () () () ()
>>>> The exclamation mark implies that the scheduler assigns as many job slots to a node as there are processor cores. Thus for all our nodes there will be 8 job slots. With the MXJ setting we can "overload" that default. Since I noticed that the chemistry parallel jobs using Amber barely tax the nodes in terms of memory, we can experiment with this setting. It may be that we have to toggle back to a ratio of 1.5:1 versus 2:1, but we'll see. The output of ''bhosts'' will display the current settings per host. Works in Lava, but we cannot take advantage of it together with the parameter above.
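>>>> Roughly what that looks like for one of the overloaded infiniband nodes (column layout from memory, values illustrative; MAX reflects the MXJ value):
[hmeij@swallowtail ~]$ bhosts compute-1-1.local
HOST_NAME          STATUS    JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
compute-1-1.local  ok           -   16      0    0      0      0    0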
> --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/08/28 16:07//
>> ''lsb.params''
>>>
MBD_SLEEP_TIME = 30 #mbatchd scheduling interval (60 secs is default)
SBD_SLEEP_TIME = 15 #sbatchd scheduling interval (30 secs is default)
JOB_ACCEPT_INTERVAL = 1 #interval for any host to accept a job
# (default is 1 (one-fold of MBD_SLEEP_TIME))
>>>> The job accept interval tells LSF to wait a short time between dispatching jobs to the same host, expressed as a multiple of the Master Batch Daemon (MBD) sleep time. The Slave Batch Daemon (SBD) interval defines the length of time between recalculations of the resource parameters which are forwarded to the MBD. Since we're not busy, we lower them a bit.
> NA
>> ''lsf.conf''
>>>
LSB_LOCALDIR=/share/apps/logs/lsb.events
>>>>Enables duplicate logging of lsb.events. Does not work in Lava.
\\
**[[cluster:0|Back]]**