The complete documentation is all at this link, COMPLETE DOCUMENTATION FOR LSF/HPC 6.2, and it is very good.
This page will be expanded to show examples of LSF/HPC advanced features.
The more information you can provide to the scheduler regarding run times, resources needed and when, the more efficient the scheduling will be. The examples below are just made-up scenarios. Try to get familiar with them, or ask for hands-on working sessions.
⇒ Also read up on the new queue configurations: Link
As part of the upgrade:
If you wish to use a compute node in “exclusive” mode, use the bsub -x …
syntax. You may wish to do this, for example, if you want all the memory available to your job, or all the cores. Note that in either case resources are “wasted”: if you request all the memory, cores may go idle; if you request all the cores, memory may go unused. Try to match your needs with the host resources.
Here is how it works. In your script …
  #BSUB -q elw
  #BSUB -x
  #BSUB -J "myLittleJob"
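For reference, a complete submission script might look like the sketch below; ./myprogram and the output file names are placeholders for your own.

  #!/bin/bash
  # Exclusive-use sketch: -x closes the host to all other jobs.
  # "./myprogram" is a placeholder; substitute your own executable.
  #BSUB -q elw
  #BSUB -x
  #BSUB -J "myLittleJob"
  #BSUB -o out
  #BSUB -e err

  ./myprogram

Submit it as usual with bsub < ./myscript.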
Once your job runs …
  [hmeij@swallowtail ~]$ bhosts
  HOST_NAME        STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
  compute-1-18     closed   -     8    1      1    0      0      0
you will notice that the host status is now “closed” and that it is running 1 job.
  [hmeij@swallowtail ~]$ bhosts -l compute-1-18
  HOST  compute-1-18
  STATUS        CPUF    JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOW
  closed_Excl   240.00  -     8    1      1    0      0      0    -
Please note that the matlab
queue does not support this. There are other ways to obtain exclusive use, of course:
bsub -n 8 … requests all the cores on a single host (a sketch follows below).
bsub -R … , see below.
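For example, a minimal sketch of the bsub -n 8 approach (8-core hosts assumed, as above); note that on its own -n 8 merely asks for 8 job slots anywhere, so pairing it with span[hosts=1] forces them onto a single host:

  # Grab all 8 job slots on one host (8-core nodes assumed).
  # "./myprogram" is a placeholder for your own executable.
  bsub -q elw -n 8 -R "span[hosts=1]" -J "allCores" ./myprogram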
bsub -R “resource_string” … is a very powerful argument you can give to bsub; for a detailed description read the External Link.
Here is a simple example: a simple script in which we ask for 200 MB of memory.
  ...
  # queue
  #BSUB -q elw
  #BSUB -R "rusage[mem=200]"
  #BSUB -J "myLittleJob"
  ...
Submit the job and observe the resource reservation (note the value under “mem” in the “Reserved” line of the output below). While this job is running, any new job submitted to this host can only ask for a maximum of 3660M - 200M = 3460M. The scheduler handles all this for you.
There are many, many options for resource reservation. You can introduce time-based decay or accumulation behavior for resources. Read the External Link material above.
  [hmeij@swallowtail ~]$ bsub < ./myscript
  Job <30238> is submitted to queue <elw>.

  [hmeij@swallowtail ~]$ bjobs
  JOBID  USER   STAT  QUEUE  FROM_HOST    EXEC_HOST     JOB_NAME     SUBMIT_TIME
  30238  hmeij  RUN   elw    swallowtail  compute-1-21  myLittleJob  Nov 20 10:10

  [hmeij@swallowtail ~]$ bhosts -l compute-1-21
  HOST  compute-1-21
  STATUS        CPUF    JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOW
  ok            240.00  -     8    1      1    0      0      0    -

  CURRENT LOAD USED FOR SCHEDULING:
            r15s  r1m  r15m  ut  pg   io   ls  it    tmp    swp    mem    gm_ports
  Total     0.0   0.0  0.0   0%  1.7  127  0   1169  7116M  4000M  3660M  0.0
  Reserved  0.0   0.0  0.0   0%  0.0  0    0   0     0M     0M     200M   -
  ...
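As a hedged illustration of the time-based behavior mentioned above, the rusage syntax also accepts duration and decay keywords; the sketch below reserves 200 MB but releases the reservation linearly over the first 30 minutes of the run (the values are made up for illustration):

  # Reserve 200 MB, released linearly over the job's first 30 minutes.
  #BSUB -R "rusage[mem=200:duration=30:decay=1]"

You can also confirm what a running job requested with bjobs -l <jobid>, which echoes the job's resource requirement string.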
There are two custom resources that have been defined outside of LSF: 'localscratch' and 'sanscratch', whose values represent the amount of free disk space. With the rusage
option you can similarly reserve disk space for your job and avoid conflicts with other jobs (a sketch follows below).
Remember that /localscratch is local to the individual compute nodes and is roughly 70 GB for all nodes except the heavyweight nodes with the attached MD1000 disk arrays. The latter nodes, nfs-2-1 … nfs-2-4 (the ehwfd
queue), have roughly 230 GB of /localscratch available. The /sanscratch file system is shared by all nodes.
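As a sketch, and assuming these custom resources are expressed in MB (like mem above), a job that needs 10 GB of local scratch might reserve it like this:

  # Reserve 10 GB of /localscratch (10240 MB; MB units assumed here).
  #BSUB -q elw
  #BSUB -R "rusage[localscratch=10240]"
  #BSUB -J "scratchJob"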
Not a new feature, but one which I strongly encourage you to use.
The BACKFILL queue policy is a new option, defined at the queue level.
With wall clock time information available for each job, the scheduler is able to exercise the BACKFILL policy. For example, if job A still has 6 hours to run and idle job slots on that host are reserved for a pending job B (scheduled to start once job A finishes), the scheduler can give higher priority to other jobs that can run on that host within those 6 hours, letting them use the reserved slots without delaying job B.
To specify a run limit (wall clock time) in your script …

  #BSUB -W hours:minutes
For efficient backfilling, the queues should have a default RUNLIMIT defined; however, we do not apply one. Thus backfilling can only happen when users specify the -W option during job submission. Jobs that exceed their limit are terminated automatically.
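For example, a minimal sketch of a job that declares a 6-hour wall clock limit, making it eligible for backfill (and subject to termination after 6 hours):

  #!/bin/bash
  # Declare a 6-hour run limit; the scheduler can now use this job for
  # backfill, and will terminate it if it runs past 6 hours.
  # "./myprogram" is a placeholder for your own executable.
  #BSUB -q elw
  #BSUB -W 6:00
  #BSUB -J "backfillMe"

  ./myprogram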
Good news! It appears the “old way” of submitting jobs still works, that is, with the use of the “mpirun” wrapper scripts. This method is not recommended because, once submitted, LSF has no knowledge of the parallel tasks. But it still works, so in a pinch use your old scripts.
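As a rough sketch of that old style (the queue, task count, and program name below are placeholders for your own setup), the script simply calls mpirun directly, so LSF sees a single job:

  #!/bin/bash
  # "Old way" sketch: invoke the mpirun wrapper directly.
  # LSF dispatches the job but has no knowledge of the parallel tasks.
  #BSUB -q imw
  #BSUB -n 8
  #BSUB -J "oldStyle"

  # Placeholder wrapper invocation; adjust to your MPI installation.
  mpirun -np 8 ./my_mpi_program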
A very handy feature, though you may have to experiment with its impact on your jobs. Basically, if we ask for 16 job slots we can dictate to the scheduler how many we want per node. Previously, the scheduler would fill up one host, then move to the next host, and so on.
But consider … 16 job slots (cores) are requested and we want no more than 2 allocated per host. The resource request span
instructs the scheduler to tile the parallel tasks across multiple hosts. So submit and observe the allocation.
  #!/bin/bash
  #BSUB -q imw
  #BSUB -n 16
  #BSUB -J test
  #BSUB -o out
  #BSUB -e err
  #BSUB -R "span[ptile=2]"
  ...
  [hmeij@swallowtail cd]$ bjobs
  JOBID  USER   STAT  QUEUE  FROM_HOST    EXEC_HOST                       JOB_NAME  SUBMIT_TIME
  30244  hmeij  RUN   imw    swallowtail  2*compute-1-13:2*compute-1-14:
                                          2*compute-1-8:2*compute-1-10:
                                          2*compute-1-4:2*compute-1-9:
                                          2*compute-1-16:2*compute-1-7   test      Nov 20 11:04
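Conversely, if you want the tasks packed tightly, setting ptile equal to the cores per host fills each node before moving on; a sketch for the same 16-slot job (8-core nodes assumed):

  # Pack 8 tasks per host: the 16 slots then occupy only 2 hosts.
  #BSUB -n 16
  #BSUB -R "span[ptile=8]"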
This also works with the “old way” of submitting jobs.
Some jobs will benefit from this tremendously, others may not.