Complete Documentation

It is all at this link: COMPLETE DOCUMENTATION FOR LSF/HPC 6.2, and it is very good.

New Features in LSF 6.2

This page will be expanded to show examples of LSF/HPC advanced features.

The more information you can provide to the scheduler about run times and the resources you need (and when you need them), the more efficient the scheduling will be. The examples below are made-up scenarios. Try to get familiar with them, or ask for a hands-on working session.

⇒ Also read up on the new queue configurations: Link

As part of the upgrade:

  • Jobs were terminated … for a list of which ones, view External Link
  • The working directories of those terminated jobs were saved in /sanscratch/OLDJOBS; help yourself …
  • When the new scheduler came online it started with JOBPID 101 … that may clobber some of your old output files, so I've spooled the JOBPIDs forward to 30,000.
  • Some home directories have been relocated, but /home/username remains the same, FYI.
  • Parallel job submission syntax has changed/will change! However, the “old way” still works. See below; I should have the documentation updated shortly. This will primarily affect the Amber users (who like to use multiple hosts), but not the Gaussian users (who like to use a single host).
  • We're still experiencing license issues … more later.

Exclusive

If you wish to use a compute node in “exclusive” mode, use the bsub -x … syntax. You may wish to do this, for example, if you want all the memory available to your job, or if you want all the cores. Note that in either case resources are “wasted”: if you allocate all the memory, cores may go idle; if you request all the cores, memory may go unused. Try to match your needs with the host resources.

Here is how it works; in your submission script …

#BSUB -q elw 
#BSUB -x 
#BSUB -J "myLittleJob"

Once your job runs …

[hmeij@swallowtail ~]$ bhosts
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
compute-1-18       closed          -      8      1      1      0      0      0

you will notice that the host status is now “closed” even though it is running only 1 job.

[hmeij@swallowtail ~]$ bhosts -l compute-1-18
HOST  compute-1-18
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Excl    240.00     -      8      1      1      0      0      0      -

Please note that the matlab queue does not support this. There are other ways to obtain exclusive use, of course:

  • for serial jobs, bsub -n 8 … requests all the cores on a single host (see the sketch after this list).
  • bsub -R .. , see below.
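
For example, a serial job that grabs every job slot on one host could look like the following. This is only a minimal sketch: the job name and program are placeholders, and the span[hosts=1] line is an optional standard LSF request added here to make sure all 8 slots land on the same host.

#!/bin/bash
# reserve all 8 job slots on one host so nothing else is scheduled there
#BSUB -q elw
#BSUB -n 8
#BSUB -R "span[hosts=1]"
#BSUB -J "myExclusiveSerialJob"

# placeholder: the program itself uses one core; the other slots simply stay reserved
./my_serial_program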

Resource Reservation

bsub -R “resource_string” …

This is a very powerful argument you can give to bsub … for a detailed description, read External Link.

Here is a simple example: a short script in which we ask for 200 MB of memory.

...
# queue
#BSUB -q elw
#BSUB -R "rusage[mem=200]"
#BSUB -J "myLittleJob"
...

Submit the job and observe the resource reservation (note the value under “mem” in the “Reserved” line below). While this job is running, any new job submitted to this host can only ask for a maximum of 3660M - 200M = 3460M. The scheduler handles all of this for you.

There are many, many options for resource reservation. You can introduce time-based decay or accumulation behavior for resources (see the sketch after the output below). Read the External Link material above.

[hmeij@swallowtail ~]$ bsub < ./myscript 
Job <30238> is submitted to queue <elw>.

[hmeij@swallowtail ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
30238   hmeij   RUN   elw        swallowtail compute-1-21 myLittleJob Nov 20 10:10

[hmeij@swallowtail ~]$ bhosts -l compute-1-21
HOST  compute-1-21
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok             240.00     -      8      1      1      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
              r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem gm_ports
 Total         0.0   0.0   0.0    0%   1.7   127    0  1169 7116M 4000M 3660M      0.0
 Reserved      0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M  200M       -
...
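
As an example of the time-based behavior mentioned above, a reservation can be released gradually instead of being held for the whole job. This is only a sketch with made-up values; duration and decay are standard LSF rusage keywords.

...
# queue
#BSUB -q elw
# reserve 200 MB for the first 30 minutes of the run only;
# decay=1 lets the reserved amount decrease linearly to 0 over that window
#BSUB -R "rusage[mem=200:duration=30:decay=1]"
#BSUB -J "myDecayingReservation"
...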

There are two custom resources that have been defined outside of LSF: 'localscratch' and 'sanscratch'. Their values represent the amount of free disk space. With the rusage option you can similarly reserve disk space for your job and avoid conflicts with other jobs.

Remember that /localscratch is local to the individual compute nodes and is roughly 70 GB on all nodes except the heavyweight nodes with the attached MD1000 disk arrays. The latter nodes, nfs-2-1 … nfs-2-4 (the ehwfd queue), have roughly 230 GB of /localscratch available. The /sanscratch file system is shared by all nodes.
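
Reserving scratch space looks just like reserving memory. The sketch below assumes the custom resource values are expressed in MB (like mem); the 10000 figure and the job name are made up.

...
# queue
#BSUB -q elw
# reserve roughly 10 GB of local scratch space on the execution host
# (localscratch is the site-defined custom resource described above)
#BSUB -R "rusage[localscratch=10000]"
#BSUB -J "myScratchJob"
...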

Wall Clock Time

Not a new feature, but one which I strongly encourage you to use.
The queue policy BACKFILL is a new option, defined at the queue level.

With wall clock time information available for each job, the scheduler is able to exercise the BACKFILL policy. That is, if job A, for example, still has 6 hours to run and a job slot is available on that host, the scheduler will assign higher priority to other jobs that can finish on that host within 6 hours. The key here is that those unused job slots may be reserved for job B, scheduled to run once job A finishes.

To specify …

#BSUB -W hours:minutes

For efficient backfilling, the queues should have a default RUNLIMIT defined; however, we do not apply one. Thus backfilling can only happen when users specify the -W option during job submission. Jobs that exceed their specified limit are terminated automatically.
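
For example, a job that you expect to finish within six and a half hours could be submitted like this. A minimal sketch: the queue, the time estimate, and the program name are placeholders.

#!/bin/bash
#BSUB -q elw
# tell the scheduler this job needs at most 6 hours and 30 minutes;
# it will be terminated if it runs longer, but the estimate enables backfilling
#BSUB -W 06:30
#BSUB -J "myTimedJob"

# placeholder for the actual work
./my_program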

Parallel Jobs

Old Way

Good news! It appears the “old way” of submitting jobs still works, that is, with the use of the “mpirun” wrapper scripts. This method is not recommended because, once the job is submitted, LSF has no knowledge of the parallel tasks. But it still works, so in a pinch use your old scripts.

Spanning

A very handy feature. You may have to experiment with its impact on your jobs. Basically, if we ask for 16 job slots, we can dictate to the scheduler how many we want per node. Previously, the scheduler would fill up one host, then move to the next host, etc.

But consider this: 16 job slots (cores) are requested and we want no more than 2 allocated per host. The span resource request instructs the scheduler to tile the parallel tasks across multiple hosts. Submit and observe the allocation.

#!/bin/bash
#BSUB -q imw
#BSUB -n 16
#BSUB -J test
#BSUB -o out
#BSUB -e err
#BSUB -R "span[ptile=2]"
...
[hmeij@swallowtail cd]$ bjobs

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
30244   hmeij   RUN   imw        swallowtail 2*compute-1-13:2*compute-1-14:
2*compute-1-8:2*compute-1-10:2*compute-1-4:2*compute-1-9:2*compute-1-16:
2*compute-1-7 test       Nov 20 11:04

This also works with the “old way” of submitting ;-)
Some jobs will benefit from this tremendously, others may not.

New Way

Let's start a new page.

Meij, Henk 2008/01/09 11:31

