Submitting R2017+ Jobs

Wesleyan has held a campus-wide site license for Matlab since version 2017. There is therefore no need to check out a license, and the matlab queue has been removed. You can run version 2017 and later on all queues, with an unlimited number of jobs. Your submit script can be as simple as this:

#!/bin/bash
# create this file then submit
# submit via 'bsub < run.sh'

#BSUB -q tinymem
#BSUB -J test
#BSUB -o out
#BSUB -e err
# request 8 cores; match maxNumCompThreads(8); in matlab code
#BSUB -n 8
# all cores on single host
#BSUB -R "span[hosts=1]"


export PATH=/share/apps/CENTOS6/matlab/R2017b/bin:$PATH

matlab -nodisplay < myJob.m  > myJob.log
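
Because the core count in '#BSUB -n' and the value passed to maxNumCompThreads() have to agree, it can help to generate the submit script from a single number. The following is only a sketch under assumptions: make_submit_script is a hypothetical helper, not something installed on the cluster, and it feeds the maxNumCompThreads() call to Matlab on standard input ahead of the contents of the .m file.

```shell
#!/bin/bash
# Sketch: write a submit script in which '#BSUB -n' and maxNumCompThreads()
# are both derived from one core count.
# make_submit_script is a hypothetical helper, not a cluster-provided tool.
make_submit_script() {
    local cores="$1" mfile="$2" out="${3:-run.sh}"
    local base="${mfile%.m}"
    cat > "$out" <<EOF
#!/bin/bash
#BSUB -q tinymem
#BSUB -J ${base}
#BSUB -o out
#BSUB -e err
#BSUB -n ${cores}

export PATH=/share/apps/CENTOS6/matlab/R2017b/bin:\$PATH

# set the thread limit first, then feed the script to matlab on stdin
( echo "maxNumCompThreads(${cores});"; cat ${mfile} ) | matlab -nodisplay > ${base}.log
EOF
}

# generate run.sh for a 4-core job; afterwards submit via 'bsub < run.sh'
make_submit_script 4 myJob.m run.sh
```

Generating the file keeps the two settings from drifting apart when you change the core count; the queue name and Matlab path are simply copied from the script above.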

Submitting R2015a Jobs

2015 Matlab DCE has been uninstalled — Henk 2018/08/30 08:01

MATLAB's Distributed Computing Server R2015a requires CentOS6, so launch it from swallowtail or petaltail; you can use any queue except hp12. Ever since Matlab R2008a, I've been copying the “lsf” integration files forward from release to release to get parallel job submissions working for Lava (v1.x). With every upgrade I ask for Lava integration support instead of having to use the generic integration files.

Well, that era came to an end with R2015a. I got it to work, but there are some changes.

First the admin side of things (there has to be a better way):

cd /share/apps/CENTOS6/matlab/2015a/toolbox/distcomp/examples/
# read the READMEs on the way down
cd integration/
cd lsf/
cd shared/
cp -p * /share/apps/CENTOS6/matlab/2015a/toolbox/local/
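
After the copy it is worth confirming that the integration functions actually landed in toolbox/local. The sketch below assumes the files are named after the function handles used later in myJob.m (independentSubmitFcn.m and so on); check_integration_files is a hypothetical helper, and the real shared/ directory may contain more files than these four.

```shell
#!/bin/bash
# Sketch: report which scheduler integration files are missing from a directory.
# check_integration_files is a hypothetical helper; the .m file names are
# assumed from the function handles (@independentSubmitFcn, ...) in myJob.m.
check_integration_files() {
    local dir="$1" missing=0 f
    for f in independentSubmitFcn communicatingSubmitFcn getJobStateFcn deleteJobFcn; do
        if [ ! -f "$dir/$f.m" ]; then
            echo "missing: $f.m"
            missing=1
        fi
    done
    # return 0 when everything is present, 1 otherwise
    return "$missing"
}

# example (run on the cluster):
#   check_integration_files /share/apps/CENTOS6/matlab/2015a/toolbox/local
```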

You can of course write a submit script (see below) and submit it to the scheduler, but with the Distributed Computing Server you can also do it from within Matlab.

Within Matlab

% put this in a script myJob.m, then call it from within Matlab, type it all in, or set up a profile

% distributed matlab jobs
% start 'matlab -nodisplay', issue the command 'myJob'

% set up the scheduler and matlab worker environment
cluster = parallel.cluster.Generic('JobStorageLocation', '/home/hmeij/matlab');
set(cluster, 'HasSharedFilesystem', true);
set(cluster, 'ClusterMatlabRoot', '/share/apps/CENTOS6/matlab/2015a');
set(cluster, 'OperatingSystem', 'unix');

set(cluster, 'IndependentSubmitFcn', @independentSubmitFcn);
% If you want to run communicating jobs (including parallel pools), you must specify a CommunicatingSubmitFcn
set(cluster, 'CommunicatingSubmitFcn', @communicatingSubmitFcn);
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);

% create job and assign tasks to be done
j = createJob(cluster);
createTask(j, @tenfunction, 1, {'log.job1',1,2})
createTask(j, @tenfunction, 1, {'log.job2',3,4})
createTask(j, @tenfunction, 1, {'log.job3',5,6})

% submit job and gather scheduled info
submit(j)
get(cluster)

% you can now exit matlab
% at system prompt type 'bjobs'

[hmeij@swallowtail matlab]$ matlab -nodisplay

                                                     < M A T L A B (R) >
                                           Copyright 1984-2015 The MathWorks, Inc.
                                           R2015a (8.5.0.197613) 64-bit (glnxa64) 
                                                      February 12, 2015           

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.             
                                                              

        Academic License

>> myJob

ans = 

 Task with properties: 

                   ID: 1
                State: pending
             Function: @tenfunction
               Parent: Job 1       
            StartTime:             
     Running Duration: 0 days 0h 0m 0s

      ErrorIdentifier: 
         ErrorMessage: 


ans = 

 Task with properties: 

                   ID: 2
                State: pending
             Function: @tenfunction
               Parent: Job 1       
            StartTime:             
     Running Duration: 0 days 0h 0m 0s

      ErrorIdentifier: 
         ErrorMessage: 


ans =
...snip...


ans =

                IndependentSubmitFcn: @independentSubmitFcn
              CommunicatingSubmitFcn: @communicatingSubmitFcn
                      GetJobStateFcn: @getJobStateFcn
                        CancelJobFcn: []
                       CancelTaskFcn: []
                        DeleteJobFcn: @deleteJobFcn
                       DeleteTaskFcn: []
                                Host: 'swallowtail'
                 HasSharedFilesystem: 1
    RequiresMathWorksHostedLicensing: 0
                  JobStorageLocation: '/home/hmeij/matlab'
                   ClusterMatlabRoot: '/share/apps/CENTOS6/matlab/2015a'
                       LicenseNumber: ''
                     OperatingSystem: 'unix'
                          NumWorkers: Inf
                                Type: 'Generic'
                             Profile: ''
                            UserData: []
                                Jobs: [1x1 parallel.job.CJSIndependentJob]
                            Modified: 1

>>
>> quit

[hmeij@swallowtail matlab]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
352355  hmeij   RUN   bss24      swallowtail b49         Job1.1     Mar 13 15:09
352356  hmeij   RUN   bss24      swallowtail b49         Job1.2     Mar 13 15:09
352357  hmeij   RUN   bss24      swallowtail b48         Job1.3     Mar 13 15:09
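
Once Matlab has exited, the scheduler is the only place to follow the tasks. As a small sketch, the listing above can be reduced to a count of jobs per state; count_by_state is a hypothetical helper, and it relies only on the STAT column being the third field of each job line.

```shell
#!/bin/bash
# Sketch: count jobs per state (STAT, column 3) in bjobs-style output on stdin.
# count_by_state is a hypothetical helper name.
count_by_state() {
    # only lines that start with a numeric JOBID are counted, so the header is skipped
    awk '$1 ~ /^[0-9]+$/ { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input copied from the bjobs listing above.
sample_bjobs() {
cat <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
352355  hmeij   RUN   bss24      swallowtail b49         Job1.1     Mar 13 15:09
352356  hmeij   RUN   bss24      swallowtail b49         Job1.2     Mar 13 15:09
352357  hmeij   RUN   bss24      swallowtail b48         Job1.3     Mar 13 15:09
EOF
}

sample_bjobs | count_by_state   # prints: RUN 3
```

On the cluster the same function would be used as 'bjobs | count_by_state'.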

Integrated with OpenLava 2.2

Using bsub

Once this is working you can of course also do it via a script. Build a simple script, run.serial:

#!/bin/bash
# submit via 'bsub < run.serial'

#BSUB -q matlab
#BSUB -J test
#BSUB -o out
#BSUB -e err

/home/apps/CENTOS6/matlab/2015a/bin/matlab -nodisplay < ./myJob2015.m  > /dev/null

# this is the main matlab invocation which will launch the workers
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
353128  hmeij   RUN   matlab     swallowtail n33         test       Mar 18 14:57

[hmeij@swallowtail matlab]$ ssh n33 top -u hmeij -b -n 1
unloading gcc module                                    
top - 14:58:20 up 75 days,  2:29,  1 user,  load average: 23.44, 23.27, 23.56
Tasks: 816 total,  17 running, 799 sleeping,   0 stopped,   0 zombie         
Cpu(s): 59.7%us,  2.3%sy,  0.0%ni, 38.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264635888k total, 156440960k used, 108194928k free,   221504k buffers   
Swap: 31999992k total,    24716k used, 31975276k free, 152396920k cached      

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
18809 hmeij     20   0 2844m 270m  99m S 112.6  0.1   0:11.76 MATLAB            
19021 hmeij     20   0 15560 1700  832 R  5.6  0.0   0:00.08 top                
18798 hmeij     20   0 20596 2132 1020 S  0.0  0.0   0:00.01 res                
18805 hmeij     20   0  103m 1280 1092 S  0.0  0.0   0:00.00 1426705068.3531    
18808 hmeij     20   0  103m 1180 1004 S  0.0  0.0   0:00.00 1426705068.3531    
19020 hmeij     20   0  107m 1872  868 S  0.0  0.0   0:00.00 sshd               

# and the workers appear, note the FROM_HOST

[hmeij@swallowtail matlab]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
353129  hmeij   RUN   bss24      n33         b49         Job2.1     Mar 18 14:58
353130  hmeij   RUN   bss24      n33         b49         Job2.2     Mar 18 14:58
353131  hmeij   RUN   bss24      n33         b48         Job2.3     Mar 18 14:58
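
The FROM_HOST column is what separates the launcher job (submitted from swallowtail) from the workers it spawned (submitted from the compute node n33). A minimal sketch of that filter follows; jobs_from_host is a hypothetical helper, and the sample input merges the two bjobs listings above into one for illustration.

```shell
#!/bin/bash
# Sketch: list JOBID and QUEUE of jobs submitted from a given host
# (FROM_HOST, column 5) in bjobs-style output read from stdin.
# jobs_from_host is a hypothetical helper name.
jobs_from_host() {
    awk -v host="$1" '$1 ~ /^[0-9]+$/ && $5 == host { print $1, $4 }'
}

# Sample input: the launcher job and its workers, merged from the listings above.
sample_bjobs_all() {
cat <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
353128  hmeij   RUN   matlab     swallowtail n33         test       Mar 18 14:57
353129  hmeij   RUN   bss24      n33         b49         Job2.1     Mar 18 14:58
353130  hmeij   RUN   bss24      n33         b49         Job2.2     Mar 18 14:58
353131  hmeij   RUN   bss24      n33         b48         Job2.3     Mar 18 14:58
EOF
}

# workers launched by the main matlab job running on n33
sample_bjobs_all | jobs_from_host n33
```

Only the first five columns are used because they are always populated; EXEC_HOST can be empty for pending jobs, which would shift the later fields.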

Old pages, might be helpful…


cluster/137.1560358698.txt.gz · Last modified: 2019/06/12 12:58 by hmeij07