This shows you the differences between two versions of the page.
cluster:103 [2011/12/21 15:11] hmeij [SAS] |
cluster:103 [2011/12/22 19:34] (current) hmeij [Submit 2] |
||
---|---|---|---|
Line 6: | Line 6: | ||
==== SAS ==== | ==== SAS ==== | ||
- | SAS, the statistical software, and much more, frequently used in the social sciences, is available on the High Performance Academic Computing Cluster. | + | SAS, the statistical |
SAS is typically invoked in batch mode by submitting a script (*.sas text file). | SAS is typically invoked in batch mode by submitting a script (*.sas text file). | ||
- | SAS can be invoked in interactive mode on the head node for debugging and code development if needed. | + | SAS can be invoked in interactive mode on the head node for debugging and code development, if needed. |
* at sas.com [[http:// | * at sas.com [[http:// | ||
* SAS/ODS examples [[http:// | * SAS/ODS examples [[http:// | ||
* SAS/ | * SAS/ | ||
- | * | + | |
+ | A tutor application is available at [[http:// | ||
+ | |||
+ | ==== Program ==== | ||
+ | |||
+ | So let's generate a little SAS program using a Unix editor like vi/vim, emacs or pico. | ||
+ | |||
+ | * First we generate the input data file '' | ||
+ | |||
+ | < | ||
+ | 1234567890 | ||
+ | 0987654321 | ||
+ | 2468097531 | ||
+ | </ | ||
+ | |||
+ | * Next a simple SAS file '' | ||
+ | |||
+ | < | ||
+ | options nocenter; | ||
+ | filename test ' | ||
+ | |||
+ | data one; | ||
+ | infile test; | ||
+ | input @2 x 3.1 @6 y 3.1; | ||
+ | total = x * y; | ||
+ | run; | ||
+ | |||
+ | proc print; run; | ||
+ | </ | ||
+ | |||
+ | * Let's test it by running it on the head node | ||
+ | |||
+ | < | ||
+ | [root@greentail sas]# ll | ||
+ | total 8 | ||
+ | -rw-r--r-- 1 root root 33 Dec 21 10:16 test.dat | ||
+ | -rw-r--r-- 1 root root 140 Dec 21 10:22 test.sas | ||
+ | [root@greentail sas]# sas test | ||
+ | [root@greentail sas]# cat test.lst | ||
+ | The SAS System | ||
+ | |||
+ | Obs x | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | </ | ||
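To see which columns the ''input @2 x 3.1 @6 y 3.1;'' statement picks up, the parsing can be imitated outside SAS. The following is a minimal sketch, not SAS itself: it assumes the 3.1 informat reads three columns and, because the fields contain no explicit decimal point, divides the value by 10. The function name ''parse_test_dat'' and the temporary directory are illustrative only.

```shell
#!/bin/bash
# Recreate the three-line input file from above in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 1234567890 0987654321 2468097531 > "$workdir/test.dat"

# Imitate "input @2 x 3.1 @6 y 3.1": read 3 columns starting at
# column 2 (x) and column 6 (y), then divide by 10 (assumed informat
# behaviour when the field has no decimal point).
parse_test_dat() {
    awk '{
        x = substr($0, 2, 3) / 10
        y = substr($0, 6, 3) / 10
        printf "%d %.1f %.1f %.2f\n", NR, x, y, x * y
    }' "$1"
}

parse_test_dat "$workdir/test.dat"
```

For the first record this prints ''1 23.4 67.8 1586.52'': columns 2-4 hold 234 and columns 6-8 hold 678. If the listing produced by SAS shows different values, the assumption about the informat is what to check first.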
+ | |||
+ | ==== Submit ==== | ||
+ | |||
+ | Ok, so now we have a program that works. | ||
+ | |||
+ | * Create a shell script for example with the name '' | ||
+ | * Set execute permissions '' | ||
+ | * Submit (see below) | ||
+ | |||
+ | < | ||
+ | |||
+ | # | ||
+ | # submit via 'bsub < run' | ||
+ | |||
+ | #BSUB -q hp12 | ||
+ | #BSUB -J test | ||
+ | #BSUB -o stdout | ||
+ | #BSUB -e stderr | ||
+ | |||
+ | time sas test | ||
+ | |||
+ | </ | ||
+ | |||
+ | The leading '#' | ||
+ | |||
+ | < | ||
+ | |||
+ | [hmeij@greentail sas]$ bsub < run | ||
+ | Job < | ||
+ | |||
+ | [hmeij@greentail sas]$ bjobs | ||
+ | JOBID | ||
+ | 492637 | ||
+ | |||
+ | [hmeij@greentail sas]$ bqueues | ||
+ | QUEUE_NAME | ||
+ | hp12 | ||
+ | matlab | ||
+ | stata 50 | ||
+ | elw 50 | ||
+ | emw 50 | ||
+ | ehw 50 | ||
+ | ehwfd 50 | ||
+ | imw 50 | ||
+ | bss24 50 | ||
+ | |||
+ | [hmeij@greentail sas]$ bjobs | ||
+ | JOBID | ||
+ | 492637 | ||
+ | [hmeij@greentail sas]$ bjobs | ||
+ | No unfinished job found | ||
+ | |||
+ | [hmeij@greentail sas]$ ll | ||
+ | total 28 | ||
+ | -rwxr--r-- 1 hmeij its 115 Dec 21 10:48 run | ||
+ | -rw-r--r-- 1 hmeij its 42 Dec 21 10:49 stderr | ||
+ | -rw-r--r-- 1 hmeij its 838 Dec 21 10:49 stdout | ||
+ | -rw-r--r-- 1 hmeij its 33 Dec 21 10:16 test.dat | ||
+ | -rw-r--r-- 1 hmeij its 2565 Dec 21 10:49 test.log | ||
+ | -rw-r--r-- 1 hmeij its 258 Dec 21 10:49 test.lst | ||
+ | -rw-r--r-- 1 hmeij its 140 Dec 21 10:22 test.sas | ||
+ | </ | ||
+ | |||
+ | And so the job was dispatched to host '' | ||
+ | |||
+ | The '' | ||
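Instead of retyping ''bjobs'' until the job is gone, the polling can be wrapped in a small helper. This is only a sketch: it relies on the "No unfinished job found" message shown in the session above, and the names ''wait_for_jobs'' and ''BJOBS_CMD'' are made up for this example (they are not part of the scheduler).

```shell
#!/bin/bash
# Poll the scheduler until no unfinished jobs remain for this user.
# wait_for_jobs and BJOBS_CMD are illustrative names, not scheduler features.
wait_for_jobs() {
    local interval="${1:-30}"              # seconds between polls
    local bjobs_cmd="${BJOBS_CMD:-bjobs}"  # override for testing
    # The message may be written to stderr, so merge both streams.
    until "$bjobs_cmd" 2>&1 | grep -q 'No unfinished job found'; do
        sleep "$interval"
    done
}

# Possible use on the head node (commented out, it needs the scheduler):
#   bsub < run && wait_for_jobs 10 && cat test.lst
```

Note that this waits for all of your unfinished jobs, not just the one submitted last, so it is mainly useful for small test runs like this one.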
+ | |||
+ | ==== Submit 2 ==== | ||
+ | |||
+ | On the back end compute nodes, unless specified otherwise, the job runs inside your home directory. | ||
+ | |||
+ | In the SAS program we add the following lines | ||
+ | |||
+ | < | ||
+ | %let jobpid = %sysget(LSB_JOBID); | ||
+ | libname here "/ | ||
+ | </ | ||
+ | |||
+ | And change this line to use local disks for storage | ||
+ | |||
+ | < | ||
+ | data here.one; | ||
+ | </ | ||
+ | |||
+ | In the submission script we change the following | ||
+ | |||
+ | * new submission file with edits | ||
+ | * -n reserves job slots (cpu cores) for the job (not necessary, SAS jobs will always use only one) | ||
+ | * -R reserves memory, for example, reserve 200 MB of memory on target compute node | ||
+ | * scheduler creates unique dirs in scratch by JOBPID for you, so we'll stage the job there | ||
+ | * but now we must copy relevant files //to// scratch dir and results back //to// home dir | ||
+ | |||
+ | |||
+ | < | ||
+ | |||
+ | # | ||
+ | # submit via 'bsub < run' | ||
+ | |||
+ | #BSUB -q hp12 | ||
+ | #BSUB -J test | ||
+ | #BSUB -o stdout | ||
+ | #BSUB -e stderr | ||
+ | #BSUB -n 1 | ||
+ | #BSUB -R " | ||
+ | |||
+ | # unique job dir in scratch | ||
+ | export MYSANSCRATCH=/ | ||
+ | cd $MYSANSCRATCH | ||
+ | |||
+ | cp ~/ | ||
+ | time sas test | ||
+ | cp test.log test.lst ~/sas | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | * you can monitor the progress of your job from greentail while it runs | ||
+ | |||
+ | < | ||
+ | [hmeij@greentail sas]$ ll / | ||
+ | total 16 | ||
+ | -rw-r--r-- 1 hmeij its 33 Dec 21 14:31 test.dat | ||
+ | -rw-r--r-- 1 hmeij its 2568 Dec 21 14:31 test.log | ||
+ | -rw-r--r-- 1 hmeij its 258 Dec 21 14:31 test.lst | ||
+ | -rw-r--r-- 1 hmeij its 140 Dec 21 14:31 test.sas | ||
+ | </ | ||
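The copy-in, run, copy-out steps of the submission script can also be written as a function that still copies the log back when the run fails, which helps with debugging. This is a sketch under assumptions: the file names test.sas, test.dat, test.log and test.lst are taken from the listings above, and the function name ''stage_and_run'' and its arguments are hypothetical.

```shell
#!/bin/bash
# Stage inputs into a scratch dir, run a command there, copy results back.
# Arguments: <home_dir> <scratch_dir> <command...> (illustrative interface).
stage_and_run() {
    local home_dir="$1" scratch_dir="$2"
    shift 2
    mkdir -p "$scratch_dir" || return 1
    cp "$home_dir"/test.sas "$home_dir"/test.dat "$scratch_dir"/ || return 1

    # Run the command inside the scratch dir and remember its exit status.
    local status=0
    ( cd "$scratch_dir" && "$@" ) || status=$?

    # Copy back whatever log/listing exists, even after a failed run,
    # so the SAS log is available in the home dir.
    local f
    for f in test.log test.lst; do
        [ -f "$scratch_dir/$f" ] && cp "$scratch_dir/$f" "$home_dir"/
    done
    return "$status"
}

# Possible use inside the submission script (commented out, it needs SAS):
#   stage_and_run ~/sas "$MYSANSCRATCH" sas test
```

The function returns the exit status of the command, so the scheduler still sees a failed job as failed.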
+ | |||
+ | ==== Best Practices ==== | ||
+ | |||
+ | * You may submit as many SAS jobs as you like; just leave enough resources available for others to also get work done | ||
+ | * Because SAS submissions are serial, non-parallel jobs, your -n flag is always 1 | ||
+ | * Reserve resources if you know what you need, especially memory | ||
+ | * Use /sanscratch for large data jobs with heavy read/write operations | ||
+ | * Queue ehwfd is preferentially for Gaussian users, and please stay off the stata and matlab queues | ||
+ | * Write smart SAS code, for example, use data set indexes and PROC SQL (this can be your best friend) | ||
+ | | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |