This is an old revision of the document!


Some general information for SAS users.

SAS

SAS (http://sas.com) is statistical analysis software, and much more, that is frequently used in the social sciences. It is available on the High Performance Academic Computing Cluster. It is not a parallel version of SAS, but we do offer an unlimited campus-wide license for Teaching and Research.

SAS is typically invoked in batch mode by submitting a script (a *.sas text file). SAS generates a log file (*.log) and a listing file (*.lst). The log shows what SAS did, including notes, warnings and errors; the listing contains the output of the invoked procedures.
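Since a batch run reports problems only in the log file, it can help to scan the log after each run. The following is a minimal sketch, not part of the cluster setup: check_sas_log is a hypothetical helper name, and it assumes the usual SAS convention that such messages start a log line with "ERROR" or "WARNING".

```shell
#!/bin/bash
# Hypothetical helper: scan a SAS log for problems after a batch run.
# Assumes error and warning messages start a line with ERROR or WARNING.
check_sas_log() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "no log file: $log" >&2
        return 2
    fi
    # Print warnings for information only; they do not change the result.
    grep -n '^WARNING' "$log" || true
    # Print errors with their line numbers and fail if there are any.
    if grep -n '^ERROR' "$log"; then
        return 1
    fi
    return 0
}
```

After a run such as sas test, calling check_sas_log test.log prints any offending lines with their line numbers and returns a non-zero status if an error was found.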

SAS can be invoked in interactive mode on the head node for debugging and code development if needed. However, interactive mode is not supported on compute nodes. Hence, if you need to generate graphical output there, you will have to use SAS/GRAPH or the Output Delivery System (ODS). Examples of code can be found at a variety of locations:

Program

So let's generate a little SAS program using a Unix editor like vi/vim, emacs or pico.

  • First we generate the input data file test.dat
1234567890
0987654321
2468097531
  • Next, a simple SAS file test.sas that reads two numbers from fixed columns of each line and multiplies them
options nocenter;
filename test './test.dat';

data one;
  infile test;
  input @2 x 3.1 @6 y 3.1;
  total = x * y;
run;

proc print; run;
  • Let's test it by running it on the head node
[root@greentail sas]# ll
total 8
-rw-r--r-- 1 root root  33 Dec 21 10:16 test.dat
-rw-r--r-- 1 root root 140 Dec 21 10:22 test.sas
[root@greentail sas]# sas test
[root@greentail sas]# cat test.lst
The SAS System   10:24 Wednesday, December 21, 2011   1

Obs      x       y      total

 1     23.4    67.8    1586.52
 2     98.7    54.3    5359.41
 3     46.8    97.5    4563.00
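
To see where these numbers come from, the column-based read can be reproduced outside SAS. The sketch below is only a cross-check and assumes the input fields contain no explicit decimal point, as in test.dat: "@2 x 3.1" then means columns 2 to 4 with one implied decimal, and "@6 y 3.1" means columns 6 to 8. The function name parse_test_dat is hypothetical.

```shell
#!/bin/bash
# Reproduce the fixed-column read of test.sas with awk.
# x: columns 2-4, y: columns 6-8, each with one implied decimal place.
parse_test_dat() {
    awk '{
        x = substr($0, 2, 3) / 10
        y = substr($0, 6, 3) / 10
        printf "%.1f %.1f %.2f\n", x, y, x * y
    }' "$1"
}
```

Running parse_test_dat test.dat should print the same x, y and total values as the listing above, one observation per line.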

Submit

Ok, so we have a program that works. Now we want to submit a dozen of them. In order to do that we will write a script that invokes this SAS program and hand it off to the scheduler (Lava). The scheduler will figure out which compute nodes are idle and submit your program on your behalf.

  • Create a shell script named run for submission
  • Set execute permissions with chmod u+x run
  • Submit (see below)
#!/bin/bash
# submit via 'bsub < run'

#BSUB -q hp12
#BSUB -J test
#BSUB -o stdout
#BSUB -e stderr

time sas test

The leading '#' marks a comment in shell scripting, but the scheduler specifically looks for lines that start with '#BSUB' and interprets them as options: -q defines the queue, -J the job name, -o the file that STDOUT is saved to, and -e the same for STDERR. After that comes the job itself, that is, what to run. Here we prefix it with the Unix utility time, which reports the run time to STDERR.
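
For a dozen jobs, the same kind of script can be generated rather than typed by hand. This is a sketch under assumptions: the program names test1.sas through test12.sas, the run_ prefix and both function names are made up for illustration. Each job gets its own STDOUT and STDERR files so that the jobs do not overwrite each other's output.

```shell
#!/bin/bash
# Hypothetical helper: write one job script for one SAS program.
make_job_script() {
    local name="$1"            # SAS program name without the .sas extension
    local queue="${2:-hp12}"   # queue, defaulting to the one used on this page
    cat > "run_${name}" <<EOF
#!/bin/bash
#BSUB -q ${queue}
#BSUB -J ${name}
#BSUB -o ${name}.stdout
#BSUB -e ${name}.stderr

time sas ${name}
EOF
    chmod u+x "run_${name}"
}

# Hypothetical helper: write job scripts run_test1 ... run_testN.
make_all_job_scripts() {
    local count="${1:-12}"
    local i
    for i in $(seq 1 "$count"); do
        make_job_script "test${i}"
    done
}
```

After make_all_job_scripts 12, each generated script would be submitted the same way as a single one, for example with: for f in run_test*; do bsub < "$f"; done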

[hmeij@greentail sas]$ bsub < run
Job <492637> is submitted to queue <hp12>.

[hmeij@greentail sas]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
492637  hmeij   RUN   hp12       greentail   n10         test       Dec 21 10:49

[hmeij@greentail sas]$ bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP 
hp12             50   Open:Active    256    -    -    -   219     0   219     0
matlab           50   Open:Active      8    8    -    8     0     0     0     0
stata            50   Open:Active      6    6    -    6     0     0     0     0
elw              50   Open:Active     60    -    -    -     0     0     0     0
emw              50   Open:Active     32    -    -    -     8     0     8     0
ehw              50   Open:Active     32    -    -    -     8     0     8     0
ehwfd            50   Open:Active     32    -    -    -     8     0     8     0
imw              50   Open:Active    128    -    -    -    32     0    32     0
bss24            50   Open:Active     90    -    -    -     0     0     0     0

[hmeij@greentail sas]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
492637  hmeij   RUN   hp12       greentail   n10         test       Dec 21 10:49
[hmeij@greentail sas]$ bjobs
No unfinished job found

[hmeij@greentail sas]$ ll
total 28
-rwxr--r-- 1 hmeij its  115 Dec 21 10:48 run
-rw-r--r-- 1 hmeij its   42 Dec 21 10:49 stderr
-rw-r--r-- 1 hmeij its  838 Dec 21 10:49 stdout
-rw-r--r-- 1 hmeij its   33 Dec 21 10:16 test.dat
-rw-r--r-- 1 hmeij its 2565 Dec 21 10:49 test.log
-rw-r--r-- 1 hmeij its  258 Dec 21 10:49 test.lst
-rw-r--r-- 1 hmeij its  140 Dec 21 10:22 test.sas

And so the job was dispatched to host n10 for execution. Results are posted in my home directory; in fact, the entire job ran in my home directory on the remote compute node. I may not want to do that if I process or generate a lot of data, so we're going to add some statements to the script. Also, I may want to reserve some memory, so that the scheduler does not dispatch the job to a host with insufficient memory available and does not later dispatch another job to the same host that causes memory conflicts.

The hp12 queue is the greentail HP cluster, where each compute node has a 12 GB memory footprint. Memory footprints for the other queues differ; please consult this link (there is some old data…) http://petaltail.wesleyan.edu/cgi-bin/bqueues_web.cgi
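
A job script along those lines might look like the sketch below. Everything specific in it is an assumption rather than this cluster's documented setup: the -R "rusage[mem=...]" line is the LSF-style resource string, and whether Lava honors it here, and in which units, should be checked with the administrators; the value 1000 is purely illustrative. SCRATCH_BASE is a hypothetical variable for the scratch location, with /tmp only as a fallback, and run_in_scratch is a made-up helper name.

```shell
#!/bin/bash
# Sketch of a job script that works in a scratch directory.
# ASSUMPTION: LSF-style memory reservation; value and units are illustrative.
#BSUB -q hp12
#BSUB -J test
#BSUB -o stdout
#BSUB -e stderr
#BSUB -R "rusage[mem=1000]"

# run_in_scratch SRC_DIR COMMAND [ARGS...]
# Copies SRC_DIR into a fresh scratch directory, runs COMMAND there,
# copies everything back, and removes the scratch directory.
run_in_scratch() {
    local src="$1"; shift
    local scratch
    # SCRATCH_BASE is hypothetical; it is not defined by the scheduler.
    scratch=$(mktemp -d "${SCRATCH_BASE:-/tmp}/job.XXXXXX") || return 1
    cp -r "$src"/. "$scratch"/ || return 1
    local status=0
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$scratch" && "$@" ) || status=$?
    # Copy results back even if the command failed, so the log survives.
    cp -r "$scratch"/. "$src"/
    rm -rf "$scratch"
    return "$status"
}

# In the job itself one would then call, for example:
#   run_in_scratch "$PWD" sas test
```

The copy back happens even when the command fails, so that test.log is still available for inspection in the home directory.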


cluster/103.1324493414.txt.gz · Last modified: 2011/12/21 13:50 by hmeij