
Scratch Spaces

We have several locations for scratch space: some local to the nodes, some mounted across the network. Here is the setup as of August 2019.

/localscratch
- Local to each node; sizes vary, roughly 50-80 GB
- Warning: nodes n46-n59 have no hard disk, only a SataDOM (a USB device plugged directly into the system board, 16 GB in size, that holds just the OS). Do not use /localscratch on these nodes.
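
A job script can guard against landing on one of the SataDOM nodes. The following is a minimal sketch, assuming node hostnames follow the nNN pattern used on this page; the function name is_satadom_node is hypothetical and not part of the cluster setup.

```shell
#!/bin/bash
# Sketch: detect the diskless SataDOM nodes (n46-n59) by hostname.
# Assumption: hostnames look like "n46" or "n46.some.domain".

is_satadom_node() {
    local host="${1%%.*}"                      # strip any domain suffix
    local num
    [[ "$host" =~ ^n([0-9]+)$ ]] || return 1   # not an nNN-style name
    num=$((10#${BASH_REMATCH[1]}))             # force base 10 (avoid octal parsing)
    (( num >= 46 && num <= 59 ))               # exit status 0 only inside the range
}

# $HOSTNAME is set by bash itself
if is_satadom_node "$HOSTNAME"; then
    echo "No usable /localscratch on this node; use /sanscratch instead." >&2
fi
```

In a real job the warning branch would more likely switch the scratch path or exit early; which one is appropriate depends on the application.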

/sanscratch
- 55 TB file system mounted via NFS over IPoIB or plain Ethernet
  - greentail52 is the file server
  - /sanscratch/username/ can be used for staging (this is not backed up!)
  - /sanscratch/checkpoints/JOBPID is for checkpoint files (you need to create this in your job)
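
Since the checkpoint directory is not created for you, the job has to make it before the application writes its first checkpoint. Here is a minimal sketch, assuming JOBPID above corresponds to the LSF variable $LSB_JOBID (that mapping is an assumption, not confirmed by this page); make_checkpoint_dir is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: create a per-job checkpoint directory and print its path.
# Assumption: the job identifier passed in is $LSB_JOBID.

make_checkpoint_dir() {
    local base="$1" jobid="$2"
    local dir="$base/$jobid"
    mkdir -p "$dir" || return 1   # -p: no error if it already exists
    printf '%s\n' "$dir"
}

# Hypothetical usage inside a job script:
# CKPT_DIR=$(make_checkpoint_dir /sanscratch/checkpoints "$LSB_JOBID") || exit 1
```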

/localscratch5tb
- 5 TB file system provided by local drives (3x2TB, RAID 0) on each node in the mw256fd queue
- Nodes done: n38-n45, which is all of them (10sep15)

/localscratch
- 2 TB file system on nodes in queue mw128 (n60-n77)

/localscratch
- ~800 GB file system on NVMe SSD on nodes in queue exx96 (n79-n90)

48 TB of local scratch space is available in 6 TB chunks on the nodes in the queue mw256fd. That yields 5 TB of local scratch space per node using RAID 0 and an ext4 file system, mounted at /localscratch5tb. Everybody may use it, but it was put in place specifically for Gaussian jobs that produce massive RWF files (application scratch files).

Note: Everybody is welcome to store content in /localscratch5tb/username/ for easy job access to large data files, unless it interferes with jobs. However, be warned that a) it's local storage, b) it's RAID 0 (a single disk failure and all data is lost), c) like /tmp, it is readable and writable by all (run chmod go-rwx /localscratch5tb/username for some protection), and d) this file system is not backed up. In addition, /sanscratch/username/ may also be used this way.
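
The chmod advice in point c) can be folded into the step that creates the directory. This is a minimal sketch; make_private_dir is a hypothetical helper name, and it only restricts the top-level directory, not files already inside it.

```shell
#!/bin/bash
# Sketch: create a personal scratch directory and remove group/other access.

make_private_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    chmod go-rwx "$dir"           # owner permissions are left unchanged
}

# Hypothetical usage on a node in the mw256fd queue:
# make_private_dir "/localscratch5tb/$USER"
```

Removing access on the top-level directory is enough to keep other users from traversing into it, though root and disk failures are of course unaffected.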

You need to change your working directory to the location the scheduler has made for you. Also save your output before the job terminates, because the scheduler will remove that working directory. Here is the workflow:

#!/bin/bash
# submit like so: bsub < run.forked

# if writing large checkpoint files uncomment next lines
#ionice -c 2 -n 7 -p $$
#ionice -p $$

#BSUB -q mw256fd
#BSUB -o out
#BSUB -e err
#BSUB -J test

# job slots: match inside gaussian.com
#BSUB -n 4
# force all onto one host (shared code and data stack)
#BSUB -R "span[hosts=1]"

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
MYLOCALSCRATCH5TB=/localscratch5tb/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH MYLOCALSCRATCH5TB

# cd to the job's scratch working directory; stop if it is missing
cd "$MYLOCALSCRATCH5TB" || exit 1
pwd

# environment
export GAUSS_SCRDIR="$MYLOCALSCRATCH5TB"

export g09root="/share/apps/gaussian/g09root"
. $g09root/g09/bsd/g09.profile

#export gdvroot="/share/apps/gaussian/gdvh11"
#. $gdvroot/gdv/bsd/gdv.profile

# stage input data to localscratch5tb
cp ~/jobs/forked/gaussian.com .
touch gaussian.log

# run plain vanilla
g09 < gaussian.com > gaussian.log

# run dev
#gdv < gaussian.com > gaussian.log

# save results back to homedir !!!
cp gaussian.log ~/jobs/forked/output.$LSB_JOBID
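
If the run fails partway, the final cp in the script above may copy an incomplete log or, with stricter error handling, never be reached. One option is to register the copy as an exit handler. The following is a sketch only; save_results is a hypothetical helper, and the paths in the usage comment simply mirror the script above.

```shell
#!/bin/bash
# Sketch: copy the log out of scratch when the script exits.
# Assumption: the log is named gaussian.log, as in the job script above.

save_results() {
    local src_dir="$1" dest_dir="$2" jobid="$3"
    mkdir -p "$dest_dir" || return 1
    # copy only if the run produced a log at all
    if [ -f "$src_dir/gaussian.log" ]; then
        cp "$src_dir/gaussian.log" "$dest_dir/output.$jobid"
    fi
}

# Hypothetical usage: register before starting the run.
# trap 'save_results "$MYLOCALSCRATCH5TB" "$HOME/jobs/forked" "$LSB_JOBID"' EXIT
```

A handler like this cannot run if the job is killed with SIGKILL, so it complements the explicit copy step rather than replacing it.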
