cluster:142

cluster:142 [2015/08/03 14:31]
hmeij created
cluster:142 [2020/02/27 08:59] (current)
hmeij07
Line 1: Line 1:
 \\
 **[[cluster:0|Back]]**
 + 
 ===== Scratch Spaces =====
  
-We have different ... blahblah, to come
 +We have different locations for scratch space. Some are local to the nodes, and others are mounted across the network. Here is the current setup as of August 2019.
 + 
 +  * **/localscratch**
 +    * Local to each node, different sizes, roughly 50-80 GB
 +    * Warning: nodes n46-n59 have no hard disk, only a SataDOM (USB device plugged directly into the system board, 16 GB in size, holds just the OS). Do not use /localscratch on these nodes.
 + 
 +  * **/sanscratch**
 +    * 55 TB file system mounted via NFS over IPoIB or plain Ethernet
 +      * greentail52 is the file server
 +      * /sanscratch/username/ can be used for staging (this is not backed up!)
 +      * /sanscratch/checkpoints/JOBPID is for checkpoint files (you need to create this in your job)
 + 
 + 
 +  * **/localscratch5tb**
 +    * 5 TB file system provided by local drives (3x2TB, Raid 0) on each node in the ''mw256fd'' queue
 +    * Nodes n38-n45 are all done (10sep15)
 + 
 +  * **/localscratch**
 +    * 2 TB file system on nodes in queue ''mw128'' (n60-n77)
 + 
 +  * **/localscratch**
 +    * ~800 GB file system on NVMe SSD on nodes in queue ''exx96'' (n79-n90)
 + 
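The warning about nodes n46-n59 can be enforced inside a job script. The following is a minimal sketch, not part of the cluster's tooling: the function names ''is_satadom_node'' and ''pick_scratch'' are hypothetical, and falling back to /sanscratch on the diskless nodes is an assumption about what a job would want.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) if the given host name falls in
# n46-n59, the range flagged above as having no hard disk.
is_satadom_node() {
    host=${1%%.*}                  # strip any domain suffix
    case "$host" in
        n[0-9]*) num=${host#n} ;;  # drop the leading "n"
        *) return 1 ;;             # not a compute node name
    esac
    case "$num" in
        *[!0-9]*) return 1 ;;      # reject anything but digits after "n"
    esac
    [ "$num" -ge 46 ] && [ "$num" -le 59 ]
}

# Hypothetical helper: print a scratch base directory for the given host.
# Assumption: jobs on the diskless nodes should use /sanscratch instead.
pick_scratch() {
    if is_satadom_node "$1"; then
        echo /sanscratch
    else
        echo /localscratch
    fi
}
```

In a job script this might be used as ''MYSCRATCH=$(pick_scratch "$(hostname -s)")/$LSB_JOBID'', assuming the short host name matches the nXX naming used on this page.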
 + 
 +48 TB of local scratch space will be made available in 6 TB chunks on the nodes in the queue ''mw256fd''. That yields 5 TB of local scratch space per node using Raid 0 and file system type ''ext4'', mounted at /localscratch5tb. Everybody may use this, but it has been put in place specifically for Gaussian jobs that produce massive RWF files (application scratch files).
 + 
 +**Note: Everybody is welcome to store content in ''/localscratch5tb/username/'' for easy job access to large data files, unless it interferes with jobs. However, be warned that a) it is local storage, b) it is Raid 0 (one disk failure and all data is lost), c) like /tmp, it is readable and writable by all (do ''chmod go-rwx /localscratch5tb/username'' for some protection), and d) this file system is not backed up. In addition, ''/sanscratch/username/'' may also be used this way.**
 +  
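Creating the per-user directory and applying the suggested ''chmod'' can be done in one step. This is a minimal sketch under stated assumptions: the function name ''make_private_scratch'' is hypothetical, and it assumes you are allowed to create a directory directly under the scratch file system.

```shell
#!/bin/sh
# Hypothetical helper: create <base>/<user> and remove group/other access,
# mirroring the "chmod go-rwx" advice in the note above.
make_private_scratch() {
    base=$1
    user=$2
    dir="$base/$user"
    mkdir -p "$dir" || return 1      # create the directory if it is missing
    chmod go-rwx "$dir" || return 1  # leave access to the owner only
    echo "$dir"                      # print the path for the caller
}
```

A call such as ''make_private_scratch /localscratch5tb "$USER"'' would print the directory path on success. The directory is still on Raid 0 and still not backed up, so this protects against other users only, not against data loss.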
 + 
 +You need to change your working directory to the location the scheduler has made for you. Also save your output before the job terminates, because the scheduler will remove that working directory. Here is the workflow...
 + 
 +<code>
 +#!/bin/bash
 +# submit like so: bsub < run.forked
 +
 +# if writing large checkpoint files uncomment next lines
 +#ionice -c 2 -n 7 -p $$
 +#ionice -p $$
 +
 +#BSUB -q mw256fd
 +#BSUB -o out
 +#BSUB -e err
 +#BSUB -J test
 +
 +# job slots: match inside gaussian.com
 +#BSUB -n 4
 +# force all onto one host (shared code and data stack)
 +#BSUB -R "span[hosts=1]"
 +
 +# unique job scratch dirs
 +MYSANSCRATCH=/sanscratch/$LSB_JOBID
 +MYLOCALSCRATCH=/localscratch/$LSB_JOBID
 +MYLOCALSCRATCH5TB=/localscratch5tb/$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH MYLOCALSCRATCH5TB
 +
 +# cd to the job's working directory; stop if it is missing
 +cd "$MYLOCALSCRATCH5TB" || exit 1
 +pwd
 +
 +# environment
 +export GAUSS_SCRDIR="$MYLOCALSCRATCH5TB"
 +
 +export g09root="/share/apps/gaussian/g09root"
 +. "$g09root/g09/bsd/g09.profile"
 +
 +#export gdvroot="/share/apps/gaussian/gdvh11"
 +#. $gdvroot/gdv/bsd/gdv.profile
 +
 +# stage input data to localscratch5tb
 +cp ~/jobs/forked/gaussian.com .
 +touch gaussian.log
 +
 +# run plain vanilla
 +g09 < gaussian.com > gaussian.log
 +
 +# run dev
 +#gdv < gaussian.com > gaussian.log
 +
 +# save results back to homedir !!!
 +cp gaussian.log ~/jobs/forked/output.$LSB_JOBID
 +
 +</code>
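Because the scheduler removes the working directory when the job ends, it can help to register the copy-back step up front rather than rely on reaching the last line of the script. The following is a minimal sketch, not the site's recommended method: ''save_results'' is a hypothetical helper, and it assumes the shell gets a chance to run its EXIT trap before the directory is removed (a job killed outright will not run it).

```shell
#!/bin/sh
# Hypothetical helper: copy a result file into a destination directory,
# naming the copy output.<jobid> as the workflow script above does.
save_results() {
    src=$1
    destdir=$2
    jobid=$3
    [ -f "$src" ] || return 1            # nothing to save
    mkdir -p "$destdir" || return 1      # make sure the target exists
    cp "$src" "$destdir/output.$jobid"
}

# Possible use near the top of a job script (assumption: bash runs the
# EXIT trap on normal exit and on most terminating signals):
#   trap 'save_results gaussian.log "$HOME/jobs/forked" "$LSB_JOBID"' EXIT
```

With the trap in place the final ''cp'' in the workflow becomes redundant, although keeping both does no harm since the copy simply overwrites the same file.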
  
  
cluster/142.1438626674.txt.gz · Last modified: 2015/08/03 14:31 by hmeij