cluster:124

Differences

This shows you the differences between two versions of the page.

cluster:124 [2013/10/31 18:32]
hmeij [BLCR]
cluster:124 [2013/10/31 18:37]
hmeij
Line 129: Line 129:
 Now we can write a batch script for the scheduler.  We need to do several things
  
 +  * The job will always end up in /sanscratch/JOBPID so we need to stage and save our data
 +  * The checkpoint file should be written to a safe place, like /home
 +  * The time interval for checkpointing should be large enough not to slow the job down
 +    * for example, set it to 12 or even 24 hours
 +    * the small interval in the script is just for testing
 +  * Then there are 2 blocks of lines to (un)comment
 +    * One to invoke ''cr_run''
 +    * One to invoke ''cr_restart''
   *    * 
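
The two blocks to (un)comment could also be selected with an argument instead of by editing the script. The following is a minimal sketch under that assumption, not the script from this page: `CKPT_DIR`, `checkpoint_file` and `build_command` are hypothetical names, and only `cr_run` and `cr_restart` are actual BLCR commands. The function prints the command line rather than executing it, so it can be inspected first.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).
# CKPT_DIR is an assumed variable; it defaults to ~/blcr as used on this page.
CKPT_DIR=${CKPT_DIR:-$HOME/blcr}

# Print the checkpoint file path for a given process id.
checkpoint_file() {
    printf '%s/checkpoint.%s\n' "$CKPT_DIR" "$1"
}

# Print the command line for a fresh start (cr_run) or a restart (cr_restart).
#   $1 = start|restart
#   $2 = application to run, or checkpoint file to restart from
build_command() {
    case "$1" in
        start)   printf 'cr_run %s\n' "$2" ;;
        restart) printf 'cr_restart %s\n' "$2" ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

For example, `build_command restart "$(checkpoint_file 1234)"` prints the restart command for the checkpoint of process 1234; the batch script would then run that line in the background, as it does with the commented blocks.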
  
Line 174: Line 182:
 echo "process_id=$process_id"
 while [ $process_id -gt 0 ]; do
-        # checkpoint time interval, make it an hour or larger (small for testing)
+        # checkpoint time interval, make it very large (small for testing)
         sleep 120
-        # save the checkpoint file outside of sanscratch
+        # save the checkpoint file outside of /sanscratch
         cr_checkpoint -f ~/blcr/checkpoint.$process_id $process_id
         # if the application has crashed, exit
         process_id=`ps -u hmeij | grep t-20001030-01 | grep -v grep | awk '{print $1}'`
         if [ "${process_id}x" = 'x' ]; then
-                # save some stuff for checking
+                # save some stuff for checking later
                 cp -p pwd* *.shell *.out *.err context ~/blcr/
                 rm -f `cat ~/blcr/pwd.$process_id`
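
The checkpoint loop in this hunk can be sketched as a reusable function. This is only an illustration under stated assumptions, not the author's script: `hours_to_seconds`, `checkpoint_loop` and `CHECKPOINT_CMD` are hypothetical names, and the liveness test uses `kill -0` on the process id instead of the `ps | grep` pipeline shown above. Only the `cr_checkpoint -f FILE PID` call is taken from the page.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).
# CHECKPOINT_CMD is an assumed variable; it defaults to BLCR's cr_checkpoint.
CHECKPOINT_CMD=${CHECKPOINT_CMD:-cr_checkpoint}

# Convert a whole number of hours into seconds for use with sleep.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Checkpoint process $1 every $2 seconds into directory $3
# until the process no longer exists.
checkpoint_loop() {
    local pid=$1 interval=$2 dir=$3
    # kill -0 sends no signal; it only tests whether the pid can be signalled
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
        # the application may have exited during the sleep
        kill -0 "$pid" 2>/dev/null || break
        "$CHECKPOINT_CMD" -f "$dir/checkpoint.$pid" "$pid"
    done
}
```

A production run might call `checkpoint_loop "$process_id" "$(hours_to_seconds 12)" ~/blcr`, which matches the 12 hour interval suggested earlier on this page; the 120 second value in the hunk is the testing interval.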
cluster/124.txt · Last modified: 2016/03/11 20:14 by hmeij07