This shows you the differences between two versions of the page.
cluster:124 [2013/10/31 18:37] hmeij |
cluster:124 [2013/10/31 18:53] hmeij |
||
---|---|---|---|
Line 137: | Line 137: | ||
* One to invoke '' | * One to invoke '' | ||
* One to invoke '' | * One to invoke '' | ||
- | * | + | * For a restart we need two things | 
+ | * Create a link from old working directory to new working directory (saved in the pwd text file) | ||
+ | * Edit the script: change the comment blocks and edit the process_id | ||
+ | * The restart job may end up on another node but will have the same process_id | ||
+ | |||
+ | After you have restarted, you can observe the tool starting from the checkpoint file you are pointing to. To simulate a crash, while your first submission is running with '' | ||
+ | |||
+ | It would be even sweeter if the scheduler could be told to do all the checkpointing at intervals.
Line 147: | Line 154: | ||
# submit via 'bsub < run.serial' | # submit via 'bsub < run.serial' | ||
rm -f *err *out *shell | rm -f *err *out *shell | ||
- | #BSUB -q mw256chkpnt | + | #BSUB -q mw256 |
#BSUB -n 1 | #BSUB -n 1 | ||
#BSUB -J test | #BSUB -J test |