This shows you the differences between two versions of the page.
cluster:124 [2013/10/31 18:37] hmeij |
cluster:124 [2013/10/31 18:44] hmeij |
||
---|---|---|---|
Line 137: | Line 137: | ||
* One to invoke '' | * One to invoke '' | ||
* One to invoke '' | * One to invoke '' | ||
- | * | + | * For a restart we need two things | ||
+ | * Create a link from the old working directory to the new working directory (saved in the pwd text file) | ||
+ | * Edit the script: change the comment blocks and update the process_id | ||
+ | * The restart job may end up on another node but will have the same process_id | ||
+ | |||
+ | After you have restarted, you can observe the tool starting from the checkpoint file you are pointing to. To simulate a crash, while your first submission is running with '' | ||
+ | |||
+ | It would be even sweeter if the scheduler could be told to do all the checkpointing at intervals. | ||
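
The linking step in the restart list above can be sketched as follows. This is only an illustration, not the script used on the cluster: it assumes the pwd text file holds the path of the old working directory on a single line, and that the link should sit at that old path and point to the new working directory. The helper name `link_old_workdir` and the file name `pwd.txt` are hypothetical.

```python
import os


def link_old_workdir(pwd_file, new_workdir):
    """Create a symlink at the old working directory path pointing to new_workdir.

    Assumption: pwd_file contains the old working directory path on one line.
    """
    with open(pwd_file) as fh:
        old_workdir = fh.read().strip()
    if not old_workdir:
        raise ValueError("pwd file is empty: " + pwd_file)
    # Refuse to overwrite anything that already exists at the old path.
    if os.path.lexists(old_workdir):
        raise FileExistsError("old working directory path already exists: " + old_workdir)
    # The link is named after the old path and resolves to the new directory,
    # so paths recorded before the restart still resolve.
    os.symlink(new_workdir, old_workdir, target_is_directory=True)
    return old_workdir
```

A call such as `link_old_workdir("pwd.txt", os.getcwd())` from inside the new working directory would create the link. The parent directory of the old path must still exist on the node where the restart runs; if it does not, `os.symlink` raises an error rather than creating it.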