This shows you the differences between two versions of the page.
Left: cluster:190 [2020/01/21 14:37] hmeij07
Right: cluster:190 [2020/01/21 14:46] hmeij07
Line 19: | Line 19: | ||
mkdir -p / | mkdir -p / | ||
cd / | cd / | ||
+ | |||
+ | # invoke sample program, which generates one line of output | ||
time ./a.out | time ./a.out | ||
Line 32: | Line 34: | ||
[hmeij@cottontail2 111]$ cat fid.txt | [hmeij@cottontail2 111]$ cat fid.txt | ||
0.999578495067887 | 0.999578495067887 | ||
+ | |||
+ | # for example, with a 24-hour checkpoint interval | ||
+ | # launch a new coordinator on a random port, log the port, checkpoint every 24 hours | ||
+ | # make sure you create the destination directory for checkpoints first | ||
+ | dmtcp_launch --new-coordinator \ | ||
+ | --coord-port 0 --port-file port.txt --interval 86400 \ | ||
+ | --ckptdir / | ||
+ | time ./a.out | ||
+ | |||
| | ||
- | # run command | + | # run command |
[hmeij@cottontail2 111]$ ps | [hmeij@cottontail2 111]$ ps | ||
PID TTY TIME CMD | PID TTY TIME CMD | ||
Line 45: | Line 57: | ||
1 S hmeij 20008 | 1 S hmeij 20008 | ||
/ | / | ||
+ | |||
# the random port (in case somebody else is also checkpointing on this host) | # the random port (in case somebody else is also checkpointing on this host) | ||
[hmeij@cottontail2 111]$ cat port.txt | [hmeij@cottontail2 111]$ cat port.txt | ||
Line 55: | Line 68: | ||
-rwxr--r-- 1 hmeij its 12440 Jan 16 13:53 dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ | -rwxr--r-- 1 hmeij its 12440 Jan 16 13:53 dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ | ||
lrwxrwxrwx 1 hmeij its 60 Jan 16 13:53 dmtcp_restart_script.sh -> dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ | lrwxrwxrwx 1 hmeij its 60 Jan 16 13:53 dmtcp_restart_script.sh -> dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ | ||
+ | |||
[hmeij@cottontail2 111]$ | [hmeij@cottontail2 111]$ | ||
real 63m57.287s | real 63m57.287s | ||
Line 65: | Line 79: | ||
-i 300 --ckptdir / | -i 300 --ckptdir / | ||
[1] 29201 | [1] 29201 | ||
+ | |||
[hmeij@cottontail2 111]$ ps | [hmeij@cottontail2 111]$ ps | ||
PID TTY TIME CMD | PID TTY TIME CMD | ||
Line 72: | Line 87: | ||
29210 pts/1 00:00:00 dmtcp_coordinat | 29210 pts/1 00:00:00 dmtcp_coordinat | ||
29212 pts/1 00:00:00 ps | 29212 pts/1 00:00:00 ps | ||
+ | |||
+ | # terminate halfway through | ||
[hmeij@cottontail2 111]$ sleep 32m; kill -9 29202 29210 | [hmeij@cottontail2 111]$ sleep 32m; kill -9 29202 29210 | ||
Line 77: | Line 94: | ||
cd / | cd / | ||
./ | ./ | ||
+ | |||
# ps | # ps | ||
0 S hmeij 20891 20890 -bash | 0 S hmeij 20891 20890 -bash | ||
Line 90: | Line 108: | ||
- | # launch new coordinator on random port, log port, 24 hour checkpoints | + | # The process will pick up from the last checkpoint |
- | # make sure you create destination dir for checkpoints | + | # and write output to the original working directory |
- | dmtcp_launch --new-coordinator \ | ||
- | --coord-port 0 --port-file port.txt --interval 86400 \ | ||
- | --ckptdir / | ||
- | time ./a.out | ||
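The launch-and-restart cycle in the diff can be wrapped in a few bash functions. This is a minimal sketch, not the author's script. The function names (`hours_to_seconds`, `read_port`, `launch_with_checkpoints`, `restart_from_checkpoint`) are hypothetical. It assumes `dmtcp_launch` is on the PATH. It also assumes the generated `dmtcp_restart_script.sh` sits in a directory you pass in, as in the listing and the `cd` / `./` restart steps above. The `dmtcp_launch` flags are the ones already used on this page.

```shell
#!/bin/bash
# Minimal sketch (assumption): helper names are hypothetical, not part of DMTCP.
# Assumes dmtcp_launch is on PATH when launch_with_checkpoints is called.

# Convert a checkpoint interval in hours to the seconds --interval expects.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Print the coordinator port that --port-file wrote (first line of the file).
read_port() {
    head -n 1 "$1"
}

# Launch a program under a new coordinator on a random port (--coord-port 0),
# record the port in port.txt and checkpoint every $2 hours into $1.
launch_with_checkpoints() {
    local ckpt_dir=$1 hours=$2
    shift 2
    mkdir -p "$ckpt_dir"   # the page says to create the checkpoint dir first
    dmtcp_launch --new-coordinator \
        --coord-port 0 --port-file port.txt \
        --interval "$(hours_to_seconds "$hours")" \
        --ckptdir "$ckpt_dir" "$@"
}

# Resume from the last checkpoint by running the generated restart script.
# Assumption: dmtcp_restart_script.sh sits in the directory passed as $1.
restart_from_checkpoint() {
    local dir=$1
    if [ ! -x "$dir/dmtcp_restart_script.sh" ]; then
        echo "no dmtcp_restart_script.sh in $dir" >&2
        return 1
    fi
    # run in a subshell so the caller's working directory is unchanged
    ( cd "$dir" && ./dmtcp_restart_script.sh )
}
```

For example, `launch_with_checkpoints /path/to/ckpt 24 ./a.out` mirrors the 24-hour variant. `read_port port.txt` prints the coordinator port, and after the job is killed, `restart_from_checkpoint /path/to/ckpt` resumes it (here `/path/to/ckpt` is a placeholder). Running the restart in a subshell keeps the caller's shell in its own directory. The restarted process still writes its output to the original working directory.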