cluster:190 [DokuWiki]

cluster:190

Differences

This shows you the differences between two versions of the page.

cluster:190 [2020/01/21 14:37]
hmeij07
cluster:190 [2020/01/21 14:46]
hmeij07
Line 19: Line 19:
 mkdir -p /sanscratch/111 /sanscratch/checkpoints/111 mkdir -p /sanscratch/111 /sanscratch/checkpoints/111
 cd /sanscratch/111 cd /sanscratch/111
 +
 +# invoke sample program which generates one line of output
 time ./a.out time ./a.out
  
Line 32: Line 34:
 [hmeij@cottontail2 111]$ cat fid.txt [hmeij@cottontail2 111]$ cat fid.txt
   0.999578495067887       7.978578277139264E-005   0.999578495067887       7.978578277139264E-005
 +
 +# for example, with a 24-hour checkpoint interval
 +# launch a new coordinator on a random port, log the port to port.txt, checkpoint every 24 hours
 +# make sure you create the destination dir for checkpoints first
 +dmtcp_launch --new-coordinator \
 +  --coord-port 0 --port-file port.txt --interval 86400 \
 +  --ckptdir /sanscratch/checkpoints/111 \
 +  time ./a.out
 +
      
-# run command below with interval 300 (every 5 mins)+# run command above  with interval 300 (every 5 mins) 
 [hmeij@cottontail2 111]$ ps [hmeij@cottontail2 111]$ ps
   PID TTY          TIME CMD   PID TTY          TIME CMD
Line 45: Line 57:
 1 S hmeij    20008      0  80   0 -  4665 ep_pol 07:46 pts/1    00:00:00 \\ 1 S hmeij    20008      0  80   0 -  4665 ep_pol 07:46 pts/1    00:00:00 \\
 /usr/bin/dmtcp_coordinator --quiet --exit-on-last --daemon /usr/bin/dmtcp_coordinator --quiet --exit-on-last --daemon
 +
 # the random port (in case somebody else is also checkpointing on this host) # the random port (in case somebody else is also checkpointing on this host)
 [hmeij@cottontail2 111]$ cat port.txt [hmeij@cottontail2 111]$ cat port.txt
Line 55: Line 68:
 -rwxr--r-- 1 hmeij its   12440 Jan 16 13:53 dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ -rwxr--r-- 1 hmeij its   12440 Jan 16 13:53 dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\
 lrwxrwxrwx 1 hmeij its      60 Jan 16 13:53 dmtcp_restart_script.sh -> dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\ lrwxrwxrwx 1 hmeij its      60 Jan 16 13:53 dmtcp_restart_script.sh -> dmtcp_restart_script_24945f6ae3823bbf-40000-fb2d11ea62fa0.sh\\
 +
 [hmeij@cottontail2 111]$ [hmeij@cottontail2 111]$
 real    63m57.287s real    63m57.287s
Line 65: Line 79:
 -i 300 --ckptdir /sanscratch/checkpoints/111  ./a.out & -i 300 --ckptdir /sanscratch/checkpoints/111  ./a.out &
 [1] 29201 [1] 29201
 +
 [hmeij@cottontail2 111]$ ps [hmeij@cottontail2 111]$ ps
   PID TTY          TIME CMD   PID TTY          TIME CMD
Line 72: Line 87:
 29210 pts/1    00:00:00 dmtcp_coordinat 29210 pts/1    00:00:00 dmtcp_coordinat
 29212 pts/1    00:00:00 ps 29212 pts/1    00:00:00 ps
 +
 +# terminate halfway through the run
 [hmeij@cottontail2 111]$ sleep 32m; kill -9 29202 29210 [hmeij@cottontail2 111]$ sleep 32m; kill -9 29202 29210
  
Line 77: Line 94:
 cd /sanscratch/checkpoints/111 cd /sanscratch/checkpoints/111
 ./dmtcp_restart_script.sh ./dmtcp_restart_script.sh
 +
 # ps  # ps 
 0 S hmeij    20891 20890  -bash 0 S hmeij    20891 20890  -bash
Line 90: Line 108:
  
  
-launch new coordinator on random port, log port, 24 hour checkpoints  +The process will pick up from the last checkpoint 
-make sure you create destination dir for checkpoints+and write its output to the original work directory 
  
-dmtcp_launch --new-coordinator \ 
-  --coord-port 0 --port-file port.txt --interval 86400 \ 
-  --ckptdir /sanscratch/checkpoints/111 \ 
-  time ./a.out 
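
The launch and restart steps on this page can be combined into one wrapper. What follows is only a minimal sketch, not part of the original page: the function names `ckpt_action` and `run_job` are hypothetical, and it assumes that the presence of an executable `dmtcp_restart_script.sh` in the checkpoint directory means a checkpoint exists to resume from. The `dmtcp_launch` flags are the ones used above.

```shell
#!/bin/bash
# Sketch only: ckpt_action and run_job are hypothetical helper names.

# Print "restart" if the checkpoint dir holds an executable restart script,
# otherwise print "launch".
ckpt_action() {
  local ckptdir="$1"
  if [ -x "$ckptdir/dmtcp_restart_script.sh" ]; then
    echo restart
  else
    echo launch
  fi
}

# Usage: run_job WORKDIR CKPTDIR INTERVAL_SECONDS COMMAND [ARGS...]
run_job() {
  local workdir="$1" ckptdir="$2" interval="$3"
  shift 3
  # create the work dir and the destination dir for checkpoints
  mkdir -p "$workdir" "$ckptdir" || return 1
  if [ "$(ckpt_action "$ckptdir")" = restart ]; then
    # resume from the last checkpoint, run from the checkpoint dir as on this page
    ( cd "$ckptdir" && ./dmtcp_restart_script.sh )
  else
    # new coordinator on a random port, port logged to port.txt
    ( cd "$workdir" && dmtcp_launch --new-coordinator \
        --coord-port 0 --port-file port.txt --interval "$interval" \
        --ckptdir "$ckptdir" "$@" )
  fi
}
```

For example, `run_job /sanscratch/111 /sanscratch/checkpoints/111 300 ./a.out` would launch with 5 minute checkpoints on the first call and restart from the checkpoint directory on a later call, assuming `dmtcp_launch` is on the PATH.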
  
  
cluster/190.txt · Last modified: 2020/09/28 07:38 by hmeij07