This shows you the differences between two versions of the page.
cluster:148 [2016/03/29 20:00] hmeij07 |
cluster:148 [2016/03/30 18:00] hmeij07 |
||
---|---|---|---|
Line 4: | Line 4: | ||
==== BLCR Checkpoint in OL3 ==== | ==== BLCR Checkpoint in OL3 ==== | ||
- | * This page concerns PARALLEL mpirun jobs only | + | * This page concerns PARALLEL mpirun jobs only; there are some restrictions |
+ | * all MPI threads need to be confined to one node | ||
+ | * restarted jobs must use the same node (not sure why) | ||
* For SERIAL jobs go here [[cluster: | * For SERIAL jobs go here [[cluster: | ||
Line 11: | Line 13: | ||
* Users Guide [[https:// | * Users Guide [[https:// | ||
+ | |||
+ | Checkpointing a parallel job is a bit more complex than checkpointing a serial job. MPI jobs are fired off by worker 0 of '' | ||
+ | |||
+ | The '' | ||
+ | |||
+ | < | ||
+ | |||
+ | # from eric at lbl | ||
+ | ./configure \ | ||
+ | --enable-ft-thread \ | ||
+ | --with-ft=cr \ | ||
+ | --enable-opal-multi-threads \ | ||
+ | --with-blcr=/ | ||
+ | --without-tm \ | ||
+ | --prefix=/ | ||
+ | |||
+ | # next download cr_mpirun | ||
+ | https:// | ||
+ | |||
+ | # configure and test | ||
+ | |||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | |||
+ | ./configure --with-blcr=/ | ||
+ | |||
+ | ============================================================================ | ||
+ | Testsuite summary for cr_mpirun 295 | ||
+ | ============================================================================ | ||
+ | # TOTAL: 3 | ||
+ | # PASS: 3 | ||
+ | # SKIP: 0 | ||
+ | # XFAIL: 0 | ||
+ | # FAIL: 0 | ||
+ | # XPASS: 0 | ||
+ | # ERROR: 0 | ||
+ | ============================================================================ | ||
+ | make[1]: Leaving directory `/ | ||
+ | |||
+ | # I copied cr_mpirun into / | ||
+ | # cr_mpirun needs access to all these in $PATH | ||
+ | # mpirun cr_mpirun ompi-checkpoint ompi-restart cr_checkpoint cr_restart | ||
+ | |||
+ | # next compile your parallel software using mpicc/ | ||
+ | |||
+ | </ | ||
+ | |||
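The notes above say the wrapper needs six tools reachable through $PATH. A minimal sketch of a pre-flight check follows, assuming a POSIX shell and that PATH has already been exported as in the configure step. The helper name `missing_tools` is hypothetical and not part of BLCR, Open MPI, or cr_mpirun.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that does
# not resolve through the current $PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command resolves
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool list copied from the notes above; run after exporting PATH.
missing=$(missing_tools mpirun cr_mpirun ompi-checkpoint ompi-restart cr_checkpoint cr_restart)
if [ -n "$missing" ]; then
    printf 'not found in PATH:\n%s\n' "$missing"
else
    echo "all checkpoint tools found in PATH"
fi
```

Running this at the top of a job script, before any checkpoint is attempted, should surface a missing tool early rather than at restart time.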
+ | |||
< | < |