This shows you the differences between two versions of the page.
cluster:147 [2017/05/24 18:32] hmeij07 [Files v0.2] |
cluster:147 [2020/02/27 18:06] (current) hmeij07 |
||
---|---|---|---|
Line 3: | Line 3: | ||
==== BLCR Checkpoint in OL3 ==== | ==== BLCR Checkpoint in OL3 ==== | ||
+ | |||
+ | **Deprecated since we did OS upgrades [[cluster: | ||
+ | We will install DMTCP as a replacement...[[cluster: | ||
+ | --- // | ||
* This page concerns SERIAL jobs only; SERIAL jobs can restart on any node | * This page concerns SERIAL jobs only; SERIAL jobs can restart on any node | ||
Line 10: | Line 14: | ||
* Users Guide [[https:// | * Users Guide [[https:// | ||
- | When we move to Openlava 3.x all queues will support checkpointing, | + | All queues will support checkpointing, |
- | Checkpointing is an expensive operation, so do not checkpoint more often than every 6 hours. For example, if your job runs for a month, checkpoint once a day; if your job runs for a week, checkpoint every 12 hours. From this point on I expect all users to checkpoint. Some software does this internally (Amber, Gaussian). For other applications or home-grown code you can use BLCR. (Too bad it does not work out of the box within Openlava.) | + | Checkpointing is an expensive operation, so do not checkpoint more often than every 6 hours. For example, if your job runs for a month, checkpoint once a day; if your job runs for a week, checkpoint every 12 hours. From this point on I expect all users to checkpoint. Some software does this internally (Amber, Gaussian). For other applications or home-grown code you can use BLCR. |
You need to test checkpointing before you rely on it. I've noticed that when some local code opens files for output, BLCR does not notice it. The code below has such an example (file fid.txt). Hopefully future versions of BLCR will fix this. Or maybe we should open files differently, | You need to test checkpointing before you rely on it. I've noticed that when some local code opens files for output, BLCR does not notice it. The code below has such an example (file fid.txt). Hopefully future versions of BLCR will fix this. Or maybe we should open files differently, | ||
- | BLCR, Berkeley Lab Checkpoint and Restart, remembers file paths and process ids. The code stages the necessary STDOUT and STDERR files Openlava | + | BLCR, Berkeley Lab Checkpoint and Restart, remembers file paths and process ids. The code stages the necessary STDOUT and STDERR files scheduler |
At the bottom of this page is the current version of '' | At the bottom of this page is the current version of '' | ||
Line 274: | Line 278: | ||
fi | fi | ||
done | done | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
==== Matlab ==== | ==== Matlab ==== | ||
Line 279: | Line 287: | ||
* https:// | * https:// | ||
- | </ | ||
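
The checkpoint-interval guidance in the changed paragraph (once a day for a month-long job, every 12 hours for a week-long job, never more often than every 6 hours) can be sketched as a small bash helper. This is only an illustration: the function name `checkpoint_interval_hours` is hypothetical, the 30-day and 7-day thresholds are assumed readings of "a month" and "a week", and using the 6-hour floor for shorter jobs is an assumption, not something the page states.

```shell
#!/bin/bash
# Hypothetical helper (not part of the page's wrapper script):
# map an expected run time in whole days to a checkpoint interval in hours.
checkpoint_interval_hours() {
    local runtime_days=$1
    if [ "$runtime_days" -ge 30 ]; then
        echo 24   # about a month or longer: checkpoint once a day
    elif [ "$runtime_days" -ge 7 ]; then
        echo 12   # about a week or longer: checkpoint every 12 hours
    else
        echo 6    # assumed floor: never more often than every 6 hours
    fi
}
```

For example, `checkpoint_interval_hours 10` would suggest an interval for a ten-day job; the result still has to be passed to whatever checkpoint period setting your job submission uses.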