
Changed sections of cluster:190 in the revision of 2020/09/28 07:38 (previous revision: 2020/01/21 14:46), both by hmeij07.
  
This replaces the BLCR methods we used before, [[cluster:147|BLCR Checkpoint in OL3 - serial]] and [[cluster:148|BLCR Checkpoint in OL3 - parallel]]. BLCR is no longer being developed, and today's brief power outage removed the BLCR kernel module HPCC-wide, so learn DMTCP. I have not provided wrappers, but you can follow the same logic we used with BLCR.
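
A minimal sketch of carrying that logic over, assuming the usual launch-or-restart pattern: start the program under DMTCP the first time, and restart from checkpoint images once they exist. It assumes the DMTCP 2.x tools ''dmtcp_launch'' and ''dmtcp_restart'' are on your PATH and that images are written as ''ckpt_*.dmtcp''. The helper name ''dmtcp_cmd'' and the one-hour interval are hypothetical, and the function only prints the command it would run, so you can check the logic before putting it in a submit script.

<code bash>
#!/bin/bash
# Launch-or-restart sketch for DMTCP (hypothetical helper, dry run only).
# ASSUMPTIONS: dmtcp_launch/dmtcp_restart are DMTCP 2.x tools on PATH,
# images are named ckpt_*.dmtcp, and the 3600 s interval is illustrative.

dmtcp_cmd() {
    local ckptdir="$1"; shift          # per-job checkpoint directory
    local f
    for f in "$ckptdir"/ckpt_*.dmtcp; do
        if [ -e "$f" ]; then
            # Images exist from an earlier run: resume from all of them.
            echo dmtcp_restart "$ckptdir"/ckpt_*.dmtcp
            return 0
        fi
    done
    # No images yet: start the program under DMTCP, checkpointing hourly.
    echo dmtcp_launch --ckptdir "$ckptdir" --interval 3600 "$@"
}
</code>

For example, ''dmtcp_cmd /sanscratch/checkpoints/111 ./myprog input.txt'' (''./myprog'' is a placeholder) prints a ''dmtcp_launch'' line on the first run and a ''dmtcp_restart'' line once images exist; drop the ''echo'' to actually run the command.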

Write your checkpoint files to ''/sanscratch/checkpoints/JOBPID'' so they do not count against your quota. The scheduler will not create this directory for you; you must create it in your submit job. Directories older than 120 days are deleted automatically.
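
A minimal sketch of that submit-script step, assuming the job ID is passed in as the first argument (how you obtain it from the scheduler is not covered here). ''make_ckpt_dir'' and the ''SCRATCH_BASE'' override are hypothetical names; ''SCRATCH_BASE'' defaults to ''/sanscratch'' so the logic can be tried outside the cluster.

<code bash>
#!/bin/bash
# Create <base>/checkpoints/<jobid>, since the scheduler does not.
# ASSUMPTIONS: the job id arrives as $1; SCRATCH_BASE is a hypothetical
# override that defaults to /sanscratch.

make_ckpt_dir() {
    local jobid="$1"
    local base="${SCRATCH_BASE:-/sanscratch}"
    local ckptdir="$base/checkpoints/$jobid"

    if [ -z "$jobid" ]; then
        echo "make_ckpt_dir: no job id given" >&2
        return 1
    fi
    # -p: no error if the directory already exists
    mkdir -p "$ckptdir" || return 1
    # Print the path so the caller can capture it.
    echo "$ckptdir"
}
</code>

For example, ''CKPTDIR=$(make_ckpt_dir 111)'' creates ''/sanscratch/checkpoints/111'' and stores that path in ''CKPTDIR''.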
  
<code>
# ...

# make a directory (first one done by scheduler, second one done by your script)
mkdir -p /sanscratch/111 /sanscratch/checkpoints/111
cd /sanscratch/111

# ...

# You must make sure the old directory and file exist, otherwise you get:
[40000] ERROR at fileconnection.cpp:737 in refill;
REASON='JASSERT(jalib::Filesystem::FileExists(_path)) failed'
     _path = /sanscratch/111/fid.txt
Message: File not found.
# ...
# and write output to original work directory

</code>

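Because the restart fails when a file the checkpointed program had open is gone, it can help to check those paths before restarting. A minimal sketch, assuming you know and list the paths yourself; ''check_restart_paths'' is a hypothetical helper, and ''/sanscratch/111/fid.txt'' is the file from the example above.

<code bash>
#!/bin/bash
# Verify that files the checkpointed program had open still exist at their
# original paths before calling dmtcp_restart (hypothetical helper).
# Prints each missing path to stderr; returns 1 if any path is missing.

check_restart_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}
</code>

For example, ''check_restart_paths /sanscratch/111/fid.txt && dmtcp_restart /sanscratch/checkpoints/111/ckpt_*.dmtcp'' only attempts the restart when the file is in place.
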
==== Quick-Start Guide ====
  
cluster/190.txt · Last modified: 2020/09/28 07:38 by hmeij07