cluster:190


Differences

This shows you the differences between two versions of the page.

cluster:190 [2020/01/21 14:46]
hmeij07
cluster:190 [2020/09/28 07:38] (current)
hmeij07
Line 11: Line 11:
  
 This is a replacement for the BLCR methods we used, [[cluster:147|BLCR Checkpoint in OL3 -serial]] or [[cluster:148|BLCR Checkpoint in OL3 - parallel]]. BLCR is no longer being developed. Today's brief power outage removed the BLCR kernel module HPCC-wide. So learn DMTCP. I have not provided wrappers, but you can follow the same logic as we used with BLCR.
 +
 +Write your checkpoint files to ''/sanscratch/checkpoints/JOBPID'' so they do not count against your quota.  The scheduler will not create this directory for you; you must create it in your submit script.  Directories are automatically deleted once they are 120 days old.
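
The directory creation described above could be sketched as a small shell function. This is a minimal sketch, not a site-provided wrapper: the function name ''make_ckpt_dir'' and its two parameters are hypothetical, and the job ID is assumed to be known to your submit script.

```shell
#!/bin/bash
# Sketch only: create a per-job checkpoint directory.
# make_ckpt_dir, ckpt_root and jobpid are illustrative names (assumptions),
# not something the scheduler or DMTCP provides.
make_ckpt_dir() {
    local ckpt_root="$1"   # e.g. /sanscratch/checkpoints
    local jobpid="$2"      # the job's ID, e.g. 111
    local dir="${ckpt_root}/${jobpid}"

    # -p creates missing parents and succeeds if the directory already exists
    mkdir -p "$dir" || return 1

    # print the resulting path so the caller can capture it
    printf '%s\n' "$dir"
}
```

In a submit script this might be called as ''ckpt_dir=$(make_ckpt_dir /sanscratch/checkpoints 111)'', after which ''$ckpt_dir'' holds the path to write checkpoint files to.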
  
 <code> <code>
Line 17: Line 19:
  
 # make a directory (first one done by scheduler, second one done by your script)
-mkdir -p /sanscratch/111 /snascratch/checkpoints/111
+mkdir -p /sanscratch/111 /sanscratch/checkpoints/111
 cd /sanscratch/111
  
Line 101: Line 103:
  
 # You must make sure the old directory and file exist, otherwise
-[40000] ERROR at fileconnection.cpp:737 in refill; REASON='JASSERT(jalib::Filesystem::FileExists(_path)) failed'
+[40000] ERROR at fileconnection.cpp:737 in refill;
+REASON='JASSERT(jalib::Filesystem::FileExists(_path)) failed'
      _path = /sanscratch/111/fid.txt
 Message: File not found.
Line 111: Line 114:
 # and write output to original work directory
  
 +</code>
  
- 
- 
-</code> 
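
The "File not found" failure shown in the code block above can be caught before a restart is attempted. The following is a minimal sketch under stated assumptions: ''check_restart_paths'' is a hypothetical helper, not part of DMTCP, and it only tests that the old work directory and the files you name still exist.

```shell
#!/bin/bash
# Sketch only: verify that the original work directory and the files the
# job had open still exist. check_restart_paths is a hypothetical helper
# (an assumption), not a DMTCP command.
check_restart_paths() {
    local workdir="$1"     # e.g. /sanscratch/111
    shift                  # remaining arguments are file names inside workdir
    local missing=0
    local f

    if [ ! -d "$workdir" ]; then
        echo "missing directory: $workdir" >&2
        missing=1
    fi

    for f in "$@"; do
        if [ ! -e "$workdir/$f" ]; then
            echo "missing file: $workdir/$f" >&2
            missing=1
        fi
    done

    # return 0 only if everything was found
    return "$missing"
}
```

For the example in this page, a call such as ''check_restart_paths /sanscratch/111 fid.txt || exit 1'' placed before the restart step would stop the job early with a readable message instead of the JASSERT error.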
 ==== Quick-Start Guide ====
  
cluster/190.1579635973.txt.gz · Last modified: 2020/01/21 14:46 by hmeij07