cluster:189 [DokuWiki]

Differences

This shows you the differences between two versions of the page.

cluster:189 [2020/02/27 13:10]
hmeij07 [Funding Policy]
cluster:189 [2020/02/27 13:12]
hmeij07
Line 58: Line 58:
 The second principle grants priority access to certain resource(s) for a limited time to a limited group. The same PI/users relationship is used as in the CPU/GPU Usage Contribution scheme. Priority access specifically means: if, during the priority period, priority members' jobs remain pending for more than 24 hours, the hpcadmin will clear compute nodes of running jobs and force those pending jobs to run. This is now an automated process, run via cron, that checks every 2 hours. The steps involved are: find priority members' jobs pending for more than 24 hours, find a node in that queue with no priority members' jobs running, close the target node, requeue all jobs on that node, force the pending job(s) to run, wait 5 minutes, and reopen the node.
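The automated check described above can be sketched in Python. This is a minimal, hypothetical illustration of the selection logic only, not the actual cron script: the `Job` fields, the `"PEND"`/`"RUN"` state labels, the action names, and the queue, node and user names in the example are all assumptions, and no scheduler commands are issued. The function only returns the ordered list of steps one cron run would take. It also assumes pending time is counted from submission, handles overdue jobs oldest first, and never clears the same node twice in a single run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

PENDING_LIMIT = timedelta(hours=24)  # pending threshold stated in the policy
SETTLE_TIME = timedelta(minutes=5)   # wait before reopening the target node


@dataclass
class Job:
    job_id: int
    user: str
    queue: str
    state: str                  # hypothetical labels: "PEND" or "RUN"
    submitted: datetime         # assumption: pending time counted from submission
    node: Optional[str] = None  # set only for running jobs


def overdue_priority_jobs(jobs: List[Job], priority_users: Set[str],
                          now: datetime) -> List[Job]:
    """Priority members' jobs pending for more than 24 hours, oldest first."""
    overdue = [j for j in jobs
               if j.state == "PEND" and j.user in priority_users
               and now - j.submitted > PENDING_LIMIT]
    return sorted(overdue, key=lambda j: j.submitted)


def pick_target_node(queue: str, nodes: List[str], jobs: List[Job],
                     priority_users: Set[str], claimed: Set[str]) -> Optional[str]:
    """First node in the queue running no priority members' jobs and not yet claimed."""
    busy = {j.node for j in jobs
            if j.state == "RUN" and j.queue == queue and j.user in priority_users}
    for node in nodes:
        if node not in busy and node not in claimed:
            return node
    return None


def plan_cron_run(jobs: List[Job], priority_users: Set[str],
                  queue_nodes: Dict[str, List[str]], now: datetime) -> List[Tuple]:
    """Ordered actions one (every-2-hours) cron run would take; nothing is executed."""
    actions: List[Tuple] = []
    claimed: Set[str] = set()
    for job in overdue_priority_jobs(jobs, priority_users, now):
        node = pick_target_node(job.queue, queue_nodes.get(job.queue, []),
                                jobs, priority_users, claimed)
        if node is None:
            continue  # no eligible node now; the next cron run tries again
        claimed.add(node)
        actions.append(("close", node))
        # requeue every job running on the target node, whatever its owner
        actions += [("requeue", j.job_id) for j in jobs
                    if j.state == "RUN" and j.node == node]
        actions += [("force_run", job.job_id, node),
                    ("wait", SETTLE_TIME),
                    ("reopen", node)]
    return actions


if __name__ == "__main__":
    # Hypothetical example: alice and carol are priority members in queue "q1".
    now = datetime(2020, 2, 27, 12, 0)
    jobs = [
        Job(1, "alice", "q1", "PEND", now - timedelta(hours=30)),
        Job(2, "bob", "q1", "RUN", now - timedelta(hours=40), node="n1"),
        Job(3, "carol", "q1", "RUN", now - timedelta(hours=10), node="n2"),
    ]
    for action in plan_cron_run(jobs, {"alice", "carol"}, {"q1": ["n1", "n2"]}, now):
        print(action)
```

In this example, node `n2` is skipped because a priority member's job is already running there, so the plan closes `n1`, requeues bob's job, forces alice's job onto `n1`, waits 5 minutes and reopens `n1`. Returning a plan instead of calling the scheduler keeps the selection logic testable on its own.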
  
-All users should be aware that this may happen, so please checkpoint your jobs with a checkpoint interval of 24 hours. Please consult [[cluster:147|BLCR Checkpoint in OL3]] (serial jobs) and [[cluster:148|BLCR Checkpoint in OL3]] (parallel jobs).
+All users should be aware that this may happen, so please checkpoint your jobs with a checkpoint interval of 24 hours. Please consult [[cluster:190|DMTCP]].
  
 ==== General ====
cluster/189.txt · Last modified: 2024/02/12 11:47 by hmeij07