GPU checkpoint/restart
Why I thought this would be an easy problem to solve, I do not know. CPU checkpoint/restart has come a long way with DMTCP, for both serial and parallel jobs (including multi-host jobs).
A good overview of the history of GPU checkpoint/restart efforts can be found in this presentation.
An excellent in-depth explanation can be found in this article.
- CRAC git site: https://github.com/DMTCP-CRAC/CRAC-early-development
cluster/198.1606919125.txt.gz · Last modified: by hmeij07
