Back

GPU checkpoint/restart

I do not know why I thought this would be an easy problem to solve. CPU checkpoint/restart has come a long way with DMTCP, which handles serial and parallel jobs (including multi-host).

A good overview of the history of GPU checkpoint/restart efforts can be found in this presentation

An excellent in-depth explanation can be found in this article



cluster/198.1606919125.txt.gz · Last modified: 2020/12/02 09:25 by hmeij07