This is an old revision of the document!
This is a new DMTCP(https://github.com/dmtcp/dmtcp.git) plugin to checkpoint- restart CUDA application with noval split-process architecture.
CRAC consists of the plugin on top of DMTCP.
This software runs in the original directory
Compilation needs gcc
version 8 or later (using 9.2.0 on CentOS 7, compute node n79
)
cd /share/apps/CENTOS7/dmtcp/3.0.0.b/ # env on node n79 export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH export CUDA_HOME=/usr/local/cuda export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH # make in place