Back

DMTCP CRAC

This is a new DMTCP (https://github.com/dmtcp/dmtcp.git) plugin to checkpoint-restart CUDA applications with a novel split-process architecture.

CRAC consists of this plugin on top of DMTCP.
The software is built in place and runs from its original directory.

Compilation requires gcc version 8 or later (gcc 9.2.0 is used here, on CentOS 7, compute node n79).
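
The compiler requirement can be checked before building. The following is a minimal sketch, not part of the CRAC instructions: the function name version_major_at_least is made up for illustration, and it assumes that `gcc -dumpversion` prints a version string whose first dot-separated field is the major version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the major version in $1 (e.g. "9.2.0")
# is at least the number given in $2.
version_major_at_least() {
    vmal_major=${1%%.*}          # strip everything after the first dot
    [ "$vmal_major" -ge "$2" ]
}

# Only report if a gcc is actually on the PATH.
if command -v gcc >/dev/null 2>&1; then
    gcc_version=$(gcc -dumpversion)
    if version_major_at_least "$gcc_version" 8; then
        echo "gcc $gcc_version: OK"
    else
        echo "gcc $gcc_version: too old, need 8 or later" >&2
    fi
fi
```

On a stock CentOS 7 node the system gcc is 4.8.5, which this check would reject, so the newer compiler has to come first in PATH.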

# env on node n79 CRAC-early-development-master.zip

 export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
 export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
 export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
 export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH

 export CUDA_HOME=/usr/local/cuda
 export PATH=/usr/local/cuda/bin:$PATH
 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
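
Because the build depends on every directory exported above, it may be worth confirming they exist on the node before running configure. This is only a sketch: check_dirs is a hypothetical helper name, not something shipped with DMTCP or CRAC.

```shell
#!/bin/sh
# Hypothetical helper: report every argument that is not an existing
# directory; return non-zero if at least one is missing.
check_dirs() {
    cd_missing=0
    for cd_dir in "$@"; do
        if [ ! -d "$cd_dir" ]; then
            echo "missing directory: $cd_dir" >&2
            cd_missing=1
        fi
    done
    return "$cd_missing"
}

# Example call with the directories used in the environment above:
# check_dirs /share/apps/CENTOS7/openmpi/4.0.4/bin \
#            /share/apps/CENTOS7/openmpi/4.0.4/lib \
#            /share/apps/CENTOS7/python/3.8.3/bin \
#            /share/apps/CENTOS7/python/3.8.3/lib \
#            /usr/local/cuda/bin /usr/local/cuda/lib64 /usr/local/cuda/lib
```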


# make in place
cd /share/apps/CENTOS7/dmtcp/3.0.0.b/
./configure
make  # no errors
$ ls bin
dmtcp_command      dmtcp_discover_rm  dmtcp_nocheckpoint  dmtcp_rm_loclaunch  dmtcp_ssh   mtcp_restart
dmtcp_coordinator  dmtcp_launch       dmtcp_restart       dmtcp_srun_helper   dmtcp_sshd

make check  # tests dmtcp1 through dmtcp5 all failed, msg: checkpoint error (cause unknown)
make check2
make check3
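
Since make check reported checkpoint errors, checkpointing a trivial process by hand may help narrow things down. The following is only a sketch under stated assumptions, not the project's test procedure: DMTCP_BIN and dmtcp_smoke_test are hypothetical names, and it assumes that dmtcp_launch starts a coordinator on the default port when none is running and that checkpoint images are written as ckpt_*.dmtcp in the directory the process was launched from.

```shell
#!/bin/sh
# Assumed location of the in-place build's binaries; override as needed.
DMTCP_BIN=${DMTCP_BIN:-/share/apps/CENTOS7/dmtcp/3.0.0.b/bin}

# Hypothetical smoke test: launch "sleep" under DMTCP, request one
# checkpoint, and list any resulting image files.
dmtcp_smoke_test() {
    for st_tool in dmtcp_launch dmtcp_command; do
        if [ ! -x "$DMTCP_BIN/$st_tool" ]; then
            echo "missing executable: $DMTCP_BIN/$st_tool" >&2
            return 1
        fi
    done
    (
        # Work in a scratch directory so images do not clutter the build tree.
        workdir=$(mktemp -d) || exit 1
        cd "$workdir" || exit 1
        "$DMTCP_BIN/dmtcp_launch" sleep 60 &
        sleep 2                                   # give the process time to start
        "$DMTCP_BIN/dmtcp_command" --checkpoint   # request a checkpoint
        sleep 5                                   # give the checkpoint time to finish
        ls -l "$workdir"/ckpt_*.dmtcp
        status=$?                                 # non-zero if no image was written
        "$DMTCP_BIN/dmtcp_command" --quit         # stop the process and coordinator
        exit "$status"
    )
}
```

If an image does appear, running dmtcp_restart on it from that directory would be the natural next thing to try.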


cluster/211.1646063453.txt.gz · Last modified: 2022/02/28 15:50 by hmeij07