Viewing Checkpoint Activity in the Log File

The following example launches an MPI job with checkpointing enabled and uses the -ckpoint-logfile option to name a log file, so that you can monitor checkpoint activity while the job runs.

user@head $ mpiexec.hydra -ckpoint on -ckpoint-logfile /home/user/ckpt.log -ckpoint-tmp-prefix /ssd/user/ckptdir -ckpoint-prefix /home/user/ckptdir -ckpointlib blcr -n 32 -f hosts /home/user/myapp

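The following is a sample log. Its process count and storage paths differ from those in the command above, so treat it as an illustration of the log format rather than the exact output of that command: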

[Mon Dec 19 13:31:36 2011] cst-linux Checkpoint log initialized (master mpiexec pid 10687, 48 processes, 6 nodes)
[Mon Dec 19 13:31:36 2011] cst-linux Permanent checkpoint storage: /mnt/lustre/user
[Mon Dec 19 13:31:36 2011] cst-linux Temporary checkpoint storage: /tmp
[Mon Dec 19 13:32:06 2011] cst-linux Started checkpoint number 0 ...
[Mon Dec 19 13:33:00 2011] cst-linux Finished checkpoint number 0.
[Mon Dec 19 13:33:00 2011] cst-linux Moving checkpoint 0 from /tmp to /mnt/lustre/user ...
[Mon Dec 19 13:38:00 2011] cst-linux Moved checkpoint 0 from /tmp to /mnt/lustre/user
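
The timestamps in the log make it possible to estimate how long each checkpoint took to write and how long it took to move to permanent storage. The following Python script is a minimal sketch of that idea, not part of the MPI library. It assumes every line has the layout "[timestamp] host message" seen in the sample above, and that the message wording ("Started", "Finished", "Moving", "Moved") is stable. The names parse_line and checkpoint_durations are hypothetical helpers introduced here for illustration.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample log: [<timestamp>] <host> <message>
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<msg>.*)$")
# Assumed message wording, also inferred from the sample log.
EVENT_RE = re.compile(r"^(Started|Finished|Moving|Moved) checkpoint (?:number )?(\d+)")

# Lines copied from the sample log above.
SAMPLE = [
    "[Mon Dec 19 13:32:06 2011] cst-linux Started checkpoint number 0 ...",
    "[Mon Dec 19 13:33:00 2011] cst-linux Finished checkpoint number 0.",
    "[Mon Dec 19 13:33:00 2011] cst-linux Moving checkpoint 0 from /tmp to /mnt/lustre/user ...",
    "[Mon Dec 19 13:38:00 2011] cst-linux Moved checkpoint 0 from /tmp to /mnt/lustre/user",
]


def parse_line(line):
    """Split one log line into (timestamp, host, message), or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The day and month names are parsed with the current locale (English assumed).
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return ts, m.group("host"), m.group("msg")


def checkpoint_durations(lines):
    """Return {checkpoint number: {"write_s": seconds, "move_s": seconds}}."""
    begun = {}    # (event kind, checkpoint number) -> timestamp of the opening event
    result = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _host, msg = parsed
        ev = EVENT_RE.match(msg)
        if ev is None:
            continue
        kind, num = ev.group(1), int(ev.group(2))
        if kind in ("Started", "Moving"):
            begun[(kind, num)] = ts
        elif kind == "Finished" and ("Started", num) in begun:
            elapsed = ts - begun[("Started", num)]
            result.setdefault(num, {})["write_s"] = elapsed.total_seconds()
        elif kind == "Moved" and ("Moving", num) in begun:
            elapsed = ts - begun[("Moving", num)]
            result.setdefault(num, {})["move_s"] = elapsed.total_seconds()
    return result


if __name__ == "__main__":
    # For the sample lines: 54 seconds to write checkpoint 0, 300 seconds to move it.
    print(checkpoint_durations(SAMPLE))
```

To apply the sketch to a real log, pass the lines of the file named by -ckpoint-logfile (for example, /home/user/ckpt.log in the command above) to checkpoint_durations instead of SAMPLE.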