Intel® MPI Library Reference Manual for Linux* OS
The following example launches an MPI job and specifies a checkpoint log file so that you can monitor checkpoint activity.
user@head $ mpiexec.hydra -ckpoint on -ckpoint-logfile /home/user/ckpt.log -ckpoint-tmp-prefix /ssd/user/ckptdir -ckpoint-prefix /home/user/ckptdir -ckpointlib blcr -n 32 -f hosts /home/user/myapp
The following output is a sample log:
[Mon Dec 19 13:31:36 2011] cst-linux Checkpoint log initialized (master mpiexec pid 10687, 48 processes, 6 nodes)
[Mon Dec 19 13:31:36 2011] cst-linux Permanent checkpoint storage: /mnt/lustre/user
[Mon Dec 19 13:31:36 2011] cst-linux Temporary checkpoint storage: /tmp
[Mon Dec 19 13:32:06 2011] cst-linux Started checkpoint number 0 ...
[Mon Dec 19 13:33:00 2011] cst-linux Finished checkpoint number 0.
[Mon Dec 19 13:33:00 2011] cst-linux Moving checkpoint 0 from /tmp to /mnt/lustre/user ...
[Mon Dec 19 13:38:00 2011] cst-linux Moved checkpoint 0 from /tmp to /mnt/lustre/user
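To see how long each checkpoint phase takes, you can post-process the log file. The following Python sketch is one possible approach and is not part of the Intel® MPI Library. It assumes that every entry follows the layout seen in the sample log above (a bracketed timestamp, a host name, then the message) and that the timestamps use English day and month names. The names parse_line, checkpoint_durations, and SAMPLE_LOG are hypothetical and used only for illustration. A real log may contain other messages, which this sketch ignores.

```python
import re
from datetime import datetime

# Assumed entry layout, inferred only from the sample log:
#   [<weekday> <month> <day> <HH:MM:SS> <year>] <host> <message>
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<msg>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # requires an English (or C) locale

# Messages that open or close a phase, as they appear in the sample log.
WRITE_RE = re.compile(r"(Started|Finished) checkpoint number (\d+)")
MOVE_RE = re.compile(r"(Moving|Moved) checkpoint (\d+)")


def parse_line(line):
    """Split one log entry into (timestamp, host, message), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), TS_FORMAT)
    return timestamp, match.group("host"), match.group("msg")


def checkpoint_durations(lines):
    """Return {checkpoint number: {"write": seconds, "move": seconds}}."""
    open_phases = {}  # (phase, number) -> timestamp at which the phase began
    durations = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, _host, message = parsed

        match = WRITE_RE.match(message)
        if match:
            phase, begins = "write", match.group(1) == "Started"
        else:
            match = MOVE_RE.match(message)
            if match is None:
                continue  # not a phase boundary; ignore the entry
            phase, begins = "move", match.group(1) == "Moving"

        number = int(match.group(2))
        if begins:
            open_phases[(phase, number)] = timestamp
        elif (phase, number) in open_phases:
            elapsed = timestamp - open_phases.pop((phase, number))
            durations.setdefault(number, {})[phase] = elapsed.total_seconds()
    return durations


# Phase-boundary entries copied from the sample log above.
SAMPLE_LOG = [
    "[Mon Dec 19 13:32:06 2011] cst-linux Started checkpoint number 0 ...",
    "[Mon Dec 19 13:33:00 2011] cst-linux Finished checkpoint number 0.",
    "[Mon Dec 19 13:33:00 2011] cst-linux Moving checkpoint 0 from /tmp to /mnt/lustre/user ...",
    "[Mon Dec 19 13:38:00 2011] cst-linux Moved checkpoint 0 from /tmp to /mnt/lustre/user",
]

if __name__ == "__main__":
    for number, phases in sorted(checkpoint_durations(SAMPLE_LOG).items()):
        print(number, phases)
```

For the sample entries, the sketch reports 54 seconds to write checkpoint 0 to temporary storage and 300 seconds to move it to permanent storage. To process a real log, pass an open file object instead of SAMPLE_LOG, for example the file named with -ckpoint-logfile.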