Restarting MPI Applications

The following example restarts an application from checkpoint number <N>:

user@head $ mpiexec.hydra -restart -ckpoint-prefix /home/user/ckptdir -ckpointlib blcr -ckpoint-num <N> -n 32 -f hosts

Before restarting, edit the "hosts" file to remove any dead or unavailable nodes. You do not need to give the executable name on the restart command line, because it is already stored in the checkpoint images.
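
Pruning the "hosts" file by hand gets tedious on larger clusters, so the step can be scripted. The following is a minimal sketch, not part of Hydra: filter_hosts, default_host_check, and the HOST_CHECK variable are hypothetical names introduced here for illustration. It assumes each entry in the host file has the form "host" or "host:nprocs", and it treats a node as available if it answers a single ping, which may be too weak a test on some clusters (a node can answer ping and still be unable to run jobs).

```shell
#!/bin/sh
# Sketch: write a copy of a Hydra host file that keeps only reachable nodes.
# All names below are hypothetical helpers, not Hydra features.

# Default availability test: one ping; exit status 0 means "reachable".
default_host_check() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# HOST_CHECK names the command used to test a host. Override it to plug in
# a stricter site-specific test.
: "${HOST_CHECK:=default_host_check}"

# filter_hosts ORIGINAL PRUNED
# Copies every entry whose host passes HOST_CHECK from ORIGINAL to PRUNED.
# ORIGINAL and PRUNED must be different files.
filter_hosts() {
    : > "$2"
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comment lines.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        # Assumed entry format is "host" or "host:nprocs"; test the name only.
        host=${line%%:*}
        # Redirect stdin so the check cannot consume the rest of the file.
        if "$HOST_CHECK" "$host" </dev/null; then
            printf '%s\n' "$line" >> "$2"
        fi
    done < "$1"
}
```

After loading these functions into a shell, running "filter_hosts hosts hosts.live" writes the surviving entries to hosts.live, which can then be passed to the restart command with "-f hosts.live" in place of "-f hosts". Review the pruned file before restarting, since the sketch only drops nodes and does not check whether enough process slots remain for the job.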