Usage Model

An application installs the MPI_ERRORS_RETURN error handler and checks the return code after each communication call. If a communication call does not return MPI_SUCCESS, the application should mark the destination process as unreachable and avoid further communication with it. For example:

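/* Skip ranks that have already been marked unreachable. */
if (live_ranks[rank]) {
    mpi_err = MPI_Send(buf, count, dtype, rank, tag, MPI_COMM_WORLD);
    if (mpi_err != MPI_SUCCESS) {
        /* The send failed: stop communicating with this rank. */
        live_ranks[rank] = 0;
    }
}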

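For non-blocking communication, an error may be reported by the later wait/test operation (for example MPI_Wait or MPI_Test) rather than by the call that started the operation, so the return codes of those completion calls must be checked as well.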