DAPL-capable Network Fabrics Control

I_MPI_DAPL_PROVIDER

Define the DAPL provider to load.

Syntax

I_MPI_DAPL_PROVIDER=<name>

Arguments

<name>

Define the name of the DAPL provider to load.

Description

Set this environment variable to define the name of the DAPL provider to load. This name is also defined in the dat.conf configuration file.
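
To illustrate where such a name can be found, the following sketch extracts the provider name from a single dat.conf line. It assumes the common dat.conf layout, in which the provider name is the first whitespace-separated field of an entry and lines starting with # are comments. The function dat_conf_provider_name is a hypothetical helper and is not part of Intel MPI Library or of the DAT library.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first whitespace-separated field of a
 * dat.conf line into out (capacity out_len). Returns 1 on success and
 * 0 for blank lines, comment lines, or names that do not fit.
 * Assumption: the provider name is the first field of each entry. */
int dat_conf_provider_name(const char *line, char *out, size_t out_len)
{
    size_t n = 0;

    while (*line != '\0' && isspace((unsigned char)*line))
        line++;                          /* skip leading whitespace */
    if (*line == '\0' || *line == '#')
        return 0;                        /* blank or comment line */

    while (line[n] != '\0' && !isspace((unsigned char)line[n]))
        n++;                             /* measure the first field */
    if (n >= out_len)
        return 0;                        /* no room for name plus NUL */

    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}
```

A name obtained this way is the kind of value that I_MPI_DAPL_PROVIDER expects. The actual entries depend on the DAPL installation on the system.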

I_MPI_DAT_LIBRARY

Select the DAT library to be used for DAPL* provider.

Syntax

I_MPI_DAT_LIBRARY=<library>

Arguments

<library>

Specify the DAT library to be used for the DAPL provider. The default values are libdat.so or libdat.so.1 for DAPL* 1.2 providers and libdat2.so or libdat2.so.2 for DAPL* 2.0 providers.

Description

Set this environment variable to select a specific DAT library to be used for the DAPL provider. If the library is not located in the dynamic loader search path, specify the full path to the DAT library. This environment variable affects only DAPL and DAPL UD capable fabrics.
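
The default names listed in the argument table can be summarized as a small lookup. The sketch below is illustrative only: default_dat_libraries is a hypothetical helper, and the order in which the two names for each version are listed is an assumption, not a statement about how the library searches for them.

```c
#include <stddef.h>

/* Hypothetical helper: return a NULL-terminated list of the default DAT
 * library names for a DAPL major version (1 for DAPL 1.2 providers,
 * 2 for DAPL 2.0 providers), or NULL for any other version. */
const char *const *default_dat_libraries(int dapl_major)
{
    static const char *const v1[] = { "libdat.so", "libdat.so.1", NULL };
    static const char *const v2[] = { "libdat2.so", "libdat2.so.2", NULL };

    if (dapl_major == 1)
        return v1;
    if (dapl_major == 2)
        return v2;
    return NULL;
}
```

Setting I_MPI_DAT_LIBRARY replaces these defaults with the single library that you name.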

I_MPI_DAPL_TRANSLATION_CACHE

(I_MPI_RDMA_TRANSLATION_CACHE)

Turn on/off the memory registration cache in the DAPL path.

Syntax

I_MPI_DAPL_TRANSLATION_CACHE=<arg>

Deprecated Syntax

I_MPI_RDMA_TRANSLATION_CACHE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the memory registration cache. This is the default

disable | no | off | 0

Turn off the memory registration cache

Description

Set this environment variable to turn on/off the memory registration cache in the DAPL path.

The cache substantially increases performance, but may lead to correctness issues in certain situations. See product Release Notes for further details.
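
Many variables in this section take the same binary indicator values. As an illustration of that value set, the sketch below maps a string to on, off, or unrecognized. The function parse_binary_indicator is a hypothetical helper; it accepts only the exact lowercase spellings shown in the argument tables, and whether the library itself is case-sensitive is not covered here.

```c
#include <string.h>

/* Hypothetical helper: map a binary-indicator string to 1 (on), 0 (off),
 * or -1 (NULL or not one of the documented spellings). */
int parse_binary_indicator(const char *value)
{
    static const char *const on_values[]  = { "enable", "yes", "on", "1" };
    static const char *const off_values[] = { "disable", "no", "off", "0" };
    size_t i;

    if (value == NULL)
        return -1;
    for (i = 0; i < sizeof on_values / sizeof on_values[0]; i++)
        if (strcmp(value, on_values[i]) == 0)
            return 1;
    for (i = 0; i < sizeof off_values / sizeof off_values[0]; i++)
        if (strcmp(value, off_values[i]) == 0)
            return 0;
    return -1;
}
```

A caller could pass the result of getenv("I_MPI_DAPL_TRANSLATION_CACHE") and treat -1 as "use the documented default".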

I_MPI_DAPL_TRANSLATION_CACHE_AVL_TREE

Enable/disable the AVL tree* based implementation of the RDMA translation cache in the DAPL path.

Syntax

I_MPI_DAPL_TRANSLATION_CACHE_AVL_TREE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the AVL tree based RDMA translation cache

disable | no | off | 0

Turn off the AVL tree based RDMA translation cache. This is the default value

Description

Set this environment variable to enable the AVL tree based implementation of RDMA translation cache in the DAPL path. When the search in RDMA translation cache handles over 10,000 elements, the AVL tree based RDMA translation cache is faster than the default implementation.

I_MPI_DAPL_DIRECT_COPY_THRESHOLD

(I_MPI_RDMA_EAGER_THRESHOLD, RDMA_IBA_EAGER_THRESHOLD)

Change the threshold of the DAPL direct-copy protocol.

Syntax

I_MPI_DAPL_DIRECT_COPY_THRESHOLD=<nbytes>

Deprecated Syntaxes

I_MPI_RDMA_EAGER_THRESHOLD=<nbytes>

RDMA_IBA_EAGER_THRESHOLD=<nbytes>

Arguments

<nbytes>

Define the DAPL direct-copy protocol threshold

> 0

The default <nbytes> value depends on the platform

Description

Set this environment variable to control the DAPL direct-copy protocol threshold. Data transfer algorithms for the DAPL-capable network fabrics are selected based on the following scheme:

This environment variable is available for both Intel® and non-Intel microprocessors, but it may perform more optimizations for Intel microprocessors than it performs for non-Intel microprocessors.

Note

The equivalent of this variable for Intel® Xeon Phi™ Coprocessor is I_MIC_MPI_DAPL_DIRECT_COPY_THRESHOLD

I_MPI_DAPL_EAGER_MESSAGE_AGGREGATION

Control the use of concatenation for adjourned MPI send requests. Adjourned MPI send requests are those that cannot be sent immediately.

Syntax

I_MPI_DAPL_EAGER_MESSAGE_AGGREGATION=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Enable the concatenation for adjourned MPI send requests

disable | no | off | 0

Disable the concatenation for adjourned MPI send requests. This is the default value

Description

Set this environment variable to control the use of concatenation for adjourned MPI send requests intended for the same MPI rank. In some cases, this mode can improve the performance of applications, especially when MPI_Isend() is used with short message sizes and the same destination rank, such as:

/* Post NMSG short non-blocking sends to the same destination rank */
for (i = 0; i < NMSG; i++)
{
    ret = MPI_Isend(sbuf[i], MSG_SIZE, datatype, dest, tag,
                    comm, &req_send[i]);
}

I_MPI_DAPL_DYNAMIC_CONNECTION_MODE

(I_MPI_DYNAMIC_CONNECTION_MODE, I_MPI_DYNAMIC_CONNECTIONS_MODE)

Choose the algorithm for establishing the DAPL* connections.

Syntax

I_MPI_DAPL_DYNAMIC_CONNECTION_MODE=<arg>

Deprecated Syntaxes

I_MPI_DYNAMIC_CONNECTION_MODE=<arg>

I_MPI_DYNAMIC_CONNECTIONS_MODE=<arg>

Arguments

<arg>

Mode selector

reject

Deny one of the two simultaneous connection requests. This is the default

disconnect

Deny one of the two simultaneous connection requests after both connections have been established

Description

Set this environment variable to choose the algorithm for handling dynamically established connections for DAPL-capable fabrics according to the following scheme:

I_MPI_DAPL_SCALABLE_PROGRESS

(I_MPI_RDMA_SCALABLE_PROGRESS)

Turn on/off the scalable algorithm for DAPL read progress.

Syntax

I_MPI_DAPL_SCALABLE_PROGRESS=<arg>

Deprecated Syntax

I_MPI_RDMA_SCALABLE_PROGRESS=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the scalable algorithm. This is the default value when the number of processes is larger than 128

disable | no | off | 0

Turn off the scalable algorithm. This is the default value when the number of processes is less than or equal to 128

Description

Set this environment variable to enable the scalable algorithm for the DAPL read progress. In some cases, this provides advantages for systems with many processes.
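
The process-count rule from the argument table can be written as a short function. This is a sketch, not library code: scalable_progress_enabled is a hypothetical name, and it assumes that an explicit setting always takes precedence over the default.

```c
/* user_setting: 1 = enabled, 0 = disabled, -1 = variable not set.
 * Returns 1 if the scalable algorithm would be used, 0 otherwise. */
int scalable_progress_enabled(int user_setting, int num_procs)
{
    if (user_setting == 0 || user_setting == 1)
        return user_setting;        /* assumption: explicit setting wins */
    return num_procs > 128;         /* documented default */
}
```

With the variable unset, a 128-process job keeps the algorithm off and a 129-process job turns it on.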

I_MPI_DAPL_BUFFER_NUM

(I_MPI_RDMA_BUFFER_NUM, NUM_RDMA_BUFFER)

Change the number of internal pre-registered buffers for each process pair in the DAPL path.

Syntax

I_MPI_DAPL_BUFFER_NUM=<nbuf>

Deprecated Syntaxes

I_MPI_RDMA_BUFFER_NUM=<nbuf>

NUM_RDMA_BUFFER=<nbuf>

Arguments

<nbuf>

Define the number of buffers for each pair in a process group

> 0

The default value depends on the platform

Description

Set this environment variable to change the number of the internal pre-registered buffers for each process pair in the DAPL path.

Note

The more pre-registered buffers are available, the more memory is used for every established connection.

I_MPI_DAPL_BUFFER_SIZE

(I_MPI_RDMA_BUFFER_SIZE, I_MPI_RDMA_VBUF_TOTAL_SIZE)

Change the size of internal pre-registered buffers for each process pair in the DAPL path.

Syntax

I_MPI_DAPL_BUFFER_SIZE=<nbytes>

Deprecated Syntaxes

I_MPI_RDMA_BUFFER_SIZE=<nbytes>

I_MPI_RDMA_VBUF_TOTAL_SIZE=<nbytes>

Arguments

<nbytes>

Define the size of pre-registered buffers

> 0

The default value depends on the platform

Description

Set this environment variable to define the size of the internal pre-registered buffer for each process pair in the DAPL path. The actual size is calculated by adjusting the <nbytes> to align the buffer to an optimal value.

I_MPI_DAPL_RNDV_BUFFER_ALIGNMENT

(I_MPI_RDMA_RNDV_BUFFER_ALIGNMENT, I_MPI_RDMA_RNDV_BUF_ALIGN)

Define the alignment of the sending buffer for the DAPL direct-copy transfers.

Syntax

I_MPI_DAPL_RNDV_BUFFER_ALIGNMENT=<arg>

Deprecated Syntaxes

I_MPI_RDMA_RNDV_BUFFER_ALIGNMENT=<arg>

I_MPI_RDMA_RNDV_BUF_ALIGN=<arg>

Arguments

<arg>

Define the alignment for the sending buffer

> 0 and a power of 2

The default value is 64

Description

Set this environment variable to define the alignment of the sending buffer for DAPL direct-copy transfers. When a buffer specified in a DAPL operation is aligned to an optimal value, the data transfer bandwidth may be increased.
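
The "> 0 and a power of 2" constraint, and what alignment means for a buffer address, can be sketched as follows. Both functions are hypothetical helpers written for illustration; they are not part of Intel MPI Library.

```c
#include <stddef.h>
#include <stdint.h>

/* A valid alignment is greater than 0 and a power of 2. */
int is_valid_alignment(size_t alignment)
{
    return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

/* Round addr up to the next multiple of alignment.
 * The caller must pass an alignment accepted by is_valid_alignment(). */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    return (addr + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}
```

An application that wants its own send buffers to match the configured alignment could allocate them with a standard aligned allocator such as posix_memalign() or C11 aligned_alloc(). Whether this improves bandwidth depends on the DAPL provider.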

I_MPI_DAPL_RDMA_RNDV_WRITE

(I_MPI_RDMA_RNDV_WRITE, I_MPI_USE_RENDEZVOUS_RDMA_WRITE)

Turn on/off the RDMA Write-based rendezvous direct-copy protocol in the DAPL path.

Syntax

I_MPI_DAPL_RDMA_RNDV_WRITE=<arg>

Deprecated Syntaxes

I_MPI_RDMA_RNDV_WRITE=<arg>

I_MPI_USE_RENDEZVOUS_RDMA_WRITE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the RDMA Write rendezvous direct-copy protocol

disable | no | off | 0

Turn off the RDMA Write rendezvous direct-copy protocol

Description

Set this environment variable to select the RDMA Write-based rendezvous direct-copy protocol in the DAPL path. Certain DAPL* providers have a slow RDMA Read implementation on certain platforms. Switching on the rendezvous direct-copy protocol based on the RDMA Write operation can increase performance in these cases. The default value depends on the DAPL provider attributes.

I_MPI_DAPL_CHECK_MAX_RDMA_SIZE

(I_MPI_RDMA_CHECK_MAX_RDMA_SIZE)

Check the value of the DAPL attribute, max_rdma_size.

Syntax

I_MPI_DAPL_CHECK_MAX_RDMA_SIZE=<arg>

Deprecated Syntax

I_MPI_RDMA_CHECK_MAX_RDMA_SIZE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Check the value of the DAPL* attribute max_rdma_size

disable | no | off | 0

Do not check the value of the DAPL* attribute max_rdma_size. This is the default value

Description

Set this environment variable to control message fragmentation according to the following scheme:

I_MPI_DAPL_MAX_MSG_SIZE

(I_MPI_RDMA_MAX_MSG_SIZE)

Control message fragmentation threshold.

Syntax

I_MPI_DAPL_MAX_MSG_SIZE=<nbytes>

Deprecated Syntax

I_MPI_RDMA_MAX_MSG_SIZE=<nbytes>

Arguments

<nbytes>

Define the maximum message size that can be sent through DAPL without fragmentation

> 0

If the I_MPI_DAPL_CHECK_MAX_RDMA_SIZE environment variable is enabled, the default <nbytes> value is equal to the max_rdma_size DAPL attribute value. Otherwise, the default value is MAX_INT.
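
The default rule above can be sketched in C. Two assumptions are made here and are not stated in this manual: MAX_INT is taken to mean the C constant INT_MAX, and a message larger than the limit is assumed to be split into pieces of at most that size. Both function names are hypothetical.

```c
#include <limits.h>

/* Default fragmentation threshold: the provider's max_rdma_size when the
 * check is enabled, otherwise INT_MAX (assumed meaning of MAX_INT). */
int default_dapl_max_msg_size(int check_max_rdma_size, int max_rdma_size)
{
    return check_max_rdma_size ? max_rdma_size : INT_MAX;
}

/* Assumption: a message is sent in pieces of at most max_msg_size bytes.
 * Returns the number of pieces; max_msg_size must be greater than 0. */
long long fragments_needed(long long msg_bytes, long long max_msg_size)
{
    if (msg_bytes <= max_msg_size)
        return 1;                   /* fits without fragmentation */
    return msg_bytes / max_msg_size + (msg_bytes % max_msg_size != 0);
}
```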

Description

Set this environment variable to control message fragmentation size according to the following scheme:

I_MPI_DAPL_CONN_EVD_SIZE

(I_MPI_RDMA_CONN_EVD_SIZE, I_MPI_CONN_EVD_QLEN)

Define the event queue size of the DAPL event dispatcher for connections.

Syntax

I_MPI_DAPL_CONN_EVD_SIZE=<size>

Deprecated Syntaxes

I_MPI_RDMA_CONN_EVD_SIZE=<size>

I_MPI_CONN_EVD_QLEN=<size>

Arguments

<size>

Define the length of the event queue

> 0

The default value is 2 * (number of processes in the MPI job) + 32

Description

Set this environment variable to define the event queue size of the DAPL event dispatcher that handles connection-related events. If this environment variable is set, the smaller of <size> and the value obtained from the provider is used as the size of the event queue. The provider is required to supply a queue size that is equal to or larger than the calculated value.
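
The sizing rules for the connection event queue can be expressed as two small functions. This is an illustrative sketch with hypothetical function names; it restates the default formula and the minimum rule described for this variable.

```c
/* Default queue length: 2 * (number of processes in the MPI job) + 32. */
int default_conn_evd_size(int num_procs)
{
    return 2 * num_procs + 32;
}

/* When the variable is set, the smaller of the requested size and the
 * size obtained from the provider is used. */
int effective_conn_evd_size(int requested_size, int provider_size)
{
    return requested_size < provider_size ? requested_size : provider_size;
}
```

For example, a 64-process job has a default queue length of 160, and a request for 8192 entries against a provider value of 4096 results in a queue of 4096 entries.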

I_MPI_DAPL_SR_THRESHOLD

Change the threshold for switching from the send/recv path to the RDMA path in DAPL wait mode.

Syntax

I_MPI_DAPL_SR_THRESHOLD=<nbytes>

Arguments

<nbytes>

Define the message size threshold for switching from send/recv to RDMA

>= 0

The default <nbytes> value is 256 bytes

Description

Set this environment variable to control the protocol used for point-to-point communication in DAPL wait mode:

I_MPI_DAPL_SR_BUF_NUM

Change the number of internal pre-registered buffers for each process pair used in DAPL wait mode for the send/recv path.

Syntax

I_MPI_DAPL_SR_BUF_NUM=<nbuf>

Arguments

<nbuf>

Define the number of send/recv buffers for each pair in a process group

> 0

The default value is 32

Description

Set this environment variable to change the number of the internal send/recv pre-registered buffers for each process pair.

I_MPI_DAPL_RDMA_WRITE_IMM

(I_MPI_RDMA_WRITE_IMM)

Enable/disable RDMA Write with immediate data InfiniBand (IB) extension in DAPL wait mode.

Syntax

I_MPI_DAPL_RDMA_WRITE_IMM=<arg>

Deprecated Syntax

I_MPI_RDMA_WRITE_IMM=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on RDMA Write with immediate data IB extension

disable | no | off | 0

Turn off RDMA Write with immediate data IB extension

Description

Set this environment variable to utilize RDMA Write with immediate data IB extension. The algorithm is enabled if this environment variable is set and a certain DAPL provider attribute indicates that RDMA Write with immediate data IB extension is supported.

I_MPI_DAPL_DESIRED_STATIC_CONNECTIONS_NUM

Define the number of processes that establish DAPL static connections at the same time.

Syntax

I_MPI_DAPL_DESIRED_STATIC_CONNECTIONS_NUM=<num_processes>

Arguments

<num_processes>

Define the number of processes that establish DAPL static connections at the same time

> 0

The default <num_processes> value is equal to 256

Description

Set this environment variable to control the algorithm of DAPL static connection establishment.

If the number of processes in the MPI job is less than or equal to <num_processes>, all MPI processes establish the static connections simultaneously. Otherwise, the processes are distributed into several groups. The number of processes in each group is calculated to be close to <num_processes>. Then static connections are established in several iterations, including intergroup connection setup.
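
The grouping described above can be sketched as follows. The exact rule the library uses to size the groups is not given here, so the balanced split below is only one plausible reading of "close to" the configured value. The type static_conn_plan and the function plan_static_connections are hypothetical names.

```c
/* Hypothetical result type: how many groups, and at most how many
 * processes per group. */
typedef struct {
    int num_groups;
    int group_size;
} static_conn_plan;

/* num_procs: processes in the MPI job (> 0).
 * desired:   value of I_MPI_DAPL_DESIRED_STATIC_CONNECTIONS_NUM (> 0). */
static_conn_plan plan_static_connections(int num_procs, int desired)
{
    static_conn_plan plan;

    if (num_procs <= desired) {
        plan.num_groups = 1;            /* everyone connects at once */
        plan.group_size = num_procs;
        return plan;
    }
    /* Assumption: use the fewest groups that keep each group at or
     * below the desired size, then balance the group sizes. */
    plan.num_groups = (num_procs + desired - 1) / desired;
    plan.group_size = (num_procs + plan.num_groups - 1) / plan.num_groups;
    return plan;
}
```

Under these assumptions, a 1000-process job with the default value of 256 is split into 4 groups of 250 processes.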


I_MPI_CHECK_DAPL_PROVIDER_COMPATIBILITY

Enable/disable the check that the same DAPL provider is selected by all ranks.

Syntax

I_MPI_CHECK_DAPL_PROVIDER_COMPATIBILITY=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the check that the DAPL provider is the same on all ranks. This is the default value

disable | no | off | 0

Turn off the check that the DAPL provider is the same on all ranks

Description

Set this environment variable to check whether all MPI ranks select the same DAPL provider. If this check is enabled, Intel® MPI Library compares the DAPL provider name and the DAPL version across ranks. If these parameters are not the same on all ranks, Intel MPI Library does not select the RDMA path and may fall back to sockets. Turning off the check reduces the execution time of MPI_Init(), which may be significant for MPI jobs with a large number of processes.
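
The comparison itself can be pictured with the sketch below, which checks already-collected per-rank information. It does not show how the library gathers this data or how it represents the DAPL version. The rank_dapl_info type, its fields, and dapl_providers_compatible are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical per-rank record: provider name and DAPL version. */
typedef struct {
    const char *provider_name;
    int dapl_major;
    int dapl_minor;
} rank_dapl_info;

/* Returns 1 if every rank reports the same provider name and DAPL
 * version as rank 0, and 0 otherwise. */
int dapl_providers_compatible(const rank_dapl_info *ranks, int num_ranks)
{
    int i;

    for (i = 1; i < num_ranks; i++) {
        if (strcmp(ranks[i].provider_name, ranks[0].provider_name) != 0 ||
            ranks[i].dapl_major != ranks[0].dapl_major ||
            ranks[i].dapl_minor != ranks[0].dapl_minor)
            return 0;               /* mismatch: RDMA path not selected */
    }
    return 1;
}
```

Because such a comparison needs data from every rank, its cost grows with the job size, which is consistent with the MPI_Init() time savings mentioned above when the check is turned off.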