Communication Fabrics Control

I_MPI_FABRICS

(I_MPI_DEVICE)

Select the particular network fabrics to be used.

Syntax

I_MPI_FABRICS=<fabric>|<intra-node fabric>:<inter-nodes fabric>

Where <fabric> := {shm, dapl, tcp, tmi, ofa, ofi}

<intra-node fabric> := {shm, dapl, tcp, tmi, ofa, ofi}

<inter-nodes fabric> := {dapl, tcp, tmi, ofa, ofi}

Deprecated Syntax

I_MPI_DEVICE=<device>[:<provider>]

Arguments

<fabric>

Define a network fabric

shm

Shared-memory

dapl

DAPL-capable network fabrics, such as InfiniBand*, iWarp*, Dolphin*, and XPMEM* (through DAPL*)

tcp

TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)

tmi

TMI-capable network fabrics, such as Intel® True Scale Fabric and Myrinet* (through the Tag Matching Interface)

ofa

OFA-capable network fabric including InfiniBand* (through OFED* verbs)

ofi

OFI (OpenFabrics Interfaces*)-capable network fabrics, such as Intel® True Scale Fabric and TCP (through the OFI* API)

Correspondence with I_MPI_DEVICE

<device>    <fabric>
sock        tcp
shm         shm
ssm         shm:tcp
rdma        dapl
rdssm       shm:dapl

<provider>

Optional DAPL* provider name (only for the rdma and the rdssm devices)

I_MPI_DAPL_PROVIDER=<provider> or I_MPI_DAPL_UD_PROVIDER=<provider>

Use the <provider> specification only for the {rdma,rdssm} devices.

For example, to select the OFED* InfiniBand* device, use the following command:

$ mpiexec -n <# of processes> \

-env I_MPI_DEVICE rdssm:OpenIB-cma <executable>

For these devices, if <provider> is not specified, the first DAPL* provider in the /etc/dat.conf file is used.

Description

Set this environment variable to select a specific fabric combination. If the requested fabric(s) is not available, Intel® MPI Library can fall back to other fabric(s). See I_MPI_FALLBACK for details. If the I_MPI_FABRICS environment variable is not defined, Intel® MPI Library selects the most appropriate fabric combination automatically.

The exact combination of fabrics depends on the number of processes started per node.

The shm fabric is available for both Intel® and non-Intel microprocessors, but it may perform additional optimizations for Intel microprocessors that it does not perform for non-Intel microprocessors.

Note

The combination of selected fabrics ensures that the job runs, but this combination may not provide the highest possible performance for the given cluster configuration.

For example, to select shared-memory as the chosen fabric, use the following command:

$ mpirun -n <# of processes> -env I_MPI_FABRICS shm <executable>

To select shared-memory and DAPL-capable network fabric as the chosen fabric combination, use the following command:

$ mpirun -n <# of processes> -env I_MPI_FABRICS shm:dapl <executable>

To enable Intel® MPI Library to select the most appropriate fabric combination automatically, use the following command:

$ mpirun -n <# of procs> -perhost <# of procs per host> <executable>

Set the level of debug information to 2 or higher to check which fabrics have been initialized. See I_MPI_DEBUG for details. For example:

[0] MPI startup(): shm and dapl data transfer modes

or

[0] MPI startup(): tcp data transfer mode

I_MPI_FABRICS_LIST

Define a fabrics list.

Syntax

I_MPI_FABRICS_LIST=<fabrics list>

Where <fabrics list> := <fabric>,...,<fabric>

<fabric> := {dapl, tcp, tmi, ofa, ofi}

Arguments

<fabrics list>

Specify a list of fabrics

dapl,ofa,tcp,tmi,ofi

This is the default value

dapl,tcp,ofa,tmi,ofi

If you specify I_MPI_WAIT_MODE=enable, this is the default value

tmi,dapl,tcp,ofa,ofi

This is the default value for nodes that have Intel® True Scale Fabric available and no other type of interconnect card. If a host has several types of HCAs, this default does not apply

Description

Set this environment variable to define a list of fabrics. The library uses the fabrics list to choose the most appropriate fabrics combination automatically. For more information on fabric combination, see I_MPI_FABRICS.

For example, if I_MPI_FABRICS_LIST=dapl,tcp, I_MPI_FABRICS is not defined, and the initialization of the DAPL-capable network fabrics fails, the library falls back to the TCP-capable network fabric. For more information on fallback, see I_MPI_FALLBACK.
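
For example, to have the library consider only the DAPL-capable and TCP-capable fabrics when it chooses the fabric combination automatically, you could use a command like the following (the list shown is illustrative):

$ mpirun -n <# of processes> -env I_MPI_FABRICS_LIST dapl,tcp <executable>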

I_MPI_FALLBACK

(I_MPI_FALLBACK_DEVICE)

Set this environment variable to enable fallback to the first available fabric.

Syntax

I_MPI_FALLBACK=<arg>

Deprecated Syntax

I_MPI_FALLBACK_DEVICE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Fall back to the first available fabric. This is the default value if you do not set the I_MPI_FABRICS(I_MPI_DEVICE) environment variable.

disable | no | off | 0

Terminate the job if MPI cannot initialize one of the fabrics selected by the I_MPI_FABRICS environment variable. This is the default value if you set the I_MPI_FABRICS(I_MPI_DEVICE) environment variable.

Description

Set this environment variable to control fallback to the first available fabric.

If you set I_MPI_FALLBACK to enable and an attempt to initialize a specified fabric fails, the library uses the first available fabric from the list of fabrics. See I_MPI_FABRICS_LIST for details.

If you set I_MPI_FALLBACK to disable and an attempt to initialize a specified fabric fails, the library terminates the MPI job.

Note

If you set I_MPI_FABRICS and I_MPI_FALLBACK=enable, the library falls back to the fabrics that come after the specified fabric in the fabrics list. For example, if I_MPI_FABRICS=dapl, I_MPI_FABRICS_LIST=ofa,tmi,dapl,tcp, and I_MPI_FALLBACK=enable, and the initialization of the DAPL-capable network fabrics fails, the library falls back to the TCP-capable network fabric.
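
For example, one way to pin the job to the DAPL-capable fabric and have it terminate instead of falling back if that fabric cannot be initialized is a command like the following:

$ mpirun -n <# of processes> -env I_MPI_FABRICS dapl -env I_MPI_FALLBACK disable <executable>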

I_MPI_LARGE_SCALE_THRESHOLD

Change the threshold for enabling scalable optimizations.

Syntax

I_MPI_LARGE_SCALE_THRESHOLD=<nprocs>

Arguments

<nprocs>

Define the scale threshold

> 0

The default value is 4096

Description

This environment variable defines the number of processes at which the DAPL UD IB extension is turned on automatically.
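
For example, to lower the threshold so that the large-scale optimizations are enabled for jobs of 2048 or more processes (an illustrative value), you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_LARGE_SCALE_THRESHOLD 2048 <executable>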

I_MPI_EAGER_THRESHOLD

Change the eager/rendezvous message size threshold for all devices.

Syntax

I_MPI_EAGER_THRESHOLD=<nbytes>

Arguments

<nbytes>

Set the eager/rendezvous message size threshold

> 0

The default <nbytes> value is equal to 262144 bytes

Description

Set this environment variable to control the protocol used for point-to-point communication:

Messages shorter than or equal in size to <nbytes> are sent using the eager protocol.

Messages larger than <nbytes> are sent using the more memory-efficient rendezvous protocol.
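
For example, to raise the switchover point so that messages up to 524288 bytes (an illustrative value) use the eager protocol, you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_EAGER_THRESHOLD 524288 <executable>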

I_MPI_INTRANODE_EAGER_THRESHOLD

Change the eager/rendezvous message size threshold for intra-node communication mode.

Syntax

I_MPI_INTRANODE_EAGER_THRESHOLD=<nbytes>

Arguments

<nbytes>

Set the eager/rendezvous message size threshold for intra-node communication

> 0

The default <nbytes> value is equal to 262144 bytes for all fabrics except shm. For shm, the cutover point is equal to the value of the I_MPI_SHM_CELL_SIZE environment variable

Description

Set this environment variable to change the protocol used for communication within the node:

Messages shorter than or equal in size to <nbytes> are sent using the eager protocol.

Messages larger than <nbytes> are sent using the more memory-efficient rendezvous protocol.

If you do not set I_MPI_INTRANODE_EAGER_THRESHOLD, the value of I_MPI_EAGER_THRESHOLD is used.
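
For example, to set a separate, larger cutover point for intra-node messages (the value shown is illustrative), you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_INTRANODE_EAGER_THRESHOLD 1048576 <executable>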

I_MPI_SPIN_COUNT

Control the spin count value.

Syntax

I_MPI_SPIN_COUNT=<scount>

Arguments

<scount>

Define the loop spin count when polling fabric(s)

> 0

The default <scount> value is equal to 1 when more than one process runs per processor/core. Otherwise the value equals 250. The maximum value is equal to 2147483647

Description

Set the spin count limit. The loop for polling the fabric(s) spins <scount> times before the library releases the processes if no incoming messages are received for processing. Within every spin loop, the shm fabric (if enabled) is polled an extra I_MPI_SHM_SPIN_COUNT times. Smaller values for <scount> cause the Intel® MPI Library to release the processor more frequently.

Use the I_MPI_SPIN_COUNT environment variable for tuning application performance. The best value for <scount> can be chosen on an experimental basis. It depends on the particular computational environment and the application.
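
For example, to make processes release the processor sooner when no messages arrive, you could try a smaller spin count (the value shown is illustrative):

$ mpirun -n <# of processes> -env I_MPI_SPIN_COUNT 10 <executable>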

I_MPI_SCALABLE_OPTIMIZATION

Turn on/off scalable optimization of the network fabric communication.

Syntax

I_MPI_SCALABLE_OPTIMIZATION=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on scalable optimization of the network fabric communication. This is the default for 16 or more processes

disable | no | off | 0

Turn off scalable optimization of the network fabric communication. This is the default value for less than 16 processes

Description

Set this environment variable to enable scalable optimization of the network fabric communication. In most cases, using optimization decreases latency and increases bandwidth for a large number of processes.
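
For example, to turn off the scalable optimization regardless of the number of processes, you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_SCALABLE_OPTIMIZATION disable <executable>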

I_MPI_WAIT_MODE

Turn on/off wait mode.

Syntax

I_MPI_WAIT_MODE=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the wait mode

disable | no | off | 0

Turn off the wait mode. This is the default

Description

Set this environment variable to control the wait mode. If you enable this mode, the processes wait for receiving messages without polling the fabric(s). This mode can save CPU time for other tasks.

Use the Native POSIX Thread Library* with the wait mode for shm communications.

Note

To check which version of the thread library is installed, use the following command:

$ getconf GNU_LIBPTHREAD_VERSION
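
For example, to let waiting processes sleep instead of polling the fabrics, you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_WAIT_MODE enable <executable>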

I_MPI_DYNAMIC_CONNECTION

(I_MPI_USE_DYNAMIC_CONNECTIONS)

Control the dynamic connection establishment.

Syntax

I_MPI_DYNAMIC_CONNECTION=<arg>

Deprecated Syntax

I_MPI_USE_DYNAMIC_CONNECTIONS=<arg>

Arguments

<arg>

Binary indicator

enable | yes | on | 1

Turn on the dynamic connection establishment. This is the default for 64 or more processes

disable | no | off | 0

Turn off the dynamic connection establishment. This is the default for less than 64 processes

Description

Set this environment variable to control dynamic connection establishment.

The default value depends on the number of processes in the MPI job. The dynamic connection establishment is off if the total number of processes is less than 64.
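
For example, to turn on dynamic connection establishment even for jobs of fewer than 64 processes, you could use a command like the following:

$ mpirun -n <# of processes> -env I_MPI_DYNAMIC_CONNECTION enable <executable>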