Intel® MPI Library Reference Manual for Linux* OS
This topic describes how to use the multi-threaded version of memcpy implemented in the Intel® MPI Library for Intel® Xeon Phi™ Coprocessors. For some applications, you can use this experimental feature to reach higher memory bandwidth between ranks that communicate through shared memory.
Control the usage of the multi-threaded memcpy.
I_MPI_MT_MEMCPY=<value>
<value> | Controls the usage of the multi-threaded memcpy |
enable | yes | on | 1 | Enable the multi-threaded memcpy in the single-threaded version of the Intel® MPI Library (MPI_THREAD_SINGLE). This setting is ignored by the thread-safe version of the Intel® MPI Library |
disable | no | off | 0 | Disable the usage of the multi-threaded memcpy. This is the default value |
Set this environment variable to control whether the multi-threaded version of memcpy is used for intra-node communication.
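As a minimal sketch of how this variable might be set for a job launch, the Python snippet below builds a process environment and an mpirun command line. The helper name mt_memcpy_launch, the application name ./my_app, and the rank count are placeholders chosen for illustration; they are not part of the Intel® MPI Library.

```python
import os

def mt_memcpy_launch(app, nranks, enable=True, base_env=None):
    """Return (command, environment) for launching `app` with I_MPI_MT_MEMCPY set.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # "enable" and "disable" are two of the accepted values listed above.
    env["I_MPI_MT_MEMCPY"] = "enable" if enable else "disable"
    command = ["mpirun", "-n", str(nranks), app]
    return command, env

# Placeholder application and rank count.
command, env = mt_memcpy_launch("./my_app", 4)
# To actually start the job: subprocess.run(command, env=env, check=True)
```

Building the environment separately from the launch keeps the setting local to the job instead of changing the environment of the calling shell.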
Change the number of threads involved in performing multi-threaded memcpy.
I_MPI_MT_MEMCPY_NUM_THREADS=<num>
<num> | The number of threads involved in performing multi-threaded memcpy |
>0 | The default value is the lesser of 8 and the number of physical cores within the MPI process pinning domain |
Use this environment variable to set the number of threads that perform memcpy operations for each MPI rank. Setting the value to 1 is equivalent to setting I_MPI_MT_MEMCPY=disable.
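The default rule and the special case of the value 1 can be sketched in a few lines of Python. The function names are hypothetical, and the number of physical cores in the pinning domain is passed in by hand here; the library determines it from the actual process pinning.

```python
def default_memcpy_threads(physical_cores_in_domain):
    """Default thread count: the lesser of 8 and the physical cores in the pinning domain."""
    if physical_cores_in_domain < 1:
        raise ValueError("a pinning domain contains at least one physical core")
    return min(8, physical_cores_in_domain)

def mt_memcpy_active(num_threads):
    """Whether multi-threaded copying can take place, assuming I_MPI_MT_MEMCPY is enabled.

    A value of 1 is equivalent to I_MPI_MT_MEMCPY=disable; values must be > 0.
    """
    if num_threads < 1:
        raise ValueError("I_MPI_MT_MEMCPY_NUM_THREADS must be greater than 0")
    return num_threads > 1

# Example domain sizes, chosen for illustration.
for cores in (1, 4, 8, 16):
    n = default_memcpy_threads(cores)
    print(f"{cores} cores -> {n} threads, multi-threaded copy active: {mt_memcpy_active(n)}")
```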
Change the threshold for using multi-threaded memcpy.
I_MPI_MT_MEMCPY_THRESHOLD=<nbytes>
<nbytes> | Define the multi-threaded memcpy threshold in bytes |
>0 | The default value is 32768 |
Set this environment variable to control the threshold for using multi-threaded memcpy. If the threshold is larger than the shared memory buffer size (for example, see I_MPI_SHM_LMT_BUFFER_SIZE or I_MPI_SSHM_BUFFER_SIZE), multi-threaded memcpy will never be used. The usage of multi-threaded memcpy is selected according to the following scheme:
Buffers smaller than or equal to <nbytes> are sent using the serial version of memcpy. This approach is faster for short and medium buffers.
Buffers larger than <nbytes> are sent using the multi-threaded memcpy. This approach is faster for large buffers.
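The selection scheme above can be written as a short Python sketch. The function names are hypothetical, and the second function assumes that no single copy is larger than the shared memory buffer, which is one way to read why a threshold above the buffer size rules out the multi-threaded path. The sizes in the last call are illustrative, not library defaults.

```python
DEFAULT_THRESHOLD = 32768  # documented default of I_MPI_MT_MEMCPY_THRESHOLD, in bytes

def select_memcpy(buffer_bytes, threshold=DEFAULT_THRESHOLD):
    """Return which copy routine the scheme selects for a buffer of this size."""
    if threshold <= 0:
        raise ValueError("threshold must be greater than 0")
    return "serial" if buffer_bytes <= threshold else "multi-threaded"

def mt_memcpy_reachable(threshold, shm_buffer_bytes):
    """Assumption: no single copy exceeds the shared memory buffer size.

    Under that assumption the multi-threaded path is reachable only if the
    largest possible copy (the full buffer) is above the threshold.
    """
    return select_memcpy(shm_buffer_bytes, threshold) == "multi-threaded"

print(select_memcpy(32768))   # buffer exactly at the threshold
print(select_memcpy(32769))   # buffer one byte above the threshold
print(mt_memcpy_reachable(65536, shm_buffer_bytes=32768))  # hypothetical sizes
```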
Control the spin count value.
I_MPI_MT_MEMCPY_SPIN_COUNT=<scount>
<scount> | Define the loop spin count when a thread waits for data to copy before sleeping |
>0 | The default value is 100000. The maximum value is 2147483647 |
Set the maximum number of iterations that a thread spins while waiting for data to copy. When the limit is exceeded and there is still no data to copy, the thread goes to sleep.
Use the I_MPI_MT_MEMCPY_SPIN_COUNT environment variable to tune application performance. The best value for <scount> depends on the particular computational environment and application, and is best determined experimentally.
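The general spin-then-sleep pattern described above can be illustrated with a small Python sketch. It is not the library's implementation: the function wait_for_work, the queue, and the item name are hypothetical, and a blocking queue read stands in for the thread going to sleep.

```python
import queue
import threading

def wait_for_work(work, spin_count):
    """Poll `work` up to `spin_count` times, then sleep until an item arrives.

    Returns (item, state), where state records whether the item was picked up
    while spinning or after the thread went to sleep.
    """
    for _ in range(spin_count):
        try:
            return work.get_nowait(), "spinning"
        except queue.Empty:
            pass  # nothing to copy yet; keep spinning
    # Spin limit exceeded with no data: block (sleep) until data is queued.
    return work.get(), "sleeping"

work = queue.Queue()
# Deliver a work item from another thread after a short delay.
threading.Timer(0.2, work.put, args=("chunk-0",)).start()
item, state = wait_for_work(work, spin_count=1000)
print(item, state)
```

In a pattern like this, a larger spin count tends to reduce the wake-up latency when work arrives soon, at the cost of keeping a core busy; a smaller spin count frees the core sooner but adds the cost of waking a sleeping thread.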