MPI

Message Passing Interface (MPI) is a standard parallel communication interface used in high-performance computing (HPC). Allreduce is a family of collective operations in MPI that combines (reduces) data contributed by all nodes, for example by summation, and distributes the result to every node, so that all nodes hold the same reduced data. Allreduce is commonly used in parallel computing scenarios such as iterative summation and gradient updates. The Allreduce algorithm may produce precision differences for the following reasons:
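As a rough illustration of what Allreduce computes, independent of any real MPI library, the following plain-Python sketch simulates several ranks, reduces their buffers element-wise with a sum, and gives every rank a copy of the result. The `simulate_allreduce` helper is a hypothetical name for this sketch, not an MPI API.

```python
# Hypothetical sketch of Allreduce (sum) semantics, with no real MPI involved:
# every rank contributes a buffer, and every rank receives the reduced result.

def simulate_allreduce(rank_buffers):
    """Element-wise sum across all ranks; each rank gets the same result."""
    reduced = [sum(vals) for vals in zip(*rank_buffers)]
    # In real MPI, the reduced buffer is distributed back to every rank.
    return [list(reduced) for _ in rank_buffers]

# Three simulated ranks, each holding a local gradient-like buffer.
buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
results = simulate_allreduce(buffers)
# Every rank now holds the same summed buffer, [9.0, 12.0].
```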

  • Limited data precision

    In an Allreduce operation, each node's local data must be aggregated with data from other nodes. Because inter-node communication speed is limited, the data to be aggregated may be transferred between nodes multiple times, and at each step the partial result may be rounded because of the computer's finite floating-point precision. These rounding errors accumulate as more transfers and partial reductions take place, resulting in precision differences in the final result.

  • Different machine precision

    Nodes participating in Allreduce may use different hardware and OSs and therefore may compute floating-point operations at different precisions. For example, some nodes may use single-precision floating point (float) while others use double-precision floating point (double), which can cause precision differences in the results.

  • Different algorithm implementations

    The Allreduce algorithm has multiple implementations, and different implementations may produce results with different precision. For example, implementations may use different aggregation policies or data exchange patterns, which changes the order in which values are combined and therefore how intermediate results are rounded. Intel MPI and HMPI, for instance, perform topology awareness on nodes, so their reduction orders differ.

  • Different parallelism settings

    In Allreduce, a degree of parallelism is set to control data transfer. A higher degree of parallelism transfers data faster but splits the reduction across more concurrent streams and increases communication overhead between nodes, which can change the accumulation order and thereby affect precision. A lower degree of parallelism transfers data more slowly but reduces communication overhead and accumulates results in a more stable order, which can improve precision consistency.
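The first cause above, accumulated rounding, can be reproduced without MPI: floating-point addition is not associative, so the same three values summed in two different orders give different results. This is a minimal plain-Python sketch, not MPI code.

```python
# Floating-point addition is not associative: summing the same values
# in a different order can round differently at each step.
a, b, c = 1e16, 1.0, -1e16

left_to_right = (a + b) + c   # 1e16 + 1.0 rounds back to 1e16, so the sum is 0.0
reordered     = (a + c) + b   # 1e16 - 1e16 is exactly 0.0, so the sum is 1.0

# The two orders disagree by exactly 1.0 due to rounding.
```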
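The second cause, different machine precision, can be sketched in plain Python using the standard `struct` module to round a value to 32-bit single precision (Python's native `float` is a 64-bit double). The helper name `to_float32` is an illustrative assumption, not part of any MPI API.

```python
import struct

def to_float32(x):
    """Round a Python double to the nearest 32-bit single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

x64 = 0.1              # held in double precision (53-bit significand)
x32 = to_float32(0.1)  # rounded to single precision (24-bit significand)

# The single-precision copy differs from the double-precision one, so
# mixing the two representations in one reduction changes the result.
difference = abs(x64 - x32)
```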
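The third cause, implementation-dependent reduction order, can also be simulated without MPI. The sketch below compares a linear (left-to-right) reduction with a pairwise, tree-style reduction over the same values; both functions are illustrative sketches, not the algorithms of any specific MPI library.

```python
def linear_reduce(values):
    """Sum left to right, as a simple chained reduction might."""
    total = 0.0
    for v in values:
        total += v
    return total

def tree_reduce(values):
    """Sum pairwise, as a recursive-doubling style reduction might."""
    if len(values) == 1:
        return values[0]
    paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
    if len(values) % 2:
        paired.append(values[-1])
    return tree_reduce(paired)

values = [1.0, 1e16, -1e16, 1.0]
seq_result  = linear_reduce(values)  # rounds to 1.0
tree_result = tree_reduce(values)    # rounds to 0.0
# Same input, different combination order, different result.
```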
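The fourth cause, the parallelism setting, can be sketched by splitting one buffer into a different number of segments, summing each segment as a concurrent worker would, and then combining the partial sums. The `chunked_sum` helper and its chunk-count knob are illustrative assumptions, not an MPI parameter.

```python
def chunked_sum(values, chunks):
    """Split values into contiguous segments, sum each segment
    (as concurrent workers would), then sum the partial results."""
    n = len(values)
    size = (n + chunks - 1) // chunks
    partials = [sum(values[i:i + size]) for i in range(0, n, size)]
    return sum(partials)

values = [0.1] * 10
serial   = chunked_sum(values, 1)  # one segment: plain left-to-right sum
parallel = chunked_sum(values, 2)  # two segments of five values each

# The two segmentations accumulate rounding differently and disagree
# in the last bit of the result.
```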