Architectures

Differences in calculation results can have many causes. When two platforms run different code implementations, their precision is likely to differ. When the code implementations are the same, precision differences mainly come from differences in the architecture, compiler, math library, and MPI implementation. Platforms use different policies for trading precision for speed, which can produce different results. Even on the same platform, changing compilation options can change the results. For HPC applications, the number of processes can also affect precision: a different process count partitions the application grid differently, so partial results are combined in a different order when they are summarized, and the final values can differ.

The architecture of the Kunpeng processor differs from that of the x86 processor. The x86 processor has 40-bit and 80-bit floating-point compute units, but the Kunpeng processor does not. If code on the x86 platform uses these compute units, its results can be inconsistent with those computed on the Kunpeng platform. Currently, this problem can be addressed by emulating the extended precision in software on Kunpeng devices. For example, the Intel C++ Compiler (ICC) math library uses the powr8i4 function, which calls 80-bit x87 instructions; that is, each floating-point number occupies 80 bits. On Kunpeng devices, you can use the open-source Multiple Precision Floating-Point Reliable (MPFR) library, which supports calculation at arbitrary precision, to perform the pow calculation on 80-bit floating-point numbers.

FTZ (flush-to-zero) and DAZ (denormals-are-zero) are modes for handling denormalized numbers and are implemented in hardware. Kunpeng and Intel processors differ in how they apply them. The following calculation, performed with FTZ enabled, is an example:
float a = 1.09628081709e-33;
float b = 1.07225660031e-05;
float c = a * b;
Kunpeng result: 0. (The multiplication produces a value in the denormalized range, that is, smaller than FLT_MIN, the smallest normal float. FTZ is applied first and flushes this value to 0, and rounding is performed last, so the result remains 0.)

x86 result: 1.17549435082e-38. (The multiplication again produces a value in the denormalized range, but the value is rounded first, to 0x00800000, the bit pattern of FLT_MIN. FTZ is applied last, and because FLT_MIN is a normal number, it is not flushed. The result is 0x00800000.)
