Precision Tuning Checklist

Before tuning, check every item in Table 1 carefully to ensure that none is missed.

Table 1 Precision tuning checklist

Check Item

x86 Platform Standard

Kunpeng Platform Standard

Check Result

Code

The MD5 checksum of the code is the same as that of the Kunpeng code.

The MD5 checksum of the code is the same as that of the x86 code.

-

Test case and configuration

The MD5 checksums of the test case and configuration are the same as those used on the Kunpeng platform.

The MD5 checksums of the test case and configuration are the same as those used on the x86 platform.

-

Compiler

ICC 2018 or 2021 is recommended.

BiSheng Compiler 3.1 or later is recommended.

-

O0 and O1 consistency

Perform the following checks to rule out an ICC O3 compilation problem:

  • Scan the compilation configuration, change every O3 or Ofast to O0, run the application, and check whether the results on the x86 and Kunpeng platforms are consistent. If they are inconsistent, an ICC O3 compilation problem has occurred. If they are consistent, this check passes and you can proceed to the next one.
  • Scan the compilation configuration, change every O3 or Ofast to O1, run the application, and check whether the results on the x86 and Kunpeng platforms are consistent. If they are inconsistent, an ICC O3 compilation problem has occurred. If they are consistent, this check passes and you can proceed to the next check item.

-

Single-thread consistency

Use the same number of processes on both platforms. In single-thread mode, ensure that the Kunpeng result matches the x86 result, which rules out precision problems introduced by multithreading.

-

Compilation options

-O3 -fp-model=precise -no-ftz -init=zero -init=arrays

Prohibited: -Ofast, -ftz

-O3 -faarch64-pow-alt-precision=21 -enable--alt-precision-math-functions km_l9 -Hx,124,0xc00000 -ffp-contract=off -finit-zero -mllvm -disable-sincos-opt -MflushZ

Prohibited: -Ofast, -ftz

-

Math library

IMF (Intel math library, libimf)

Latest version of KML_L9

-

MPI library

Intel MPI

Latest version of HMPI

-

MPI reduction algorithm

-genv I_MPI_ADJUST_ALLREDUCE=1

-x UCX_BUILTIN_ALLREDUCE_ALGORITHM=1

-
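The Code and Test case and configuration rows of Table 1 require matching MD5 checksums on both platforms. The following is a minimal Python sketch of that comparison. It assumes the x86 and Kunpeng copies of the source tree (or of the test case and configuration) are in two local directories. The helper names md5_of_file, md5_of_tree, and diff_trees are hypothetical and are not part of any Kunpeng tool.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_tree(root):
    """Map every file under root (as a POSIX-style relative path) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(x86_root, kunpeng_root):
    """Return sorted relative paths whose MD5 differs or that exist on only one side."""
    x86, kunpeng = md5_of_tree(x86_root), md5_of_tree(kunpeng_root)
    return sorted(k for k in x86.keys() | kunpeng.keys() if x86.get(k) != kunpeng.get(k))
```

An empty list from diff_trees(x86_dir, kunpeng_dir) means every file has the same checksum on both sides. Any path it returns is either missing on one platform or has different contents.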
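The O0 and O1 consistency check in Table 1 begins by lowering the optimization level in the compilation configuration. Below is a minimal Python sketch of that scan-and-replace step. It assumes the flags appear as plain text (for example, in a Makefile) and are spelled -O3 or -Ofast. The function name set_opt_level is hypothetical.

```python
import re

# Matches -O3 or -Ofast as a whole flag: not preceded by a word character
# or hyphen, and not followed by further word characters.
_HIGH_OPT = re.compile(r"(?<![\w-])-(?:O3|Ofast)\b")


def set_opt_level(build_text, level):
    """Replace every -O3/-Ofast flag in build_text with -O0 or -O1."""
    if level not in ("O0", "O1"):
        raise ValueError("level must be 'O0' or 'O1'")
    return _HIGH_OPT.sub("-" + level, build_text)
```

After rewriting the configuration, rebuild and run on both platforms, then compare the results. The sketch does not catch flags injected some other way, such as through environment variables or a build-system cache, so check those separately.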
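One reason for the single-thread consistency and MPI reduction algorithm items is that floating-point addition is not associative. When threads or MPI ranks combine partial sums in a different order, the final result can change even though each individual addition is correctly rounded. The Python sketch below is illustrative only. It does not call MPI, and its two summation orders are simplified stand-ins, not the actual algorithms selected by I_MPI_ADJUST_ALLREDUCE or UCX_BUILTIN_ALLREDUCE_ALGORITHM. The helper results_match is hypothetical and shows one way to compare result vectors, either bitwise or with a tolerance.

```python
import math


def sum_in_order(values):
    """Sequential left-to-right sum: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Recursive pairwise (tree-shaped) sum of the same values."""
    if len(values) <= 2:
        return sum_in_order(values)
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def results_match(x86_values, kunpeng_values, rel_tol=0.0):
    """Compare two result vectors; rel_tol=0.0 requires exactly equal doubles."""
    if len(x86_values) != len(kunpeng_values):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)
               for a, b in zip(x86_values, kunpeng_values))


values = [1e16, 1.0, -1e16, 1.0]
print(sum_in_order(values))   # 1.0
print(sum_pairwise(values))   # 0.0
print(math.fsum(values))      # 2.0 (correctly rounded reference)
```

The same four numbers give three different sums depending only on the order of combination. Explicitly fixing the reduction algorithm, as the MPI reduction algorithm row does, removes one source of variation in that order.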