Precision Tuning Checklist
Before tuning, check every item in Table 1 carefully to ensure that none is missed.
| Check Item | x86 Platform Standard | Kunpeng Platform Standard | Check Result |
|---|---|---|---|
| Code | The MD5 value matches that of the Kunpeng code. | The MD5 value matches that of the x86 code. | - |
| Test case and configuration | The MD5 values of the test case and configuration match those used on Kunpeng. | The MD5 values of the test case and configuration match those used on x86. | - |
| Compiler | ICC 2018 or 2021 is recommended. | BiSheng 3.1 or later is recommended. | - |
| O0 and O1 consistency | Perform the following operations to avoid ICC O3 compilation problems: | | - |
| Single-thread consistency | Ensure that the number of processes is the same. In single-thread mode, ensure that the result on Kunpeng is the same as on x86, which rules out multi-thread precision problems. | | - |
| Compilation options | `-O3 -fp-model=precise -no-ftz -init=zero -init=arrays`. Prohibited: `-Ofast`, `-ftz` | `-O3 -faarch64-pow-alt-precision=21 -enable--alt-precision-math-functions km_l9 -Hx,124,0xc00000 -ffp-contract=off -finit-zero -mllvm -disable-sincos-opt -MflushZ`. Prohibited: `-Ofast`, `-ftz` | - |
| Math library | IMF | Latest version of KML_L9 | - |
| MPI library | Intel MPI | Latest version of HMPI | - |
| MPI reduction algorithm | `-genv I_MPI_ADJUST_ALLREDUCE=1` | `-x UCX_BUILTIN_ALLREDUCE_ALGORITHM=1` | - |
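The Code and Test case and configuration rows require identical MD5 values on both platforms. The following Python sketch shows one way to run that check. It assumes the source tree and the test-case and configuration files have been copied to each machine. The helper names (`md5_of_file`, `md5_manifest`, `diff_manifests`) are hypothetical and are not part of any Kunpeng tool.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(root):
    """Map each file path (relative to root, '/'-separated) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = md5_of_file(full)
    return manifest


def diff_manifests(x86, kunpeng):
    """Return sorted paths that are missing on one side or whose digests differ."""
    paths = set(x86) | set(kunpeng)
    return sorted(p for p in paths if x86.get(p) != kunpeng.get(p))
```

Run `md5_manifest` on each machine and save the result, for example with `json.dump`. Then pass both dictionaries to `diff_manifests`. An empty list means every file matches.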
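The Compilation options row prohibits `-Ofast` and `-ftz` on both platforms. When the build is driven by scripts or makefiles, a small check can catch these flags in the final compile command, as in the Python sketch below. It assumes the command line is available as a string, for example from a build log. It uses exact token matching only, and `find_prohibited_flags` is a hypothetical helper.

```python
import shlex

# Flags the checklist prohibits on both x86 (ICC) and Kunpeng (BiSheng).
PROHIBITED_FLAGS = {"-Ofast", "-ftz"}


def find_prohibited_flags(command_line):
    """Return the prohibited flags found in a compiler command line, in order.

    Matching is on whole tokens, so -no-ftz is not mistaken for -ftz.
    """
    return [tok for tok in shlex.split(command_line) if tok in PROHIBITED_FLAGS]
```

Exact matching keeps the check simple. A flag spelled differently (for example passed through a wrapper variable) would not be caught, so this complements a manual review and does not replace it.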
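`-ftz` is prohibited, and `-no-ftz` is required on x86, because flush-to-zero mode replaces subnormal (denormal) results with zero. A run with it enabled can therefore diverge from one without it. Python cannot switch the hardware FTZ mode, so the sketch below only emulates the effect in software on a single intermediate result. `flush_to_zero` and `scaled_product` are hypothetical helpers for illustration.

```python
import math
import sys


def flush_to_zero(x):
    """Emulate FTZ: replace a subnormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < sys.float_info.min:  # below smallest normal double
        return math.copysign(0.0, x)
    return x


def scaled_product(a, b, scale, ftz=False):
    """Compute (a * b) * scale, optionally flushing the intermediate a * b."""
    tmp = a * b
    if ftz:
        tmp = flush_to_zero(tmp)
    return tmp * scale


if __name__ == "__main__":
    # 1e-300 * 1e-10 is about 1e-310, which is subnormal in double precision.
    print(scaled_product(1e-300, 1e-10, 1e10))            # close to 1e-300
    print(scaled_product(1e-300, 1e-10, 1e10, ftz=True))  # 0.0
```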
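The Single-thread consistency and MPI reduction algorithm rows address the same underlying issue. Floating-point addition is not associative, so combining the same partial results in a different order (a different thread count, or a different allreduce algorithm) can change the result. Setting `I_MPI_ADJUST_ALLREDUCE` and `UCX_BUILTIN_ALLREDUCE_ALGORITHM` explicitly keeps the MPI library from choosing the algorithm at run time. The Python sketch below does not call MPI. It only compares a left-to-right sum with a pairwise tree sum over the same values to show why the order matters. Both function names are hypothetical.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_tree_sum(values):
    """Combine neighbours level by level in a binary tree, similar in
    spirit to a tree-shaped reduction across ranks."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0.0


if __name__ == "__main__":
    vals = [1e16, 1.0, -1e16, 1.0]
    print(sequential_sum(vals))     # 1.0
    print(pairwise_tree_sum(vals))  # 0.0
    print(math.fsum(vals))          # 2.0 (correctly rounded reference)
```

Neither order is "wrong". They simply round at different points, which is why the checklist fixes the order before comparing Kunpeng results with x86 results.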