Demo Overview
- Obtain the demo from GitHub.
- The demo multiplies two matrices.
Figure 1 Matrix multiplication example
- How to compile
Run make to build two binaries: matmul_nokml and matmul.
To build the Kunpeng Math Library (KML) version (matmul), install the KML and set the KML_INCLUDE and KML_LIB environment variables.
- How to use the demo
./matmul size method [test_correctness]
- size (required): dimension of the matrix. The value must be a power of 2, for example, 512, 1024, 2048, or 4096.
- method (required): optimization method to run. The value ranges from 0 to 5 for matmul_nokml and from 0 to 6 for matmul.

- test_correctness (optional): If it is set to true, the correctness of the computed result is checked. The default value is false.
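The argument rules above can be checked before any computation starts. The following is a minimal sketch, not the demo's actual code: parse_args, is_power_of_two, struct demo_args, and the max_method parameter are hypothetical names, and it assumes that the optional third argument is spelled literally as true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if n is a power of 2 (1, 2, 4, ...). */
static bool is_power_of_two(long n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical container for the parsed command-line arguments. */
struct demo_args {
    long size;
    int method;
    bool test_correctness;
};

/*
 * Hypothetical validator for "./matmul size method [test_correctness]".
 * max_method would be 5 for matmul_nokml and 6 for matmul.
 * Returns 0 on success and -1 if any argument is invalid.
 */
static int parse_args(int argc, char **argv, int max_method,
                      struct demo_args *out)
{
    assert(out != NULL);
    if (argc < 3 || argc > 4)
        return -1;

    char *end = NULL;
    long size = strtol(argv[1], &end, 10);
    if (*end != '\0' || !is_power_of_two(size))
        return -1;

    long method = strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || method < 0 || method > max_method)
        return -1;

    out->size = size;
    out->method = (int)method;
    /* Assumption: only the literal string "true" enables the check. */
    out->test_correctness = (argc == 4 && strcmp(argv[3], "true") == 0);
    return 0;
}
```

Rejecting invalid input up front keeps the compute kernels free of special cases such as sizes that are not a power of 2.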
Table 1 Method description
| Method | Description |
| --- | --- |
| 0 (base_matmult) | Basic serial computing. |
| 1 (parallel_matmult) | Common OpenMP parallel computing based on 0 (base_matmult). |
| 2 (transpose_B_matmult) | Matrix transpose optimization in addition to 1 (parallel_matmult). |
| 3 (change_loop_order_matmult) | Variant of 2 (transpose_B_matmult) that changes the loop order. |
| 4 (block_transpose_B_matmult) | Internal block loop optimization of the matrix in addition to 2 (transpose_B_matmult). |
| 5 (intrinsics_transpose_B_matmult) | Arm Neon vectorized computation optimization in addition to 4 (block_transpose_B_matmult). |
| 6 (kml_matmult) | Optimization with the KML. |
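To illustrate the kind of vectorization that method 5 refers to, the sketch below computes the dot product of two float rows with Arm Neon intrinsics, which is the shape of the inner loop once B is transposed. It is an illustration under assumptions, not the demo's code: dot_f32 is a hypothetical name, and a scalar fallback is included so that the file also compiles on non-Arm hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
static float dot_f32(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    float sum = 0.0f;

#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Accumulate four products per iteration in a 128-bit register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        acc = vfmaq_f32(acc, va, vb);   /* acc += va * vb */
    }
    sum = vaddvq_f32(acc);              /* horizontal add of the 4 lanes */
#endif

    /* Scalar tail, or the whole loop when Neon is unavailable. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Because the sizes accepted by the demo are powers of 2, the scalar tail would normally be empty for rows of four or more elements; it is kept here so that the helper is correct for any length.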
Methods 2 and 3 are variants of the same optimization. Both improve the memory access pattern used to read matrix B.
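The difference between the basic method and the two B-access variants can be sketched as follows. This is a minimal illustration for square, row-major n x n float matrices, not the demo's source: the function names are hypothetical, and it assumes that the loop-order variant uses the common i-k-j order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Baseline i-j-k order: B is read down a column, with a stride of n. */
static void matmul_base(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/*
 * Transpose variant: copy B into Bt first so that the inner loop reads
 * both operands contiguously. Returns 0 on success, -1 if allocation fails.
 */
static int matmul_transpose_b(const float *A, const float *B, float *C,
                              size_t n)
{
    assert(n > 0);
    float *Bt = malloc(n * n * sizeof *Bt);
    if (Bt == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            Bt[j * n + i] = B[i * n + j];

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }

    free(Bt);
    return 0;
}

/* Loop-order variant (i-k-j): the inner loop walks one row of B and C. */
static void matmul_ikj(const float *A, const float *B, float *C, size_t n)
{
    memset(C, 0, n * n * sizeof *C);
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Both variants replace the strided column reads of B in the baseline with sequential reads, which is friendlier to the cache. The transpose variant pays for an extra copy of B, while the loop-order variant needs no extra memory.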
- By default, the demo uses the float data type for computation. To use the double data type, add the -DDOUBLE_TYPE option at compile time.
- You can set the OMP_NUM_THREADS environment variable to control the degree of parallelism. If you do not set it, the value is equal to the number of physical cores. If you use method 4 (block_transpose_B_matmult), explicitly specify the degree of parallelism.
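As a rough sketch of how these two settings typically surface in code, the example below selects the element type from the DOUBLE_TYPE macro and parallelizes the outer loop with OpenMP. The names real_t and matmul_parallel are hypothetical, and the demo's actual source may differ. The pragma takes effect only when the compiler is invoked with OpenMP enabled (for example, -fopenmp with GCC); the thread count then follows OMP_NUM_THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Element type: float by default, double when built with -DDOUBLE_TYPE. */
#ifdef DOUBLE_TYPE
typedef double real_t;
#else
typedef float real_t;
#endif

/* Hypothetical parallel kernel for square, row-major n x n matrices. */
static void matmul_parallel(const real_t *A, const real_t *B, real_t *C,
                            long n)
{
    assert(n > 0);
    /* Each thread computes a disjoint set of rows of C, so no locking
     * is needed. Without OpenMP enabled, the loop simply runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        for (long j = 0; j < n; ++j) {
            real_t sum = 0;
            for (long k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```

With a build like this, a run such as OMP_NUM_THREADS=8 ./matmul 1024 1 would use eight threads; the value 8 is only an example.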