Demo Overview

  1. Obtain the demo from GitHub.
  2. The demo multiplies two matrices.
    Figure 1 Matrix multiplication example
  3. How to compile

    Run make to build two binaries: matmul_nokml and matmul.

    To build the Kunpeng Math Library (KML) version, install KML and set the KML_INCLUDE and KML_LIB environment variables.

  4. How to use the demo
    ./matmul size method [test_correctness]
    • size (required): dimension of the matrix. The value must be a power of 2, for example, 512, 1024, 2048, or 4096.
    • method (required): an integer from 0 to 5 for matmul_nokml, or from 0 to 6 for matmul. See Table 1.
    • test_correctness (optional): if set to true, the computed result is checked for correctness. The default value is false.
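
The argument rules above could be checked along the following lines. This is a minimal sketch, not the demo's actual parser: parse_args and is_power_of_two are hypothetical names, and max_method is assumed to be passed as 5 for matmul_nokml and 6 for matmul.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if n is a positive power of 2 (512, 1024, 2048, ...). */
static bool is_power_of_two(long n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: validates "size method [test_correctness]".
 * max_method is assumed to be 5 for matmul_nokml and 6 for matmul.
 * Returns 0 on success and -1 on any invalid argument. */
static int parse_args(int argc, char **argv, int max_method,
                      long *size, int *method, bool *test_correctness)
{
    char *end;

    if (argc < 3 || argc > 4)
        return -1;

    /* size: must be a whole number and a power of 2 */
    *size = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || !is_power_of_two(*size))
        return -1;

    /* method: must be a whole number in [0, max_method] */
    long m = strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || m < 0 || m > max_method)
        return -1;
    *method = (int)m;

    /* test_correctness: optional, defaults to false */
    *test_correctness = (argc == 4 && strcmp(argv[3], "true") == 0);
    return 0;
}
```

With these rules, "./matmul 1024 6 true" is accepted, while "./matmul 1000 0" is rejected because 1000 is not a power of 2.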
    Table 1 Method description

    Method                               Description
    0 (base_matmult)                     Basic serial computation.
    1 (parallel_matmult)                 Plain OpenMP parallel computation based on 0 (base_matmult).
    2 (transpose_B_matmult)              Matrix transpose optimization on top of 1 (parallel_matmult).
    3 (change_loop_order_matmult)        Similar to 2 (transpose_B_matmult), but changes the loop order.
    4 (block_transpose_B_matmult)        Blocked inner-loop optimization on top of 2 (transpose_B_matmult).
    5 (intrinsics_transpose_B_matmult)   Arm Neon vectorization on top of 4 (block_transpose_B_matmult).
    6 (kml_matmult)                      Optimization using the KML.

    Methods 2 and 3 are variants of the same optimization: both improve the memory access pattern when reading matrix B.
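
To illustrate why methods 2 and 3 address the same problem, the sketch below contrasts three loop structures for row-major n x n matrices. It is an illustration under assumptions, not the demo's source: the sketch_* function names are hypothetical, the float type is used throughout, and the OpenMP parallelization that methods 1 to 3 apply is omitted to keep the kernels short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Method 0 style: textbook i-j-k loops. The inner loop walks B down a
 * column, so consecutive reads of B are n elements apart in memory. */
static void sketch_base(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Method 2 style: transpose B once, then the inner loop reads both
 * A and the transposed copy contiguously. Returns -1 if allocation fails. */
static int sketch_transpose_B(const float *A, const float *B, float *C, size_t n)
{
    float *Bt = malloc(n * n * sizeof *Bt);
    if (Bt == NULL)
        return -1;

    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + k] = B[k * n + j];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }

    free(Bt);
    return 0;
}

/* Method 3 style: reorder the loops to i-k-j so that the inner loop
 * walks a row of B contiguously, with no transposed copy needed. */
static void sketch_loop_order(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0f;
        for (size_t k = 0; k < n; k++) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

All three functions compute the same product. They differ only in the order in which elements of B are touched, which is what determines cache behavior for large matrices.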

  5. By default, the demo computes with the float data type. To use the double data type, add the -DDOUBLE_TYPE compile option.
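
How the demo uses the DOUBLE_TYPE macro internally is not shown here. One common pattern, sketched below as an assumption, is a typedef selected by the preprocessor; real_t and sketch_dot are hypothetical names.

```c
#include <stddef.h>

/* Assumed pattern: select the element type at compile time.
 * Building with -DDOUBLE_TYPE defines the macro and selects double. */
#ifdef DOUBLE_TYPE
typedef double real_t;
#else
typedef float real_t;
#endif

/* Hypothetical kernel written once against real_t, so the same source
 * serves both precisions. */
static real_t sketch_dot(const real_t *x, const real_t *y, size_t n)
{
    real_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With this layout, switching precision requires only a rebuild with or without the option; no source changes are needed.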
  6. Set the OMP_NUM_THREADS environment variable to control the degree of parallelism. If it is not set, the number of threads equals the number of physical cores. If you use method 4 (block_transpose_B_matmult), specify the degree of parallelism explicitly.
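
To confirm which degree of parallelism a run will use, a program can query the OpenMP runtime, which reads OMP_NUM_THREADS at startup. The helper below is a small sketch and not part of the demo: sketch_max_threads is a hypothetical name, and the fallback value of 1 applies only when the code is compiled without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the maximum number of threads an OpenMP parallel region would
 * use. This reflects OMP_NUM_THREADS when that variable is set. */
static int sketch_max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* built without OpenMP: everything runs serially */
#endif
}
```

For example, when the program is built with OpenMP enabled and started with OMP_NUM_THREADS=8 in the environment, this function is expected to return 8.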