
Serial Case

Start with the most basic test case, the serial one. Because the serial case takes a relatively long time to run, a matrix size of 2048 (a 2048 x 2048 matrix) is used for the analysis.

The following code implements matrix multiplication in serial mode.

Figure 1 Serial case code
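
The figure is not reproduced in this text. As a rough illustration only, a serial matrix multiplication is typically the naive triple loop sketched below in Python on a small matrix. The function name `matmul_serial` and the list-of-lists layout are assumptions made for this sketch; this is not the source of the `matmul` binary used in the steps that follow.

```python
def matmul_serial(a, b):
    """Multiply two square matrices with the naive serial triple loop."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # One multiply and one add per inner iteration,
                # so 2 * n**3 floating-point operations in total.
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_serial(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

All work happens on a single thread, and the operation count grows as n cubed, which is why the 2048 case takes on the order of a minute.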
  1. Run the base_matmult case with a matrix size of 2048.
    ./matmul 2048 0 
    

    Command output:

    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0 
    Initialization time = 0.174492s 
    Matrix multiplication time = 62.657254s
    

    With a matrix size of 2048, the serial computation takes approximately 62 seconds.

  2. Create a roofline task for the base_matmult case with a matrix size of 2048.

    Run the roofline analysis using the command-line tool.

    devkit tuner roofline -o base_matmult_2048 -m region ./matmul 2048 0
    

    The examples use the Kunpeng DevKit in CLI mode.

    Command output:

    Note:
        1. Roofline task is currently only supported on the 920 platform.
        2. The application must be a binary file in ELF format, and read permissions are required to detect the format of the application.
        3. Roofline task collection needs to ensure the application has finished running.
        4. The estimated time of roofline collection is about 3 * application estimated time.
        5. Roofline analysis is available only on physical machines.
        6. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
    RFCOLLECT: Start collection for ./matmul
    RFCOLLECT: Launch application to collect performance metrics of ./matmul
    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
    Initialization time = 0.174628s
    ROOFLINE_EVENTS are initialized.
    Matrix multiplication time = 62.718666s
    RFCOLLECT: Launch application to do binary instrumentation of ./matmul
    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
    Initialization time = 0.528283s
    Matrix multiplication time = 85.328236s
    RFCOLLECT: Launch benchmarks for measuring roofs
    RFCOLLECT: Processing all collected data
    RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240506-151117.json
    RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240506-151117.json" to get report.
    
    Get roofline report ...
    The roofline json report: /matrix_multiplication/base_matmult_2048.json
    The roofline html report: /matrix_multiplication/base_matmult_2048.html
    
  3. View the base_matmult_2048.html report.
    Figure 2 base_matmult_2048.html

    In this case, Parallel Threads of roofs is 1 (serial), Elapsed Time is 62.699 seconds, GFLOP Count is 17.18, and Performance is 0.274 GFLOPS.

    The roofline analysis shows that the machine has 128 physical cores but the program uses only one parallel thread. Increasing the number of parallel threads is therefore the natural way to tune the program's performance.
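
The reported figures can be cross-checked by hand. The sketch below assumes that an n x n multiplication counts as 2 * n^3 floating-point operations (one multiply and one add per inner-loop iteration), an assumption that happens to match the GFLOP Count in the report. The variable names are illustrative.

```python
n = 2048            # matrix size passed to ./matmul
elapsed_s = 62.699  # Elapsed Time from the report, in seconds

# Assumption: 2 * n**3 floating-point operations in total
# (one multiply plus one add per inner-loop iteration).
gflop_count = 2 * n**3 / 1e9
gflops = gflop_count / elapsed_s

print(f"GFLOP Count = {gflop_count:.2f}")     # GFLOP Count = 17.18
print(f"Performance = {gflops:.3f} GFLOPS")   # Performance = 0.274 GFLOPS
```

Both values agree with the report, so the Performance figure is simply the total operation count divided by the elapsed time of the single-threaded run.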