Using Region Mode for an Overall Comparative Analysis

Figure 1 case 9

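Region mode supports instrumenting multiple functions or loops, so several methods can be run one after another within a single new case. The example adds method 9, which calls the following methods in order: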

  • 1 (parallel_matmult)
  • 2 (transpose_B_matmult)
  • 4 (block_transpose_B_matmult)
  • 5 (intrinsics_transpose_B_matmult)
  • 6 (kml_matmult_8192)
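Case 9 bundles what would otherwise be five separate runs. For comparison, here is a minimal shell sketch of running the same methods one process at a time, assuming the `./matmul <size> <method>` command line used in the steps below. The function name `run_methods_separately` and the `MATMUL` override variable are hypothetical and not part of the sample. Each separate run repeats matrix initialization and, if profiled, would need its own Roofline task; bundling the methods into one region-mode case avoids that.

```shell
#!/bin/sh
# Hypothetical helper, not part of the sample: run each method that case 9
# bundles (1, 2, 4, 5, 6) as its own process, in the same order.
# MATMUL is an assumed override for the binary path (defaults to ./matmul).
run_methods_separately() {
    size=${1:-8192}
    for method in 1 2 4 5 6; do
        # Each invocation repeats matrix initialization before multiplying.
        "${MATMUL:-./matmul}" "$size" "$method" || return 1
    done
}
```

Usage: `run_methods_separately 8192` prints five separate reports, each with its own "Initialization time" line.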
  1. Run the multi_method_matmult example with 8192 x 8192 matrices.
    ./matmul 8192 9
    

    The output is as follows:

    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.663596s
    Matrix multiplication time(parallel_matmult) = 524.732915s
    Matrix multiplication time(transpose_B_matmult) = 12.199910s
    Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
    Matrix multiplication time(kml_matmult) = 0.320360s
    Matrix multiplication time = 543.736720s
    

    The time spent by each method is printed in turn, followed by the total time for all five methods.

  2. Create a Roofline task for the multi_method_matmult 8192 case.
    devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9
    

    The output is as follows:

    Note:
        1. Roofline task is currently only supported on the 920 platform.
        2. The application must be a binary file in ELF format, and read permissions are required to detect the format of the application.
        3. Roofline task collection needs to ensure the application has finished running.
        4. The estimated time of roofline collection is about 3 * application estimated time.
        5. Roofline analysis is available only on physical machines.
        6. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
    RFCOLLECT: Start collection for ./matmul
    RFCOLLECT: Launch application to collect performance metrics of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.712200s
    ROOFLINE_EVENTS are initialized.
    Matrix multiplication time(parallel_matmult) = 522.059154s
    Matrix multiplication time(transpose_B_matmult) = 10.515641s
    Matrix multiplication time(block_transpose_B_matmult) = 3.325110s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.258315s
    Matrix multiplication time(kml_matmult) = 0.287929s
    Matrix multiplication time = 538.468327s
    RFCOLLECT: Launch application to do binary instrumentation of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 8.095281s
    Matrix multiplication time(parallel_matmult) = 348.475675s
    Matrix multiplication time(transpose_B_matmult) = 17.144564s
    Matrix multiplication time(block_transpose_B_matmult) = 3.646071s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.427023s
    Matrix multiplication time(kml_matmult) = 0.297098s
    Matrix multiplication time = 371.991296s
    RFCOLLECT: Launch benchmarks for measuring roofs
    RFCOLLECT: Processing all collected data
    RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240507-115538.json
    RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240507-115538.json" to get report.
    
    Get roofline report ...
    The roofline json report: /matrix_multiplication/multi_method_matmult_8192.json
    The roofline html report: /matrix_multiplication/multi_method_matmult_8192.html
    
  3. View the multi_method_matmult_8192 report.

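    The results of all the methods, from the parallel example through the KML example, are aggregated on a single chart, which makes comparative analysis convenient.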

    Figure 2 multi_method_matmult_8192 report
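The per-method times printed in step 1 can also be compared numerically. Below is a minimal shell/awk sketch, assuming output lines of the form `Matrix multiplication time(<method>) = <seconds>s` as shown above. The function name `summarize_times` is hypothetical. It prints each method's time and its speedup relative to the first method listed, and skips the total line, which carries no method name. Feed it the output of a single run: the Roofline task output in step 2 contains two runs (collection and binary instrumentation), so both would be mixed against one baseline.

```shell
#!/bin/sh
# Hypothetical helper: read matmul output on stdin and print, for each
# "Matrix multiplication time(<method>) = <t>s" line, the method name, its
# time, and its speedup relative to the first method seen.
# The total line ("Matrix multiplication time = ...") has no "(" and is skipped.
summarize_times() {
    awk '
    /Matrix multiplication time\(/ {
        name = $0
        sub(/^.*time\(/, "", name)   # drop everything up to "time("
        sub(/\).*$/, "", name)       # drop ")" and the rest of the line
        t = $NF                      # last field, e.g. "524.732915s"
        sub(/s$/, "", t)
        t += 0
        if (t <= 0) next             # guard against division by zero
        if (base == "") base = t     # first method is the baseline
        printf "%-32s %12.6fs %9.2fx\n", name, t, base / t
    }'
}
```

Usage: `./matmul 8192 9 | summarize_times`. With the step 1 figures (524.732915 s vs. 0.320360 s), kml_matmult comes out roughly 1638x faster than parallel_matmult.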
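The notes printed in step 2 list preconditions for Roofline collection. Here is a minimal pre-flight sketch, assuming a POSIX shell with `head`, `od`, and `tr`, and optionally `systemd-detect-virt`. The function names `is_readable_elf`, `is_physical_machine`, and `roofline_preflight` are hypothetical, and the script is not part of DevKit. It covers only two of the notes (the application must be a readable ELF binary, and the analysis runs only on physical machines). It does not check the 920-platform requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight checks before "devkit tuner roofline"; not part of DevKit.

# Note 2: returns 0 if $1 is readable and starts with the ELF magic 7f 45 4c 46.
is_readable_elf() {
    [ -r "$1" ] || return 1
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# Note 5: returns 0 unless systemd-detect-virt is available and reports a VM.
# If the tool is missing, the machine is assumed to be physical.
is_physical_machine() {
    if command -v systemd-detect-virt >/dev/null 2>&1; then
        # --vm ignores containers; the command exits 0 when a VM is detected.
        ! systemd-detect-virt --vm --quiet
    fi
}

roofline_preflight() {
    is_readable_elf "$1" || { echo "not a readable ELF binary: $1" >&2; return 1; }
    is_physical_machine || { echo "VM detected; Roofline needs a physical machine" >&2; return 1; }
}
```

Usage: `roofline_preflight ./matmul && devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9`.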