
Overall Analysis and Comparison Using Region Mode

Figure 1 case 9

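Region mode supports instrumenting multiple functions or loops, so several methods can be run one after another within a single new case. This example adds method 9, which calls the following methods in order: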

  • 1(parallel_matmult)
  • 2(transpose_B_matmult)
  • 4(block_transpose_B_matmult)
  • 5(intrinsics_transpose_B_matmult)
  • 6(kml_matmult_8192)
  1. Run the multi_method_matmult example with a matrix row and column size of 8192.
    ./matmul 8192 9
    

    The following information is returned:

    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.663596s
    Matrix multiplication time(parallel_matmult) = 524.732915s
    Matrix multiplication time(transpose_B_matmult) = 12.199910s
    Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
    Matrix multiplication time(kml_matmult) = 0.320360s
    Matrix multiplication time = 543.736720s
    

    The output lists the time taken by each method in turn.
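    To compare the methods at a glance, the per-method timing lines can be parsed and expressed as speedups over parallel_matmult. The following is a minimal Python sketch, not part of the DevKit tooling; the helper names `parse_method_times` and `speedups` are hypothetical, and the text in `OUTPUT` is copied from the run above.

```python
import re

# Console output copied from the ./matmul 8192 9 run above.
OUTPUT = """\
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.663596s
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult) = 0.320360s
Matrix multiplication time = 543.736720s
"""

# Matches only the per-method lines; the final total has no "(name)" part.
LINE = re.compile(r"Matrix multiplication time\((\w+)\) = ([\d.]+)s")


def parse_method_times(text):
    """Return {method_name: seconds} for every per-method timing line."""
    return {name: float(sec) for name, sec in LINE.findall(text)}


def speedups(times, baseline="parallel_matmult"):
    """Return how many times faster each method is than the baseline."""
    base = times[baseline]
    return {name: base / sec for name, sec in times.items()}


if __name__ == "__main__":
    times = parse_method_times(OUTPUT)
    for name, factor in speedups(times).items():
        print(f"{name:35s} {times[name]:12.6f}s {factor:10.1f}x")
```

    These are wall-clock figures from a single run, so the ratios are only indicative.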

  2. Create a Roofline task for the multi_method_matmult 8192 case.
    devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9
    

    The following information is returned:

    Note:
      1. Roofline task is currently only supported on the 920 platform.
      2. The application must be a binary file in ELF format.
      3. Roofline task collection needs to ensure the application has finished running.
      4. The estimated time of roofline collection is about 3 * application estimated time.
      5. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
    RFCOLLECT: Start collection for ./matmul
    RFCOLLECT: Launch application to collect performance metrics of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.712200s
    ROOFLINE_EVENTS are initialized.
    Matrix multiplication time(parallel_matmult) = 522.059154s
    Matrix multiplication time(transpose_B_matmult) = 10.515641s
    Matrix multiplication time(block_transpose_B_matmult) = 3.325110s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.258315s
    Matrix multiplication time(kml_matmult) = 0.287929s
    Matrix multiplication time = 538.468327s
    RFCOLLECT: Launch application to do binary instrumentation of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 8.095281s
    Matrix multiplication time(parallel_matmult) = 348.475675s
    Matrix multiplication time(transpose_B_matmult) = 17.144564s
    Matrix multiplication time(block_transpose_B_matmult) = 3.646071s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.427023s
    Matrix multiplication time(kml_matmult) = 0.297098s
    Matrix multiplication time = 371.991296s
    RFCOLLECT: Launch benchmarks for measuring roofs
    RFCOLLECT: Processing all collected data
    RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240507-115538.json
    RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240507-115538.json" to get report.
    
    Get roofline report ...
    The roofline json report: /matrix_multiplication/multi_method_matmult_8192.json
    The roofline html report: /matrix_multiplication/multi_method_matmult_8192.html
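    The application is launched more than once during collection, and the per-method times differ between launches. As a rough check on that variation, the sketch below compares the two collection launches with the plain run from step 1. It is an illustrative Python snippet, not a DevKit feature; the dictionary and function names are hypothetical, and the numbers are copied from the outputs in this section.

```python
# Per-method times in seconds, copied from the three runs shown in this section.
PLAIN = {  # step 1, no profiling
    "parallel_matmult": 524.732915,
    "transpose_B_matmult": 12.199910,
    "block_transpose_B_matmult": 3.940094,
    "intrinsics_transpose_B_matmult": 2.543300,
    "kml_matmult": 0.320360,
}
METRICS = {  # launch that collects performance metrics
    "parallel_matmult": 522.059154,
    "transpose_B_matmult": 10.515641,
    "block_transpose_B_matmult": 3.325110,
    "intrinsics_transpose_B_matmult": 2.258315,
    "kml_matmult": 0.287929,
}
INSTRUMENTED = {  # launch that does binary instrumentation
    "parallel_matmult": 348.475675,
    "transpose_B_matmult": 17.144564,
    "block_transpose_B_matmult": 3.646071,
    "intrinsics_transpose_B_matmult": 2.427023,
    "kml_matmult": 0.297098,
}


def relative_change(run, baseline):
    """Percentage change of each method's time relative to the baseline run."""
    return {
        name: 100.0 * (run[name] - baseline[name]) / baseline[name]
        for name in baseline
    }


if __name__ == "__main__":
    for label, run in (("metrics", METRICS), ("instrumented", INSTRUMENTED)):
        for name, pct in relative_change(run, PLAIN).items():
            print(f"{label:13s} {name:35s} {pct:+7.1f}%")
```

    Because each figure comes from a single launch, the differences reflect run-to-run variation as well as any profiling effect.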
    
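  3. View the multi_method_matmult_8192 report.
    The report gathers all the examples, from the parallel example to the KML example, into a single chart, which makes it convenient to analyze and compare them.
    Figure 2 multi_method_matmult_8192 report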
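
As a back-of-the-envelope companion to the chart, the step 1 timings can be turned into a nominal throughput for each method. The Python sketch below assumes the classical operation count of 2·N³ floating-point operations for an N×N dense matrix multiplication. That count is an assumption of this sketch, not a value reported by the tool, and the function name `nominal_gflops` is hypothetical. The Roofline report measures actual operations and memory traffic, so its figures may differ.

```python
N = 8192  # matrix row and column size used in this example

# Per-method times in seconds, copied from the step 1 output.
TIMES = {
    "parallel_matmult": 524.732915,
    "transpose_B_matmult": 12.199910,
    "block_transpose_B_matmult": 3.940094,
    "intrinsics_transpose_B_matmult": 2.543300,
    "kml_matmult": 0.320360,
}


def nominal_gflops(seconds, n=N):
    """Nominal GFLOP/s, assuming n**3 multiplies plus n**3 adds."""
    return 2 * n**3 / seconds / 1e9


if __name__ == "__main__":
    for name, seconds in TIMES.items():
        print(f"{name:35s} {nominal_gflops(seconds):10.2f} GFLOP/s")
```

A higher nominal throughput places a method higher on the vertical axis of a Roofline chart; where it sits horizontally depends on its memory traffic, which this estimate does not capture.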