
Serial Example

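Run the most basic serial example. Because the serial version takes a long time to run, a matrix with 2048 rows and 2048 columns is used for the task analysis.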

This part of the code implements matrix multiplication serially.

Figure 1 Serial example code
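The code in Figure 1 is not reproduced in the text, so the following is only a minimal sketch of what a serial matrix multiplication looks like, written in Python for brevity. The function name `serial_matmul` and the list-of-lists layout are assumptions made for illustration; the actual base_matmult sample is a compiled ELF binary and may differ in loop order and data layout.

```python
def serial_matmul(a, b):
    """Multiply two square matrices (lists of lists) with a plain triple loop.

    Hypothetical illustration only; not the source of the matmul binary.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row of a
        for j in range(n):      # column of b
            acc = 0.0
            for k in range(n):  # one multiply and one add per iteration
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(serial_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

All three loops run on a single thread, so the work grows with the cube of the matrix size and no additional cores are used.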
  1. Run the base_matmult example with a 2048 x 2048 matrix.
    ./matmul 2048 0 
    

    The following information is returned:

    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0 
    Initialization time = 0.174492s 
    Matrix multiplication time = 62.657254s
    

    With a 2048 x 2048 matrix, the serial computation takes about 62 seconds.

  2. Create a Roofline task for base_matmult with a 2048 x 2048 matrix.

    Use the command-line tool to run the Roofline task analysis.

    devkit tuner roofline -o base_matmult_2048 -m region ./matmul 2048 0
    

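    All examples here use the Kunpeng DevKit command-line mode. The Roofline task in Web mode can also be used for the analysis.

    The following information is returned: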

    Note:
    1. Roofline task is currently only supported on the 920 platform.
    2. The application must be a binary file in ELF format.
    3. Roofline task collection needs to ensure the application has finished running.
    4. The estimated time of roofline collection is about 3 * application estimated time.
    RFCOLLECT: Start collection for ./matmul
    RFCOLLECT: Launch application to collect performance metrics of ./matmul
    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
    Initialization time = 0.174628s
    ROOFLINE_EVENTS are initialized.
    Matrix multiplication time = 62.718666s
    RFCOLLECT: Launch application to do binary instrumentation of ./matmul
    Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
    Initialization time = 0.528283s
    Matrix multiplication time = 85.328236s
    RFCOLLECT: Launch benchmarks for measuring roofs
    RFCOLLECT: Processing all collected data
    RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240506-151117.json
    RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240506-151117.json" to get report.
    
    Get roofline report ...
    The roofline json report: /matrix_multiplication/base_matmult_2048.json
    The roofline html report: /matrix_multiplication/base_matmult_2048.html
    
  3. View the base_matmult_2048.html report.
    Figure 2 base_matmult_2048.html report

    The roofs are measured here with a parallelism of 1 (that is, serial). The report shows Elapsed Time 62.699 s, GFLOP Count 17.18, and Performance 0.274 GFLOPS.

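    According to the Roofline analysis, the machine has 128 physical cores but the program runs only 1 thread, so performance can be tuned by increasing the degree of parallelism.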
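The reported figures can be cross-checked with a short calculation. This sketch assumes the usual operation count for a dense N x N matrix multiplication, 2 * N^3 (one multiply and one add per inner-loop iteration). That the tool counts operations this way is an assumption, although it agrees with the reported GFLOP Count. The function names are hypothetical.

```python
def matmul_flop_count(n):
    """Floating-point operations for an n x n dense matmul, assuming 2 * n^3."""
    return 2 * n ** 3


def gflops(flop, seconds):
    """Achieved performance in GFLOPS for a given operation count and runtime."""
    return flop / seconds / 1e9


if __name__ == "__main__":
    n = 2048
    elapsed = 62.699  # Elapsed Time from the report, in seconds
    flop = matmul_flop_count(n)
    print(f"GFLOP count: {flop / 1e9:.2f}")                     # 17.18
    print(f"Performance: {gflops(flop, elapsed):.3f} GFLOPS")   # 0.274
```

Both values match the report: 2 * 2048^3 is about 17.18 GFLOP, and dividing by 62.699 s gives about 0.274 GFLOPS.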