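Serial Example
Run the most basic serial example. Because the serial example takes a long time to run, a matrix with 2048 rows and 2048 columns is used for the task analysis.
This part of the code implements matrix multiplication serially.
Figure 1 Serial example code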
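The code in Figure 1 is not reproduced in this text. As a rough illustration of what a serial matrix multiplication looks like, here is a minimal sketch in Python. It is not the source of the matmul binary; the names serial_matmul and flop_count are hypothetical, and the sketch assumes square matrices stored as lists of rows.

```python
def serial_matmul(a, b):
    """Multiply two n x n matrices (lists of rows) on a single thread.

    Plain triple loop with no blocking, vectorization, or threading:
    an illustrative stand-in for the serial baseline, not its actual source.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # One multiply and one add per inner iteration.
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def flop_count(n):
    """Floating-point operations of the triple loop: 2 * n^3."""
    return 2 * n ** 3


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(serial_matmul(a, b))
    print(flop_count(2048))
```

Every output element is computed one after another by a single thread, which is why the run time grows with the cube of the matrix size.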

- Run the base_matmult example with a 2048 x 2048 matrix.
./matmul 2048 0
The following information is returned:
Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
Initialization time = 0.174492s
Matrix multiplication time = 62.657254s
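With a 2048 x 2048 matrix, the serial computation takes about 62 seconds.
- Create a Roofline task for the base_matmult example with a 2048 x 2048 matrix.
Use the command-line tool to run the Roofline task analysis.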
devkit tuner roofline -o base_matmult_2048 -m region ./matmul 2048 0
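All examples here use the Kunpeng DevKit command-line mode. The Roofline task in Web mode can also be used for the analysis.
The following information is returned: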
Note:
1. Roofline task is currently only supported on the 920 platform.
2. The application must be a binary file in ELF format.
3. Roofline task collection needs to ensure the application has finished running.
4. The estimated time of roofline collection is about 3 * application estimated time.
RFCOLLECT: Start collection for ./matmul
RFCOLLECT: Launch application to collect performance metrics of ./matmul
Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
Initialization time = 0.174628s
ROOFLINE_EVENTS are initialized.
Matrix multiplication time = 62.718666s
RFCOLLECT: Launch application to do binary instrumentation of ./matmul
Size is 2048, Matrix multiplication method is: 0, Check correctness is: 0
Initialization time = 0.528283s
Matrix multiplication time = 85.328236s
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240506-151117.json
RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240506-151117.json" to get report.
Get roofline report ...
The roofline json report: /matrix_multiplication/base_matmult_2048.json
The roofline html report: /matrix_multiplication/base_matmult_2048.html
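The log shows the application being launched twice: once to collect performance metrics and once under binary instrumentation. The following sketch compares those two timings with the "about 3 * application time" estimate from the note. It only does arithmetic on the numbers printed above. The variable and function names are hypothetical, and the time spent on the roof benchmarks is not reported in the log, so it is left as an unknown remainder.

```python
# Timings copied from the returned information above (seconds).
PLAIN_RUN_S = 62.657254         # ./matmul 2048 0 without the tool
METRICS_RUN_S = 62.718666       # launch that collects performance metrics
INSTRUMENTED_RUN_S = 85.328236  # launch under binary instrumentation


def instrumentation_slowdown(instrumented_s, metrics_s):
    """Ratio of the instrumented run to the metrics-collection run."""
    return instrumented_s / metrics_s


def estimated_collection_time(plain_s, factor=3.0):
    """Rule of thumb from the tool's note: about 3x the application time."""
    return factor * plain_s


if __name__ == "__main__":
    slowdown = instrumentation_slowdown(INSTRUMENTED_RUN_S, METRICS_RUN_S)
    two_launches = METRICS_RUN_S + INSTRUMENTED_RUN_S
    estimate = estimated_collection_time(PLAIN_RUN_S)
    print(f"instrumentation slowdown: {slowdown:.2f}x")
    print(f"two application launches: {two_launches:.1f} s")
    print(f"3x estimate:              {estimate:.1f} s")
    # Whatever is left of the estimate covers roof benchmarks and
    # data processing, whose durations the log does not show.
    print(f"remainder of estimate:    {estimate - two_launches:.1f} s")
```

In this run the two launches alone account for roughly 148 seconds, so the 3x rule of thumb is a reasonable planning figure for a job of this size.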
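- View the base_matmult_2048_html report.
Figure 2 base_matmult_2048_html report
In this run, the roofs are measured with a parallelism of 1 (serial). The report shows Elapsed Time 62.699s, GFLOP Count 17.18, and Performance 0.274 GFLOPS.
According to the Roofline analysis, the machine has 128 physical cores but the application runs only 1 parallel thread, so performance can be tuned by increasing the degree of parallelism.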
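The figures in the report can be cross-checked by hand. The sketch below assumes that the GFLOP count corresponds to 2 * n^3 operations (one multiply and one add per inner-loop iteration), which is an assumption about how the work is counted. It is consistent with the reported 17.18, and the function names are hypothetical.

```python
N = 2048            # matrix rows and columns
ELAPSED_S = 62.699  # Elapsed Time from the report


def gflop_count(n):
    """Assumed operation count of an n x n matrix multiply, in GFLOP."""
    return 2 * n ** 3 / 1e9


def performance_gflops(n, elapsed_s):
    """Achieved performance: operations divided by elapsed time."""
    return gflop_count(n) / elapsed_s


if __name__ == "__main__":
    print(f"GFLOP count: {gflop_count(N):.2f}")
    print(f"Performance: {performance_gflops(N, ELAPSED_S):.3f} GFLOPS")
```

Both values match the report after rounding, which suggests the serial baseline performs exactly the work of the textbook algorithm and that its low performance comes from using a single thread rather than from extra computation.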