
Using Roofline Analysis

Command Function

Helps pinpoint application bottlenecks on a given hardware platform and optimize the application accordingly.
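For background, the classic roofline model bounds a kernel's attainable performance by the lower of two roofs: peak compute throughput and the product of arithmetic intensity and peak memory bandwidth. This is the general textbook model and not necessarily the exact computation the tool performs internally. Here W is the number of floating-point operations, Q the bytes moved to and from memory, I the arithmetic intensity, P_peak the peak compute rate, and B_peak the peak memory bandwidth. The second line is a worked example with illustrative numbers (assumed, not Kunpeng 920 specifications): a kernel with I = 2 FLOP/byte is memory-bound, because it sits left of the ridge point at 10 FLOP/byte.

```latex
P = \min\left(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{peak}}\right), \qquad I = \frac{W}{Q}

P = \min\left(1000,\; 2 \times 100\right)\ \mathrm{GFLOP/s} = 200\ \mathrm{GFLOP/s}, \qquad
I_{\mathrm{ridge}} = \frac{P_{\mathrm{peak}}}{B_{\mathrm{peak}}} = \frac{1000}{100} = 10\ \mathrm{FLOP/byte}
```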

  • Only Kunpeng 920 servers are supported. Container environments are not supported.
  • Roofline uses dynamic binary instrumentation (DBI), so the analyzed application must be a binary file in ELF format.
  • Before collecting roofline data, ensure that the application can run to completion. Collection takes approximately three times as long as a normal run of the application.
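The ELF requirement and the three-times duration estimate above can be checked before starting a long collection. The following is a minimal sketch of a hypothetical pre-flight helper (is_elf and estimate_collection_seconds are illustrative names, not part of DevKit). It assumes a POSIX shell with head, od, and date available, and treats a non-zero exit status of the application as a failed run, which is an assumption that may not hold for every program.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of DevKit).

# Succeeds if the file starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
  [ -f "$1" ] || return 1
  # od -c prints the non-printable byte 0x7f as the octal escape 177
  magic=$(head -c 4 "$1" | od -An -c | tr -d ' \n')
  [ "$magic" = "177ELF" ]
}

# Runs the application once and prints a rough collection-time estimate
# (3x the measured wall-clock seconds), following the guideline above.
# Assumption: a non-zero exit status means the run did not complete normally.
estimate_collection_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || return 1
  end=$(date +%s)
  echo $(( (end - start) * 3 ))
}
```

For example, `is_elf ./matrix_multiply_c && estimate_collection_seconds ./matrix_multiply_c` prints the estimated number of seconds only when the binary is an ELF file and a normal run finishes. Whole-second timing is coarse, so treat the result as an order-of-magnitude guide.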

Syntax

devkit tuner roofline [-h] [-l {0,1,2,3}] [-m {total,region}] [-o <file>] workload ...

Parameter Description

Table 1 Parameter description

Parameter        Option         Description
-h/--help        -              Displays help information.
-l/--log-level   0/1/2/3        Log level. The default value is 2 (warning).
                                  • 0 (debug)
                                  • 1 (info)
                                  • 2 (warning)
                                  • 3 (error)
-m/--mode        total/region   Analysis scope: the whole binary application or user-defined regions. The default value is total.
                                  • total: Collects roofline data for the whole application.
                                  • region: Collects roofline data for each region in the application. The application must be instrumented in advance to divide it into regions.
-o/--outpath     <file>         Name of the generated data package. By default, the file is generated in the current directory and named roofline-YYYYMMDD-HHMMSS.
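The allowed values in Table 1 can be validated before a run. The following is a minimal sketch of a hypothetical wrapper, build_roofline_cmd (an illustrative name, not part of DevKit), that rejects log levels other than 0 to 3 and modes other than total or region. When no name is given, it fills in one that mirrors the documented default format roofline-YYYYMMDD-HHMMSS. It only prints the command line (a dry run) and uses only the options shown in the syntax above.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of DevKit).
# Usage: build_roofline_cmd <level> <mode> <name-or-empty> <workload> [args...]
build_roofline_cmd() {
  level=$1 mode=$2 out=$3
  shift 3
  # Only the values listed in Table 1 are accepted
  case $level in 0|1|2|3) ;; *) echo "invalid log level: $level" >&2; return 1 ;; esac
  case $mode in total|region) ;; *) echo "invalid mode: $mode" >&2; return 1 ;; esac
  # Mirror the documented default name format when no name is supplied
  [ -n "$out" ] || out="roofline-$(date +%Y%m%d-%H%M%S)"
  echo devkit tuner roofline -l "$level" -m "$mode" -o "$out" "$@"
}
```

For example, `build_roofline_cmd 1 region myrun ./matrix_multiply_c` prints `devkit tuner roofline -l 1 -m region -o myrun ./matrix_multiply_c`. Because echo does not preserve quoting, use the output as a preview rather than re-executing it for paths that contain spaces.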

Example

Collect roofline data for each region of an application that has already been instrumented to divide it into regions.

devkit tuner roofline -m region /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c

Command output:

Note:
  1. Roofline task is currently only supported on the 920 platform.
  2. The application must be a binary file in ELF format.
  3. Roofline task collection needs to ensure the application has finished running.
  4. The estimated time of roofline collection is about 3 * application estimated time.
RFCOLLECT: Start collection for /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
RFCOLLECT: Launch application to collect performance metrics of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
ROOFLINE_EVENTS are initialized.
Initialization time: 0.070167 seconds
Calculation time: 0.206211 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch application to do binary instrumentation of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
Initialization time: 0.168616 seconds
Calculation time: 2.243492 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json
RFCOLLECT: Run "rfreport /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json" to get report.
Get roofline report ...
The roofline json report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.json
The roofline html report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.html

The task generates a JSON report and an HTML report. Use the JSON file for further data analysis, and open the HTML file in a browser to view the performance data.
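After several runs, a directory can hold many reports. Because the generated names embed a YYYYMMDD-HHMMSS timestamp, sorting the names lexicographically also sorts them chronologically. The following is a minimal sketch of a hypothetical helper, latest_roofline_report (an illustrative name, not part of DevKit), that relies on this ordering. It assumes reports keep the default roofline- prefix, and it ignores the intermediate rfcollect-*.json files shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DevKit).
# Usage: latest_roofline_report [dir] [json|html]
latest_roofline_report() {
  dir=${1:-.} ext=${2:-json}
  # LC_ALL=C makes ls sort by byte order, so the last timestamped name is the newest
  latest=$(LC_ALL=C ls "$dir" | grep -E "^roofline-[0-9]{8}-[0-9]{6}\.$ext\$" | tail -n 1)
  [ -n "$latest" ] || return 1
  echo "$dir/$latest"
}
```

For example, `python3 -m json.tool "$(latest_roofline_report . json)"` pretty-prints the newest JSON report for inspection.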

Figure 1 Roofline HTML file