Using Roofline Analysis

Command Function

Pinpoints application bottlenecks on a given hardware platform so that you can optimize the application accordingly.

  • Only physical machines of the Kunpeng platform are supported.
  • Roofline uses dynamic binary instrumentation (DBI), so the analyzed application must be a binary file in ELF format.
  • The application must be able to run to completion, because roofline data collection requires the application to finish running. Collection takes approximately three times as long as a normal run of the application.
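As background on what the analysis reports, the general roofline model bounds attainable performance by the lower of two roofs: peak compute throughput, and memory bandwidth multiplied by arithmetic intensity. The following is a minimal sketch of that relationship. The function names and the peak and bandwidth figures are hypothetical illustrations, not values or interfaces provided by devkit.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound in GFLOP/s for an arithmetic intensity given in FLOP/byte."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def limiting_roof(peak_gflops, bandwidth_gbs, intensity):
    """Name the roof that limits a kernel at the given arithmetic intensity."""
    ridge = peak_gflops / bandwidth_gbs  # intensity at which the two roofs meet
    return "memory" if intensity < ridge else "compute"


if __name__ == "__main__":
    PEAK, BW = 100.0, 50.0  # hypothetical roofs, not measured on any machine
    for ai in (0.5, 4.0):
        print(ai, attainable_gflops(PEAK, BW, ai), limiting_roof(PEAK, BW, ai))
```

A kernel whose intensity lies to the left of the ridge point is limited by memory bandwidth; one to the right is limited by compute throughput.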

Syntax

devkit tuner roofline [-h] [-l {0,1,2,3}] [-m {total,region}] [-o <file>] workload ...

workload is the application to be analyzed. To collect data for multiple regions, the application must already be instrumented. For details about instrumentation, see Roofline Instrumentation Guide.
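When the command is driven from a script, the syntax above can be assembled as an argument list. The sketch below is one possible helper, not part of devkit: the function name build_roofline_command is hypothetical, and treating the trailing "..." as arguments passed on to the workload is an assumption. Only the options shown in the syntax line are used.

```python
import shlex


def build_roofline_command(workload, mode="total", log_level=2,
                           outpath=None, workload_args=()):
    """Build an argv list for 'devkit tuner roofline' (hypothetical helper)."""
    if mode not in ("total", "region"):
        raise ValueError("mode must be 'total' or 'region'")
    if log_level not in (0, 1, 2, 3):
        raise ValueError("log_level must be 0, 1, 2, or 3")
    cmd = ["devkit", "tuner", "roofline", "-l", str(log_level), "-m", mode]
    if outpath is not None:
        cmd += ["-o", outpath]
    cmd.append(workload)
    # Assumption: anything after the workload is handed to the workload itself.
    cmd.extend(workload_args)
    return cmd


if __name__ == "__main__":
    cmd = build_roofline_command("./matrix_multiply_c", mode="region")
    print(shlex.join(cmd))
    # To run it on a supported machine: subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when the workload path contains spaces.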

Parameter Description

Table 1 Parameter description

| Parameter | Option | Description |
| --- | --- | --- |
| -h/--help | - | Displays help information. |
| -l/--log-level | 0/1/2/3 | Log level. The default value is 2. 0: DEBUG; 1: INFO; 2: WARNING; 3: ERROR. |
| -m/--mode | total/region | Analysis scope: the whole binary application or user-selected regions. The default value is total. total: collects roofline data for the whole application. region: collects roofline data for each region in the application; the application must be instrumented to define the regions. |
| -o/--outpath | - | Name of the generated data package. By default, the file is generated in the current directory and named roofline-YYYYMMDD-HMS. |
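The default name pattern roofline-YYYYMMDD-HMS can be reproduced in a script, for example to predict or match output file names. The sketch below assumes that HMS means zero-padded hours, minutes, and seconds, which is consistent with the sample name roofline-20240424-143840 in the command output. The function name is hypothetical.

```python
from datetime import datetime


def default_package_name(now=None):
    """Return a name shaped like roofline-YYYYMMDD-HMS (assumed format)."""
    now = now or datetime.now()
    return now.strftime("roofline-%Y%m%d-%H%M%S")


if __name__ == "__main__":
    print(default_package_name(datetime(2024, 4, 24, 14, 38, 40)))
```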

Example

Collect roofline data for an application that has been instrumented and divided into regions.

devkit tuner roofline -m region /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c

Command output:

Note:
  1. Roofline task is currently only supported on the 920 platform.
  2. The application must be a binary file in ELF format.
  3. Roofline task collection needs to ensure the application has finished running.
  4. The estimated time of roofline collection is about 3 * application estimated time.
RFCOLLECT: Start collection for /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
RFCOLLECT: Launch application to collect performance metrics of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
ROOFLINE_EVENTS are initialized.
Initialization time: 0.070167 seconds
Calculation time: 0.206211 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch application to do binary instrumentation of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
Initialization time: 0.168616 seconds
Calculation time: 2.243492 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json
RFCOLLECT: Run "rfreport /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json" to get report.
Get roofline report ...
The roofline json report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.json
The roofline html report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.html

The task generates a JSON file and an HTML file. Use the JSON file for further data analysis, or open the HTML file in a browser to view the performance data.
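For scripted post-processing, the report paths can be picked out of the command output and the JSON file loaded with a standard parser. The sketch below relies only on the two "The roofline ... report:" lines shown in the sample output. The function names are hypothetical, and because the JSON schema is not described here, the loader simply returns the parsed data without assuming any field names.

```python
import json
import re

# Matches the two report lines printed at the end of the sample output.
REPORT_LINE = re.compile(r"^The roofline (json|html) report:\s*(\S.*)$", re.MULTILINE)


def find_report_paths(output):
    """Map 'json' and 'html' to the report paths found in the command output."""
    return {kind: path.strip() for kind, path in REPORT_LINE.findall(output)}


def load_report(path):
    """Parse the JSON report; its structure is not assumed here."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    sample = (
        "Get roofline report ...\n"
        "The roofline json report: /tmp/roofline-20240424-143840.json\n"
        "The roofline html report: /tmp/roofline-20240424-143840.html\n"
    )
    print(find_report_paths(sample))
```

The /tmp paths in the demo are placeholders; in practice, pass the captured standard output of the devkit command.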

Figure 1 Roofline HTML file