Using Roofline Analysis

The roofline model is a throughput-oriented performance model that is widely used in high-performance computing (HPC). The "roofline" is the upper bound that the server hardware places on application performance: no function or loop in a program can exceed the capability of the hardware it runs on. The roofline analysis results help you quickly locate performance bottlenecks and identify tuning methods.
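The bound described above is commonly expressed as the minimum of the compute roof and the memory roof (memory bandwidth multiplied by arithmetic intensity). The following Python sketch illustrates that formula only. The function names are hypothetical, and the peak and bandwidth figures are made-up illustrative values, not measurements of any Kunpeng processor.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: min(compute roof, memory roof at this arithmetic intensity).

    intensity is in FLOP per byte moved to or from the given memory level.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Illustrative numbers only (assumptions, not hardware data).
PEAK = 1000.0      # GFLOP/s
BANDWIDTH = 100.0  # GB/s

low = attainable_gflops(PEAK, BANDWIDTH, 0.5)    # left of the ridge: memory-bound
high = attainable_gflops(PEAK, BANDWIDTH, 50.0)  # right of the ridge: compute-bound
```

A kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth, so reducing data movement tends to help more than speeding up arithmetic. Above the ridge point, the compute roof is the limit.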

Command Function

Helps pinpoint application bottlenecks on a given hardware platform and optimize the application accordingly.

  • Only physical machines of the Kunpeng platform are supported.
  • Roofline uses dynamic binary instrumentation (DBI), so the analyzed application must be a binary file in ELF format.
  • Roofline collection requires the application to run to completion. Collection takes approximately three times as long as the application's own running time.
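Because the application must be an ELF binary, a quick check before starting a collection can save a failed run. The sketch below tests only the four-byte ELF magic number at the start of the file. The helper name is hypothetical, and this is not how the tool itself detects the format.

```python
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path: str) -> bool:
    """Return True if the file at path starts with the ELF magic number."""
    try:
        with Path(path).open("rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Missing file or no read permission: the format cannot be confirmed.
        return False
```

A shell script or other wrapper fails this check even if it eventually launches an ELF binary, so pass the binary itself as the workload.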

Syntax

devkit tuner roofline [-h] [-l {0,1,2,3}] [-m {total,region}] [-o <file>] [--hbm-mode {cache,flat}] workload ...

workload is the application to be analyzed. To collect data for multiple regions, the application must first be instrumented. For details about instrumentation, see Roofline Instrumentation Guide.

Parameter Description

Table 1 Parameter description

| Parameter | Option | Description |
| --- | --- | --- |
| -h/--help | - | Optional. Obtains help information. |
| -l/--log-level | 0/1/2/3 | Optional. Log level. The default level is 2 (WARNING). 0: DEBUG; 1: INFO; 2: WARNING; 3: ERROR. |
| -m/--mode | total/region | Optional. Analysis scope, which is either the whole binary application or the regions selected by the user. The default value is total. total: roofline data is collected for the whole application. region: roofline data is collected for each region in the application; the application must be instrumented to define the regions. |
| -o/--outpath | - | Optional. Name of the generated data package file. By default, the file is generated in the current directory with the name roofline-YYYYMMDD-HMS. |
| --hbm-mode | cache | Optional. HBM data collection mode. Only the cache mode is supported, in which L1, L2, HBM, and DDR data can be collected. If the environment does not support HBM, HBM data is not collected. |
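When the command is launched from a script, the documented options can be validated before the tool is invoked. The sketch below assembles an argument list using only the flags listed in Table 1. The helper name and its parameters are hypothetical, and it builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_roofline_command(
    workload: List[str],
    mode: str = "total",
    log_level: int = 2,
    outpath: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `devkit tuner roofline` from the documented options."""
    if mode not in ("total", "region"):
        raise ValueError("mode must be 'total' or 'region'")
    if log_level not in (0, 1, 2, 3):
        raise ValueError("log_level must be 0, 1, 2, or 3")
    if not workload:
        raise ValueError("workload must name the application to analyze")

    argv = ["devkit", "tuner", "roofline", "-m", mode, "-l", str(log_level)]
    if outpath is not None:
        argv += ["-o", outpath]
    # The workload and its own arguments come last, as in the syntax line.
    return argv + list(workload)


cmd = build_roofline_command(["/devkit/testdemo/matrix_multiply_c"], mode="total")
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

The resulting list can be passed to `subprocess.run` on a machine where DevKit is installed. Keeping it as a list avoids shell-quoting problems when the workload path or its arguments contain spaces.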

Example

Collect the roofline data for the entire application.

devkit tuner roofline -m total --hbm-mode cache /devkit/testdemo/matrix_multiply_c

Command output:

Note:
    1. Roofline task is currently only supported on the 920 platform.
    2. The application must be a binary file in ELF format, and read permissions are required to detect the format of the application.
    3. Roofline task collection needs to ensure the application has finished running.
    4. The estimated time of roofline collection is about 3 * application estimated time.
    5. Roofline analysis is available only on physical machines.
    6. You can learn about the roofline profiling method by looking at document /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/tuner/docs/ROOFLINE_KNOW_HOW.MD
RFCOLLECT: Start collection for /devkit/testdemo/matrix_multiply_c
RFCOLLECT: Launch application to collect performance metrics of /devkit/testdemo/matrix_multiply_c
Initialization time: 0.085910 seconds
Calculation time: 0.371136 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch application to do binary instrumentation of /devkit/testdemo/matrix_multiply_c
Initialization time: 0.153196 seconds
Calculation time: 22.620041 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/rfcollect-20241213-222445.json
RFCOLLECT: Run "rfreport /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/rfcollect-20241213-222445.json" to get report.

Get roofline report ...
The roofline json report: /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/roofline-20241213-222445.json
The roofline html report: /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/roofline-20241213-222445.html

The task generates a JSON file and an HTML file. To process the data programmatically, use the JSON file. To view the performance data, open the HTML file in a browser. In the HTML report, the icons in the upper right corner let you configure which data to display and change the language, and the Region drop-down list at the top of the page lets you select a single region or all content.

Figure 1 Roofline HTML file
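The JSON report can be inspected with any JSON library before you write analysis code against it. The sketch below lists the top-level keys of a report and the type of each value. The schema of the roofline JSON file is not described here, so the code assumes nothing beyond the file being valid JSON, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict


def summarize_report(path: str) -> Dict[str, str]:
    """Map each top-level key of a JSON report to the type name of its value.

    Makes no assumption about the report schema beyond it being valid JSON.
    """
    with open(path, "r", encoding="utf-8") as f:
        data: Any = json.load(f)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object top level (for example a list): report the single type.
    return {"<root>": type(data).__name__}
```

Calling `summarize_report` with the path printed after "The roofline json report:" in the command output shows which fields are present before any of them is relied on.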