Using Roofline Analysis
Command Function
Identifies an application's performance bottlenecks on a specific hardware platform so that you can optimize the application accordingly.
- Only Kunpeng 920 servers are supported. Container environments are not supported.
- Roofline uses dynamic binary instrumentation (DBI), so the analyzed application must be a binary file in ELF format.
- Roofline collection requires the application to run to completion. Collection takes approximately three times as long as a normal run of the application.
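Roofline analysis is based on the roofline performance model. Under that model, the attainable performance of a code region is capped by whichever is lower: the peak compute throughput, or the region's arithmetic intensity (FLOPs per byte of memory traffic) multiplied by the peak memory bandwidth. The tool measures these roofs itself, as the "Launch benchmarks for measuring roofs" step in the example output shows. The following Python sketch only illustrates the model. The function names are illustrative, and the peak values are hypothetical placeholders, not Kunpeng 920 specifications.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth).

    arithmetic_intensity is in FLOPs per byte moved to/from memory.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def bound_type(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Classify a region relative to the ridge point where the two roofs meet."""
    ridge_point = peak_gflops / peak_bandwidth_gbs  # FLOP/byte at the ridge
    return "memory-bound" if arithmetic_intensity < ridge_point else "compute-bound"


if __name__ == "__main__":
    # Hypothetical roofs for illustration only (not measured hardware values).
    PEAK_GFLOPS = 1000.0
    PEAK_BW_GBS = 100.0
    for ai in (0.5, 10.0, 50.0):
        print(ai, attainable_gflops(ai, PEAK_GFLOPS, PEAK_BW_GBS),
              bound_type(ai, PEAK_GFLOPS, PEAK_BW_GBS))
```

A region left of the ridge point gains more from reducing memory traffic than from extra compute; a region right of it is limited by compute throughput.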
Syntax
devkit tuner roofline [-h] [-l {0,1,2,3}] [-m {total,region}] [-o <file>] workload ...
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. |
| -l/--log-level | 0/1/2/3 | Log level. The default value is 2 (warning). |
| -m/--mode | total/region | Analysis scope: the whole binary application (total) or the regions selected by the user (region). The default value is total. |
| -o/--outpath | - | Name of the generated data package. By default, the file is generated in the current directory and named roofline-YYYYMMDD-HHMMSS. |
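When the tool is driven from a script, the command line can be assembled from the documented options. The following is a minimal Python sketch; `build_roofline_cmd` is a hypothetical helper, and it assumes that the trailing `...` in the syntax stands for arguments passed through to the workload.

```python
import shlex

VALID_MODES = ("total", "region")
VALID_LOG_LEVELS = (0, 1, 2, 3)


def build_roofline_cmd(workload, *workload_args, mode=None, log_level=None, outpath=None):
    """Assemble argv for `devkit tuner roofline` using only the documented options.

    Options left as None are omitted so that the tool's defaults apply.
    """
    cmd = ["devkit", "tuner", "roofline"]
    if log_level is not None:
        if log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"log level must be one of {VALID_LOG_LEVELS}")
        cmd += ["-l", str(log_level)]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")
        cmd += ["-m", mode]
    if outpath is not None:
        cmd += ["-o", outpath]
    # Assumption: everything after the workload is forwarded to it unchanged.
    cmd.append(workload)
    cmd.extend(workload_args)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_roofline_cmd("./matrix_multiply_c", mode="region")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start a collection; allow for the roughly threefold collection time when choosing any timeout.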
Example
Collect roofline data for an application that has been divided into regions.
devkit tuner roofline -m region /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
Command output:
Note:
1. Roofline task is currently only supported on the 920 platform.
2. The application must be a binary file in ELF format.
3. Roofline task collection needs to ensure the application has finished running.
4. The estimated time of roofline collection is about 3 * application estimated time.
RFCOLLECT: Start collection for /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
RFCOLLECT: Launch application to collect performance metrics of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
ROOFLINE_EVENTS are initialized.
Initialization time: 0.070167 seconds
Calculation time: 0.206211 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch application to do binary instrumentation of /mysharedir/devkit/SystemProfilerBackend/tuner_cli/docs/matrix_multiply_c
Initialization time: 0.168616 seconds
Calculation time: 2.243492 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json
RFCOLLECT: Run "rfreport /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/rfcollect-20240424-143840.json" to get report.
Get roofline report ...
The roofline json report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.json
The roofline html report: /mysharedir/devkit/SystemProfilerBackend/sys_perf/components/sys_tools/roofline-20240424-143840.html
The task generates a JSON report and an HTML report. Use the JSON file for further data analysis, or open the HTML file in a browser to view the performance data.
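Because the report names carry a zero-padded timestamp (for example, roofline-20240424-143840.json in the output above), the newest report can be found by sorting file names. The JSON schema is not documented in this section, so the following Python sketch only loads the file and lists its top-level keys; the helper names are hypothetical.

```python
import glob
import json
import os


def latest_roofline_report(directory="."):
    """Return the newest roofline-*.json in directory, or None if there is none.

    The YYYYMMDD-HHMMSS timestamp is zero-padded, so lexicographic order matches
    chronological order. The pattern skips the intermediate rfcollect-*.json file.
    """
    paths = glob.glob(os.path.join(directory, "roofline-*.json"))
    return max(paths) if paths else None


def load_report(path):
    """Parse a JSON report without assuming anything about its structure."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    path = latest_roofline_report(".")
    if path is None:
        print("No roofline-*.json report found")
    else:
        report = load_report(path)
        # Inspect the structure before relying on any particular field.
        if isinstance(report, dict):
            print(path, sorted(report))
        else:
            print(path, type(report).__name__)
```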