Using Roofline Analysis
The roofline model is a throughput-oriented performance model that is widely used in the HPC field. The "roofline" concept expresses that the performance of an application cannot exceed the capability of the server hardware: every function and loop in the program is bounded by that hardware. Based on the roofline analysis result, you can quickly locate performance bottlenecks and identify suitable tuning methods.
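For reference, the bound that gives the model its name is commonly written as shown below. The symbols are introduced here for illustration and are not taken from the tool's output: $P_{\text{peak}}$ is the peak compute throughput of the hardware, $B$ is the peak bandwidth of a given memory level, and $I$ is the arithmetic intensity (operations performed per byte moved) of a function or loop.

$$
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \times B\right)
$$

As a purely hypothetical example, a loop with $I = 0.25$ FLOP/byte on a machine with $B = 100$ GB/s and $P_{\text{peak}} = 1000$ GFLOP/s is bounded by $\min(1000,\ 0.25 \times 100) = 25$ GFLOP/s, so it is limited by memory bandwidth rather than by compute.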
Command Function
Helps pinpoint application bottlenecks on a given hardware platform and optimize the application accordingly.
- Only physical machines of the Kunpeng platform are supported.
- Roofline uses dynamic binary instrumentation (DBI), so the analyzed application must be a binary file in ELF format.
- Before collecting roofline data, ensure that the application can run to completion. Roofline collection takes approximately three times as long as the application itself takes to run.
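The ELF requirement above can be checked before starting a collection. The following is a minimal sketch, not part of DevKit: `is_elf` is a hypothetical helper name, and it only inspects the 4-byte ELF magic number (0x7f followed by `ELF`) at the start of the file.

```shell
#!/bin/sh
# Hypothetical helper (not a DevKit command): succeeds only if the given
# path is a readable regular file that starts with the ELF magic number.
is_elf() {
    [ -f "$1" ] && [ -r "$1" ] || return 1
    # Read the first 4 bytes as hex and strip od's whitespace.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

A possible use is to guard the collection command, for example `is_elf /devkit/testdemo/matrix_multiply_c && devkit tuner roofline /devkit/testdemo/matrix_multiply_c`. The check does not verify the target architecture or that the file is executable.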
Syntax
devkit tuner roofline [-h] [-l {0,1,2,3}] [-m {total,region}] [-o <file>] [--hbm-mode {cache,flat}] workload ...
workload indicates the application to be analyzed. If information about multiple regions needs to be collected, instrumentation must be performed for the application. For details about instrumentation, see Roofline Instrumentation Guide.
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. This parameter is optional. |
| -l/--log-level | 0/1/2/3 | Log level. The default level is 2 (WARNING). This parameter is optional. |
| -m/--mode | total/region | Analysis scope: the whole binary application (total) or the regions selected by the user (region). The default value is total. This parameter is optional. |
| -o/--outpath | - | Name of the generated data package file. By default, the file is generated in the current directory and is named roofline-YYYYMMDD-HHMMSS. This parameter is optional. |
| --hbm-mode | cache | HBM data collection mode. Only the cache mode is supported, in which L1, L2, HBM, and DDR data can be collected. If the environment does not support HBM, HBM data will not be collected. This parameter is optional. |
Example
Collect the roofline data for the entire application.
devkit tuner roofline -m total --hbm-mode cache /devkit/testdemo/matrix_multiply_c
Command output:
Note:
    1. Roofline task is currently only supported on the 920 platform.
    2. The application must be a binary file in ELF format, and read permissions are required to detect the format of the application.
    3. Roofline task collection needs to ensure the application has finished running.
    4. The estimated time of roofline collection is about 3 * application estimated time.
    5. Roofline analysis is available only on physical machines.
    6. You can learn about the roofline profiling method by looking at document /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/tuner/docs/ROOFLINE_KNOW_HOW.MD
RFCOLLECT: Start collection for /devkit/testdemo/matrix_multiply_c
RFCOLLECT: Launch application to collect performance metrics of /devkit/testdemo/matrix_multiply_c
Initialization time: 0.085910 seconds
Calculation time: 0.371136 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch application to do binary instrumentation of /devkit/testdemo/matrix_multiply_c
Initialization time: 0.153196 seconds
Calculation time: 22.620041 seconds
The dimension of the matrices is too large to print.
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/rfcollect-20241213-222445.json
RFCOLLECT: Run "rfreport /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/rfcollect-20241213-222445.json" to get report.
Get roofline report ...
The roofline json report: /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/roofline-20241213-222445.json
The roofline html report: /devkit/testdemo/DevKit-CLI-xx.xx.xx-Linux-Kunpeng/roofline-20241213-222445.html
The task generates a JSON file and an HTML file. To analyze the data programmatically, use the JSON file directly. To view the performance data, open the HTML file in a browser. In the HTML report, you can click the icon in the upper right corner to configure what data to display, click the language icon to change the language, and select a region or all content from the Region drop-down list at the top of the page.
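When several collections have been run in the same directory, it can help to pick out the most recent report. The sketch below assumes the default file naming seen in the command output (roofline-YYYYMMDD-HHMMSS.json); `latest_roofline_json` is a hypothetical helper name, not a DevKit command, and it does not apply to files renamed with -o.

```shell
#!/bin/sh
# Hypothetical helper: print the newest default-named roofline JSON report
# in a directory (defaults to the current directory). Returns non-zero if
# no such report exists.
latest_roofline_json() {
    dir=${1:-.}
    latest=
    # The timestamp in the default name sorts lexicographically, and the
    # shell expands globs in sorted order, so the last match is the newest.
    for f in "$dir"/roofline-*.json; do
        [ -f "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

The intermediate rfcollect-*.json file is deliberately not matched, because the text above describes the roofline-*.json and roofline-*.html files as the task's reports.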