System Profiler Functions
The System Profiler is a performance analysis tool used on the Kunpeng platform. It collects performance data of processor hardware, operating system (OS), processes/threads, and functions, analyzes system performance metrics, locates system bottlenecks and hotspot functions, and provides tuning suggestions.
This tool supports three types of installation packages: the standalone System Profiler packages (.tar.gz and .rpm) and the full DevKit package.
- The standalone System Profiler package provides a lightweight installation that contains only the Tuner. If you need only the System Profiler functionality, choose the standalone package to save system resources.
- The full DevKit tool package not only contains the System Profiler, but also provides functions such as porting, affinity analysis, and Python/C profiling. If you need to use other functions, choose the full installation package.
Prerequisites
- The System Profiler has been installed. See Installing the Tool.
- If you installed the tool by extracting a standalone or full package, switch to the tool directory and run commands with the ./ prefix, for example, ./devkit tuner -h. If you installed the tool from an RPM package, run devkit tuner -h directly. This section uses the RPM installation as an example.
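The two invocation forms described above can be wrapped in a small helper. This is a minimal sketch, not part of the tool: the function name resolve_devkit is hypothetical, and it assumes that an RPM installation places devkit on PATH while an extracted package leaves an executable named devkit in the current directory.

```shell
# Hypothetical helper: print the launcher to use for this installation.
# Assumes an RPM install exposes "devkit" on PATH and an extracted
# package provides an executable "./devkit" in the current directory.
resolve_devkit() {
    if command -v devkit >/dev/null 2>&1; then
        echo "devkit"
    elif [ -x "./devkit" ]; then
        echo "./devkit"
    else
        echo "devkit not found on PATH or in the current directory" >&2
        return 1
    fi
}

# Usage (run from the tool directory for an extracted package):
#   "$(resolve_devkit)" tuner -h
```

Checking PATH first mirrors the RPM-based examples used in the rest of this section.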
Command Function
Displays the help information about system performance analysis tasks.
Syntax
devkit tuner [-h | --help] TASK [ARGS]
Example
Run the following command to view the information about the functions supported by the System Profiler:
devkit tuner -h
Command output:
Usage: devkit tuner [-h | --help] TASK [ARGS]
The most commonly used devkit tuner sub tasks are:
    help        Get help information
    top-down    Run the top-down collection and analysis task
    hotspot     Run the hotspot collection and analysis task
    miss        Run the miss collection and analysis task
    numafast    Run the numafast collection and analysis task
    hpc-perf    Run the hpc-perf collection and analysis task
    roofline    Run the roofline collection and analysis task
    memory      Run the memory collection and analysis task
    turbostat   Run the turbostat collection and analysis task
See 'devkit tuner TASK --help' for more information on a specific task.
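If a script needs the list of sub tasks, it can be extracted from the help text. The following is a rough sketch under stated assumptions: the function name list_tuner_tasks is hypothetical, and it assumes the terminal prints one sub task per line between the "sub tasks are:" line and the closing "See ..." line. Other tool versions may format the output differently.

```shell
# Hypothetical helper: read "devkit tuner -h" output on stdin and print
# one sub task name per line. Assumes each task sits on its own line,
# after the "sub tasks are:" line and before the line starting with "See".
list_tuner_tasks() {
    awk '
        /sub tasks are:/ { in_list = 1; next }   # task list starts after this line
        /^See /          { in_list = 0 }         # closing hint ends the list
        in_list && NF >= 2 { print $1 }          # first field is the task name
    '
}

# Usage:
#   devkit tuner -h | list_tuner_tasks
```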
| Subcommand | Function | Description | Supported Platform |
|---|---|---|---|
| top-down | Microarchitecture analysis | Uses Arm performance monitoring unit (PMU) events to show how instructions run on the CPU pipeline, so that you can modify your application to make full use of hardware resources. | Kunpeng |
| hotspot | Hotspot function analysis | Collects hotspot functions and allows you to customize the collection mode and events. You can check the call relationships between hotspot functions and the associated code lines to locate performance problems, and then tune the code to improve program performance. | Kunpeng |
| miss | Miss event analysis | When accessing data, the CPU searches the cache hierarchy level by level. If the target data is not in the cache, a cache miss occurs, and performance deteriorates severely when many cache misses occur. This command analyzes miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load, helping you modify your program to improve its performance. | Kunpeng |
| memory | Memory access statistics analysis | The memory access unit is the most complex logic control unit in the CPU. It handles the problems that arise while executing memory access instructions such as Load and Store, and ensures high-speed execution. Memory access statistics analysis helps you find the processes that may cause performance problems. | |
| numafast | NUMA refined analysis | By analyzing DDR access data, the inter-NUMA access traffic matrix, and other data, you can find the bandwidth traffic or threads/processes that may have problems, and further locate performance problems caused by cross-CPU memory access. | |
| hpc-perf | HPC application analysis | High-performance computing (HPC) uses powerful processor clusters to process massive multi-dimensional datasets (also called big data) in parallel and solve complex problems at high speed. This command provides multiple task modes to collect and analyze key metrics of HPC applications in scenarios with different resource overheads. It also provides tuning suggestions to help improve application performance. | |
| roofline | Roofline analysis | The roofline model is a throughput-oriented performance model that is widely used in the HPC field. The "roofline" concept indicates that the performance of an application cannot exceed the server hardware capability. Each function and loop in the program is limited by the server hardware. Based on the roofline analysis result, you can quickly locate performance bottlenecks and obtain tuning methods. | |
| turbostat | Frequency and power consumption analysis | Obtains the server's CPU frequency, temperature, and power consumption information based on the hardware driver and BMC information, helping locate service performance bottlenecks and make full use of hardware resources. | |
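The subcommands listed above each accept --help, as the command output notes ("See 'devkit tuner TASK --help'"). The sketch below prints the help text for every subcommand in turn. The function name show_task_help and the DEVKIT environment variable are hypothetical conveniences introduced here for illustration; DEVKIT lets you point the loop at ./devkit for an extracted package.

```shell
# Subcommands taken from the table above.
TUNER_TASKS="top-down hotspot miss numafast hpc-perf roofline memory turbostat"

# Hypothetical helper: print the help text of every subcommand.
# DEVKIT is an assumed override for the launcher; it defaults to "devkit".
show_task_help() {
    launcher="${DEVKIT:-devkit}"
    for task in $TUNER_TASKS; do
        echo "== $task =="
        "$launcher" tuner "$task" --help || echo "no help available for $task" >&2
    done
}

# Usage:
#   show_task_help                  # RPM installation
#   DEVKIT=./devkit show_task_help  # extracted package, from the tool directory
```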