Hotspot Function Analysis
The tool collects hotspot function data and lets you customize the collection mode and events. You can examine the call relationships between hotspot functions and the associated code lines to locate performance problems, and then tune the code to improve program performance.
Command Function
Analyzes C/C++ programs to identify performance bottlenecks, and reports the top hotspot functions and their call stacks. Flame graphs visualize function call relationships. The tool also collects the average, maximum, and minimum CPU frequencies during the sampling period to help identify optimization paths.
Syntax
```
devkit tuner hotspot [-h] [-c {0 | 0,1,2 | 0-2}] [-r {user, kernel, all}] [-d <sec>] [-D <sec>] [-t n] [-f n] [-l {0, 1, 2, 3}] [-i <sec>] [-e] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [-g] [--package] [--long-name] [--dwarf] [workload workload...]
```
To collect data for a specific application, replace [workload workload...] in the command with the application path followed by its parameters.
Parameter Description
All parameters are optional.

| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Displays help information. |
| -c/--cpu | - | CPU cores to collect data from. Specify a single core (0), a comma-separated list (0,1,2), or a range (0-2). |
| -r/--collection-range | user/kernel/all | Collection mode. Defaults to all. |
| -d/--duration | - | Collection duration, in seconds. Minimum: 1. By default, collection continues until you stop it: press `Ctrl+\` to cancel the task, or press `Ctrl+C` to stop collection and start analysis. |
| -D/--delay | - | Collection delay, in seconds. Defaults to 0 and must be less than the collection duration. |
| -t/--top | - | Number of records to display in the report. Minimum: 1. |
| -f/--frequency | - | Sampling frequency, in samples per second. Defaults to 200. Minimum: 1. |
| -l/--log-level | 0/1/2/3 | Log level. Defaults to 1. |
| -i/--interval | - | Subreport interval, in seconds: the length of the collection period covered by each subreport. Minimum: 1. The value cannot exceed the collection duration, and it defaults to the collection duration. If this parameter is not set, no subreports are generated. |
| -e/--event | - | Events to collect. Run `devkit tuner hotspot list` to see which events can be collected. |
| -o/--output | - | Name and output path of the report package, without a file name extension. If you specify only a name, the package is generated in the current directory. Must be used together with --package. |
| -s/--src-dir | - | Source code working directory, used to find and associate source code so that it can be displayed when the task is imported into the web client. |
| -p/--pid | - | IDs of the processes to collect. Separate multiple PIDs with commas (,). By default, all processes are collected. If both -p and -c are specified, -p takes precedence. |
| -g | - | Collects and displays call stack information and generates an HTML flame graph. By default, the graph is saved as Flamegraph-Timestamp.html in the current directory. |
| --package | - | Generates a report data package. If no package name or path is set, hotspot-Timestamp.tar is generated in the current directory. |
| --long-name | - | Displays full function and module names. If this parameter is not set, long names are abbreviated. |
| --dwarf | - | Displays the associated source file. |
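The limits in the table can be checked before a run is started. The Python sketch below does this with two hypothetical helpers, `parse_cpu_list` and `build_hotspot_cmd`. Neither is part of the DevKit. The first expands the three -c formats. The second assembles an argument list from a subset of the table's flags, follows the order of the syntax line, and places the workload last. The sketch assumes the constraints are exactly those stated in the table, and the tool's own validation may differ.

```python
"""Hypothetical helpers that assemble a `devkit tuner hotspot` command line.

Only flags from the parameter table are emitted. The checks mirror the
documented limits and are an illustration, not the tool's own validation.
"""
import shlex
from typing import List, Optional, Sequence


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a -c/--cpu value such as "0", "0,1,2" or "0-2" into core numbers."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid CPU range: {part}")
            cores.update(range(start, end + 1))
        else:
            cores.add(int(part))  # int("") raises ValueError for empty items
    return sorted(cores)


def build_hotspot_cmd(
    cpus: Optional[str] = None,
    collection_range: Optional[str] = None,
    duration: Optional[int] = None,
    delay: Optional[int] = None,
    frequency: Optional[int] = None,
    interval: Optional[int] = None,
    output: Optional[str] = None,
    pids: Optional[Sequence[int]] = None,
    call_stack: bool = False,
    package: bool = False,
    workload: Sequence[str] = (),
) -> List[str]:
    """Return argv for the hotspot command; options follow the syntax-line order."""
    cmd = ["devkit", "tuner", "hotspot"]
    if cpus is not None:
        parse_cpu_list(cpus)  # raises ValueError on a malformed core list
        cmd += ["-c", cpus]
    if collection_range is not None:
        if collection_range not in ("user", "kernel", "all"):
            raise ValueError("-r must be user, kernel, or all")
        cmd += ["-r", collection_range]
    if duration is not None:
        if duration < 1:
            raise ValueError("-d must be at least 1 second")
        cmd += ["-d", str(duration)]
    if delay is not None:
        if delay < 0 or (duration is not None and delay >= duration):
            raise ValueError("-D must be non-negative and less than -d")
        cmd += ["-D", str(delay)]
    if frequency is not None:
        if frequency < 1:
            raise ValueError("-f must be at least 1 sample per second")
        cmd += ["-f", str(frequency)]
    if interval is not None:
        if interval < 1 or (duration is not None and interval > duration):
            raise ValueError("-i must be between 1 and the -d value")
        cmd += ["-i", str(interval)]
    if output is not None:
        if not package:
            raise ValueError("-o must be used together with --package")
        cmd += ["-o", output]
    if pids:
        cmd += ["-p", ",".join(str(p) for p in pids)]
    if call_stack:
        cmd.append("-g")
    if package:
        cmd.append("--package")
    cmd += list(workload)  # application path and its parameters go last
    return cmd


if __name__ == "__main__":
    # Rebuilds the example command on this page (option order differs).
    print(shlex.join(build_hotspot_cmd(cpus="110-112", collection_range="user",
                                       duration=10, call_stack=True)))
```

On a host with the DevKit installed, you could pass the returned list to `subprocess.run(cmd, check=True)`. Building a list instead of a single string avoids shell-quoting problems in workload parameters.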
Example
```shell
devkit tuner hotspot -c 110-112 -d 10 -r user -g
```
In this example, -c 110-112 collects data from CPU cores 110 to 112, -d 10 collects data for 10 seconds, -r user collects user-mode performance data, and -g displays call stack information and generates an HTML flame graph file. By default, the file is generated in the current directory, which is /home/hotspot in this example.
Command output:
```
Hotspot Summary Report-ALL Time:2024/06/05 17:24:11
================================================================================
1.The current server supports a maximum of 2600 MHz CPU frequency and a minimum of 200 MHz.
2.Only the collected CPU frequency information about the running process is displayed. Considering the large number of cores, only the CPU frequency with a difference of more than 1% from the maximum CPU frequency is displayed. For example, the collected process runs on CPUs 0-10, and only the difference between the CPU frequency of CPUs 3-6 and the maximum frequency supported by the current server exceeds 1%. In this case, CPU 3-6 is displayed.
─────────────────────────────────────────────────────────────────────────────
Core    Avg Frequency(MHz)    Max Frequency(MHz)    Min Frequency(MHz)
─────────────────────────────────────────────────────────────────────────────
CPU110  1135                  1211                  896
─────────────────────────────────────────────────────────────────────────────

─────────────────────────────────────────────────────────────────────────────────
Function                    cycles      Module                      cycles(%)
─────────────────────────────────────────────────────────────────────────────────
Hotspot::Mon***riable(int)  28,295,946  libtuner.so                 13.19
KUNPENG_SYM:***har const*)  23,937,102  libsym.so                   11.16
_pthread_cle***_push_defer  22,824,809  libpthread-2.28.so          10.64
el0_svc_common              20,437,520  [kernel]                    9.53
0x79b84                     17,734,958  libc-2.28.so                8.27
runtime.(*lfstack).push     17,149,776  dockerd                     7.99
0x78e34                     16,163,183  libc-2.28.so                7.53
0xa2a9c                     15,726,617  libglib-2.0.so.0.6600.8     7.33
_PyType_Lookup              14,070,497  libpython3.9.so.1.0         6.56
std::_Rb_tre***node_base&)  10,375,159  libstdc++.so.6.0.24         4.84
__libc_malloc               10,287,180  libc-2.28.so                4.79
0x260ec                     9,941,742   libsym.so                   4.63
0x25f9c                     6,144,404   libsym.so                   2.86
runtime.runqgrab            320,367     dockerd                     0.15
runtime.greyobject          187,025     dockerd                     0.09
runtime.scanobject          156,426     dockerd                     0.07
runtime.notesleep           118,716     dockerd                     0.06
runtime.findrunnable        88,611      dockerd                     0.04
index                       76,937      libc-2.28.so                0.04
g_closure_ref               69,065      libgobject-2.0.so.0.6600.8  0.03
0x70c46c                    62,180      magent                      0.03
safe_close                  61,249      libsystemd-shared-243.so    0.03
0x10540                     60,090      auditd                      0.03
fjson_object_put            59,935      libfastjson.so.4.3.0        0.03
0x5be40                     51,490      python3.9                   0.02
0x11850                     49,759      libsystemd.so.0.27.0        0.02
0x6a1c4                     43,487      libpython3.9.so.1.0         0.02
_PyEval_EvalFrameDefault    38,232      python3.9                   0.02
PyBuffer_Release            12,399      python3.9                   0.01
─────────────────────────────────────────────────────────────────────────────────

5080 milliseconds time elapsed
If *** is displayed in Function, use --long-name to show full function name.
Callstack is saved to /home/hotspot/callstack-20240607-141541.log
Flamegraph is saved to /home/hotspot/Flamegraph-20240607-141541.html
```
By default, the flame graph HTML file (Flamegraph-20240607-141541.html in this example) is generated in the current directory. Open it in a web browser to view it.
- The flame graph is read as follows:
  - The Y axis shows the call stack. Each layer represents a function, and a deeper stack produces a taller flame. The top layer is the function being executed, and its parent functions are below it.
  - The X axis shows the number of samples. The wider a function, the more samples it appears in, and therefore the longer it executed. The X axis is not a timeline: the call stacks are arranged in alphabetical order.
- A flame graph shows function call relationships and execution time, which helps you find hotspot functions and their tuning paths. The hot flame graph shows CPU usage and is used to locate high CPU usage. It is generally drawn in warm colors.
- You can hover the cursor over a function in the flame graph to view details.
- Use the search box to search for functions. Select Case sensitive to make the search case-sensitive; this option is not selected by default. After the search, the flame graph highlights the matching functions with a background color and draws a border around the first match.
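The Python sketch below turns sampled call stacks into flame-graph frames, which makes the axis rules concrete. It shows the general flame-graph technique, not the DevKit's implementation:

1. Identical stacks are counted.
2. Stacks are sorted alphabetically by call path.
3. Adjacent frames on the same path are merged, so a frame's X position and width come from sample counts, and its depth gives the Y position.

The search helper assumes substring matching that is case-insensitive unless `case_sensitive` is set, mirroring the Case sensitive option. `fold_stacks`, `layout`, and `search` are hypothetical names.

```python
"""Minimal sketch of turning sampled call stacks into flame-graph frames.

Assumptions: each sample is a root-first sequence of function names, frame
width equals the number of samples, and stacks are sorted alphabetically
before adjacent frames on the same call path are merged.
"""
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Frame = Tuple[int, int, int, str]  # (depth, x_start, width, function)


def fold_stacks(samples: Iterable[Sequence[str]]) -> Counter:
    """Count how many samples share each exact call stack."""
    return Counter(tuple(stack) for stack in samples)


def layout(folded: Counter) -> List[Frame]:
    """Depth gives the Y position; cumulative sample counts give X position and width."""
    frames: List[Frame] = []
    open_frames: List[Tuple[str, int]] = []  # (function, x_start) for each depth
    x = 0
    for stack in sorted(folded):  # alphabetical order of call paths
        # Length of the call path shared with the frames that are still open.
        common = 0
        while (common < len(open_frames) and common < len(stack)
               and open_frames[common][0] == stack[common]):
            common += 1
        # Close the frames that are not on the shared path, deepest first.
        for depth in range(len(open_frames) - 1, common - 1, -1):
            name, start = open_frames[depth]
            frames.append((depth, start, x - start, name))
        del open_frames[common:]
        # Open frames for the remainder of this stack at the current X position.
        open_frames.extend((name, x) for name in stack[common:])
        x += folded[stack]
    for depth in range(len(open_frames) - 1, -1, -1):
        name, start = open_frames[depth]
        frames.append((depth, start, x - start, name))
    return sorted(frames)


def search(frames: List[Frame], query: str, case_sensitive: bool = False) -> List[Frame]:
    """Frames whose function name contains query; case-insensitive by default."""
    if case_sensitive:
        return [f for f in frames if query in f[3]]
    return [f for f in frames if query.lower() in f[3].lower()]


if __name__ == "__main__":
    samples = [["main", "compute", "mul"]] * 3 + [["main", "compute"]] + [["main", "io"]] * 2
    for frame in layout(fold_stacks(samples)):
        print(frame)
```

With these six samples, `compute` is 4 samples wide (1 sample of its own plus 3 spent in `mul`), and `main` spans all 6. This is why width reflects total execution time rather than when the code ran.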