Hotspot Function Analysis
The tool collects hotspot functions and lets you customize the collection mode and events. You can inspect the call relationships between hotspot functions and the associated code lines to locate performance bottlenecks, then tune the code to improve program performance.
Command Function
Analyzes C/C++ program code, identifies performance bottlenecks, and provides details about the top hotspot functions and call stacks. The tool also displays the function call relationship in flame graphs and provides the tuning path.
Syntax
devkit tuner hotspot [-h] [-c {n | n,m | n-m}] [-d <sec>] [-D <sec>] [-t n] [-f n] [-l {0, 1, 2, 3}] [-i <sec>] [-r {user, kernel, all}] [-e] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [-g] [--package] [--long-name] [--dwarf] [workload workload...]
The tool can also collect data for a specific application. To do so, replace [workload workload...] in the command with the application path and the application parameters.
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. This parameter is optional. |
| -c/--cpu | - | Numbers of CPU cores to be collected, for example, 0, 0,1,2, and 0-2. This parameter is optional. |
| -d/--duration | - | Collection duration, in seconds. The minimum value is 1 second. By default collection never ends. You can press Ctrl+\ to cancel the task or press Ctrl+C to stop the collection and start analysis. This parameter is optional. |
| -D/--delay | - | Collection delay, which defaults to 0, in seconds, and must be less than the collection duration. This parameter is optional. |
| -i/--interval | - | Collection interval, in seconds. This parameter is optional. The minimum value is 1 second and the maximum value cannot exceed the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated. It specifies the time taken to collect data in each subreport. |
| -l/--log-level | 0/1/2/3 | Log level, which defaults to 1. This parameter is optional. |
| -f/--frequency | - | Sampling frequency, which defaults to 200 times per second. The minimum value is 1 time per second. This parameter is optional. |
| -e/--event | - | Events to be collected. You can run the devkit tuner hotspot list command to see what events can be collected. This parameter is optional. |
| -o/--output | - | Report package name and output path (no package name extension required). If you enter a name only, the report package is generated in the current directory by default. This option must be used together with --package. This parameter is optional. |
| -r/--collection-range | user/kernel/all | Collection mode, which defaults to all. This parameter is optional. |
| -s/--src-dir | - | Source code working directory, which is used to search for and associate source code. You can import a task to the web client to facilitate the display. This parameter is optional. |
| -g | - | Displays call stack information. This parameter is optional. If the -g option is enabled, a flame graph HTML file is generated in the user directory by default. |
| -p/--pid | PID/PID1,PID2/ALL | ID of a process to be collected. Separate multiple PIDs with commas (,). This parameter is optional. By default, all processes are collected. If both the -p and -c parameters are used, the processes with the specified PIDs are preferentially collected. |
| --long-name | - | Indicates whether to display detailed function and module information. This parameter is optional. If this parameter is not set, the module or function information is displayed in a simple manner by default. |
| --dwarf | - | Indicates whether to display the associated source file. This parameter is optional. |
| -t/--top | - | Number of data records to be displayed in the report. The minimum value is 1. This parameter is optional. |
| --package | - | Indicates whether to generate a report data package. If you do not set the package name or path, the hotspot-timestamp.tar package is generated in the current directory by default. This parameter is optional. |
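The value constraints in the table (minimum duration, delay shorter than the duration, interval no longer than the duration, minimum frequency) can be checked before the tool is launched. The following is a minimal sketch, not part of the tool: `build_hotspot_command` is a hypothetical helper, and it only assembles an argument list from flags documented above. It does not run `devkit`.

```python
def build_hotspot_command(duration=None, delay=0, interval=None,
                          frequency=200, cpus=None, workload=None):
    """Hypothetical helper: validate options, then return an argv list."""
    if duration is not None and duration < 1:
        raise ValueError("duration must be at least 1 second")
    if frequency < 1:
        raise ValueError("frequency must be at least 1 sample per second")
    # The delay is only bounded when a duration is given.
    if duration is not None and delay >= duration:
        raise ValueError("delay must be less than the collection duration")
    if interval is not None:
        if interval < 1:
            raise ValueError("interval must be at least 1 second")
        if duration is not None and interval > duration:
            raise ValueError("interval cannot exceed the collection duration")

    argv = ["devkit", "tuner", "hotspot"]
    if cpus:
        argv += ["-c", cpus]
    if duration is not None:
        argv += ["-d", str(duration)]
    if delay:
        argv += ["-D", str(delay)]
    if interval is not None:
        argv += ["-i", str(interval)]
    if frequency != 200:  # 200 is the documented default
        argv += ["-f", str(frequency)]
    if workload:
        argv += list(workload)  # application path followed by its parameters
    return argv
```

For example, `build_hotspot_command(duration=3, interval=1, cpus="0-127")` yields the `-c 0-127 -d 3 -i 1` portion of a command line, while `build_hotspot_command(duration=3, delay=3)` raises `ValueError` because the delay is not shorter than the duration.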
Example
devkit tuner hotspot -c 0-127 -d 3 -i 1 -o /home/hotspot_cpu -g --package --long-name
Command output:
Hotspot Summary Report-1                                Time:2024/07/19 10:12:32
================================================================================
────────────────────────────────────────────────────────────────────
Function                  cycles       Module                                       cycles(%)
────────────────────────────────────────────────────────────────────
__do_softirq              108,999,839  [kernel]                                     57.88
arch_cpu_idle             55,335,310   [kernel]                                     29.38
avc_lookup                8,693,198    [kernel]                                     4.62
0xfd950                   3,706,419    /home/devkit/libsqlite3/libsqlite3.so.0.8.6  1.97
dput                      3,706,419    [kernel]                                     1.97
__set_current_blocked     3,041,886    [kernel]                                     1.62
smp_call_function_single  2,763,855    [kernel]                                     1.47
__clock_gettime           1,135,231    /usr/lib64/libc.so.6                         0.60
0x7eab4                   879,665      /usr/lib64/libc.so.6                         0.47
generic_exec_single       67,298       [kernel]                                     0.04
────────────────────────────────────────────────────────────────────

Hotspot Summary Report-2                                Time:2024/07/19 10:12:33
================================================================================
────────────────────────────────────────────────────────────────────
Function                                                       cycles       Module                                                      cycles(%)
────────────────────────────────────────────────────────────────────
std::pair<std::_Rb_tree_iterator<std::pair<unsigned long cons  81,259,412   /root/DevKit-CLI-24.0.xx-Linux-Kunpeng/tuner/lib/libsym.so  14.11
t, elf::sym> >, bool> std::_Rb_tree<unsigned long, std::pair<
unsigned long const, elf::sym>, std::_Select1st<std::pair<uns
igned long const, elf::sym> >, std::less<unsigned long>, std:
:allocator<std::pair<unsigned long const, elf::sym> > >::_M_i
nsert_unique<std::pair<unsigned long const, elf::sym> const&>
(std::pair<unsigned long const, elf::sym> const&)
malloc                                                         76,662,049   /usr/lib64/libc.so.6                                        13.32
KUNPENG_SYM::SymbolResolve::RecordElf(char const*)             38,279,588   /root/DevKit-CLI-24.0.xx-Linux-Kunpeng/tuner/lib/libsym.so  6.65
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release  25,561,443   /root/DevKit-CLI-24.0.xx-Linux-Kunpeng/tuner/libtuner.so    4.44
()
...                                                            ...          ...
rt6_probe                                                      735,855      [kernel]                                                    0.13
flush_smp_call_function_from_idle                              530,160      [kernel]                                                    0.09
G1YoungRemSetSamplingClosure::do_heap_region(HeapRegion*)      406,124      /home/bisheng-jdk17/lib/server/libjvm.so                    0.07
────────────────────────────────────────────────────────────────────

Hotspot Summary Report-3                                Time:2024/07/19 10:12:34
================================================================================
─────────────────────────────────────────────────────────────────────────────────────
Function                                                       cycles       Module                                                      cycles(%)
─────────────────────────────────────────────────────────────────────────────────────
0x8e09c14                                                      132,887,377  /home/bisheng-jdk17/lib/libzip.so                           21.48
std::pair<std::_Rb_tree_iterator<std::pair<unsigned long cons  57,971,051   /root/DevKit-CLI-24.0.xx-Linux-Kunpeng/tuner/lib/libsym.so  9.37
t, elf::sym> >, bool> std::_Rb_tree<unsigned long, std::pair<
unsigned long const, elf::sym>, std::_Select1st<std::pair<uns
igned long const, elf::sym> >, std::less<unsigned long>, std:
:allocator<std::pair<unsigned long const, elf::sym> > >::_M_i
nsert_unique<std::pair<unsigned long const, elf::sym> const&>
(std::pair<unsigned long const, elf::sym> const&)
0x8e09a4c                                                      33,494,056   /home/bisheng-jdk17/lib/libzip.so                           5.41
0x8e09a84                                                      31,358,880   /home/bisheng-jdk17/lib/libzip.so                           5.07
arch_cpu_idle                                                  21,190,896   [kernel]                                                    3.43
...                                                            ...          ...
0xffff800008f78d80                                             781,761      [kernel]                                                    0.13
ldsem_down_read_trylock                                        738,684      [kernel]                                                    0.12
─────────────────────────────────────────────────────────────────────────────────────

Hotspot Summary Report-ALL                              Time:2024/07/19 10:12:32
================================================================================
─────────────────────────────────────────────────────────────────────────────────────
Function                                                       cycles       Module                                                      cycles(%)
─────────────────────────────────────────────────────────────────────────────────────
std::pair<std::_Rb_tree_iterator<std::pair<unsigned long cons  139,230,463  /root/DevKit-CLI-24.0.xx-Linux-Kunpeng/tuner/lib/libsym.so  10.07
t, elf::sym> >, bool> std::_Rb_tree<unsigned long, std::pair<
unsigned long const, elf::sym>, std::_Select1st<std::pair<uns
igned long const, elf::sym> >, std::less<unsigned long>, std:
:allocator<std::pair<unsigned long const, elf::sym> > >::_M_i
nsert_unique<std::pair<unsigned long const, elf::sym> const&>
(std::pair<unsigned long const, elf::sym> const&)
...                                                            ...          ...
G1YoungRemSetSamplingClosure::do_heap_region(HeapRegion*)      406,124      /home/bisheng-jdk17/lib/server/libjvm.so                    0.03
─────────────────────────────────────────────────────────────────────────────────────

3348 milliseconds time elapsed
Callstack is saved to /home/callstack-20240719-101232.log
Flamegraph is saved to /home/Flamegraph-20240719-101232.html
The report /home/hotspot_cpu1.tar is generated successfully.
To view summary report. you can run: devkit report -i /home/hotspot_cpu.tar
To view detail report. you can import the report to the WebUI or IDE to view details.
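Each data row of a summary report carries four fields: function, cycles, module, and cycles(%). The sketch below shows one way to read such rows from captured console text and recompute each function's share of the listed cycles. It assumes the four-field row layout seen in the sample output above, which is not a documented format. `parse_row` and `cycle_shares` are hypothetical helper names. For Report-1, where every row is listed, the recomputed shares agree with the printed cycles(%) column.

```python
import re

# One data row: function name, cycle count, module, percentage.
ROW = re.compile(
    r"^(?P<function>.+?)\s+(?P<cycles>\d[\d,]*)\s+"
    r"(?P<module>\S+)\s+(?P<percent>\d+\.\d+)$"
)

def parse_row(line):
    """Parse one row; return None for headers, rules, and wrapped fragments."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return {
        "function": match["function"],
        "cycles": int(match["cycles"].replace(",", "")),
        "module": match["module"],
        "percent": float(match["percent"]),
    }

def cycle_shares(rows):
    """Return (function, share of listed cycles in percent) for each row."""
    total = sum(row["cycles"] for row in rows)
    return [(row["function"], round(100 * row["cycles"] / total, 2))
            for row in rows]

# Rows copied from Hotspot Summary Report-1 in the sample output.
REPORT_1 = [
    "Function cycles Module cycles(%)",
    "__do_softirq 108,999,839 [kernel] 57.88",
    "arch_cpu_idle 55,335,310 [kernel] 29.38",
    "avc_lookup 8,693,198 [kernel] 4.62",
    "0xfd950 3,706,419 /home/devkit/libsqlite3/libsqlite3.so.0.8.6 1.97",
    "dput 3,706,419 [kernel] 1.97",
    "__set_current_blocked 3,041,886 [kernel] 1.62",
    "smp_call_function_single 2,763,855 [kernel] 1.47",
    "__clock_gettime 1,135,231 /usr/lib64/libc.so.6 0.60",
    "0x7eab4 879,665 /usr/lib64/libc.so.6 0.47",
    "generic_exec_single 67,298 [kernel] 0.04",
]

rows = [row for row in map(parse_row, REPORT_1) if row is not None]
shares = cycle_shares(rows)
```

Function names that wrap onto several console lines are skipped by this parser apart from their first line, so a real consumer would need to rejoin them first.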
By default, the flame graph HTML file is generated in the user directory. You can view the file using your browser.
- The flame graph is described as follows:
The Y axis indicates the call stack. Each layer represents a function. A deeper stack indicates a higher flame. The top is the function being executed, and its parent functions are below it.
The X axis indicates the number of samples. The wider a function, the more samples it appears in and the longer it was executing. Note that the X axis does not represent the passage of time. Instead, it shows all call stacks arranged in alphabetical order, not in the order in which they occurred.
- A flame graph displays function call relationships and execution time, which helps you find hotspot functions and their tuning paths. The flame graph generated here reflects CPU usage and is used to locate functions with high CPU usage. It is generally drawn in warm colors.
- You can hover the cursor over a function in the flame graph to view details.
- You can search for functions in the search box. Select Case sensitive to run a case-sensitive query. This option is not selected by default. After the query, the flame graph automatically adds a background color to the matching functions and a border to the first match.
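The axis description above can be made concrete with a small model of how sampled call stacks turn into frame widths. This is a generic illustration of the flame graph technique, not the tool's implementation. The sample stacks and the helper names `fold_stacks` and `frame_widths` are made up for the example.

```python
from collections import Counter

def fold_stacks(samples):
    """Count identical call stacks; each stack is ordered root to leaf."""
    return Counter(tuple(stack) for stack in samples)

def frame_widths(folded, depth):
    """Return frames at one stack depth with their widths in samples.

    Frames that share the same call path are merged, and the result is
    sorted alphabetically, which is why the X axis is not a timeline.
    """
    widths = Counter()
    for stack, count in folded.items():
        if len(stack) > depth:
            widths[stack[:depth + 1]] += count
    return sorted(widths.items())

# Hypothetical samples, in the order they were taken.
samples = [
    ("main", "parse", "read"),
    ("main", "compute"),
    ("main", "parse", "read"),
    ("main", "parse"),
    ("main", "compute"),
]

folded = fold_stacks(samples)
bottom = frame_widths(folded, 0)   # the root frame spans all 5 samples
middle = frame_widths(folded, 1)   # compute: 2 samples, parse: 3 samples
top = frame_widths(folded, 2)      # read: 2 samples
```

Although the `compute` samples were taken between `parse` samples, `compute` is drawn as one block to the left of `parse`, because frames are merged and ordered by name rather than by time.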