Hotspot Function Analysis

The tool collects hotspot functions and lets you customize the collection mode and events. You can inspect the call relationships between hotspot functions and the associated code lines to locate performance bottlenecks, and then tune the code to improve program performance.

Command Function

Analyzes C/C++ program code, identifies performance bottlenecks, and provides details about the top hotspot functions and call stacks. The tool provides flame graphs to visualize function call relationships. It collects statistics on the average, maximum, and minimum CPU frequencies during the hotspot function sampling period to identify optimization paths.

Syntax

devkit tuner hotspot [-h] [-c {0 | 0,1,2 | 0-2}] [-r {user, kernel, all}] [-d <sec>] [-D <sec>] [-t n] [-f n] [-l {0, 1, 2, 3}] [-i <sec>] [-e] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [-g] [--package] [--long-name] [--dwarf] [workload workload...]

The tool can collect data from a specified application. To do so, replace [workload workload...] in the command with the application path followed by the application's arguments.

Parameter Description

Table 1 Parameter description

| Parameter | Option | Description |
| --- | --- | --- |
| -h/--help | - | Obtains help information. Optional. |
| -c/--cpu | - | CPU cores to collect data from, given as a single core (0), a comma-separated list (0,1,2), or a range (0-2). Optional. |
| -r/--collection-range | user/kernel/all | Collection mode. Optional; defaults to all. all: collects user-mode and kernel-mode performance data. user: collects user-mode performance data. kernel: collects kernel-mode performance data. |
| -d/--duration | - | Collection duration, in seconds. The minimum value is 1 second. If this parameter is not set, collection continues until you stop it. Press Ctrl+\ to cancel the task, or press Ctrl+C to stop the collection and start analysis. Optional. |
| -D/--delay | - | Collection delay, in seconds. Defaults to 0 and must be less than the collection duration. Optional. |
| -t/--top | - | Number of data records to display in the report. The minimum value is 1. Optional. |
| -f/--frequency | - | Sampling frequency. Defaults to 200 samples per second. The minimum value is 1 sample per second. Optional. |
| -l/--log-level | 0/1/2/3 | Log level. Optional; defaults to 1. 0: DEBUG. 1: INFO. 2: WARNING. 3: ERROR. |
| -i/--interval | - | Collection interval, in seconds, that is, the time span of the data collected in each subreport. The minimum value is 1 second and the value cannot exceed the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated. Optional. |
| -e/--event | - | Events to collect. Run the devkit tuner hotspot list command to see which events can be collected. Optional. |
| -o/--output | - | Report package name and output path (no package name extension required). If you enter only a name, the report package is generated in the current directory. This option must be used together with --package. Optional. |
| -s/--src-dir | - | Source code working directory, which is used to search for and associate source code. You can import the task to the web client to view the result. Optional. |
| -p/--pid | - | IDs of the processes to collect data from. Separate multiple PIDs with commas (,). By default, all processes are collected. If both -p and -c are specified, the processes with the specified PIDs take priority. Optional. |
| -g | - | Displays call stack information. If -g is set, an HTML flame graph file is generated. By default, a Flamegraph-Timestamp.html file is generated in the current directory. Optional. |
| --package | - | Generates a report data package. If you do not set the package name or path, the hotspot-Timestamp.tar package is generated in the current directory by default. Optional. |
| --long-name | - | Displays detailed function and module information. If this parameter is not set, function and module names are displayed in abbreviated form. Optional. |
| --dwarf | - | Displays the associated source file. Optional. |
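Several parameters in Table 1 constrain one another: the delay must be less than the duration, the interval cannot exceed the duration, and -o requires --package. The following is a minimal Python sketch that checks these rules before a command is launched. It is not part of the tool: build_hotspot_args and its argument names are hypothetical, and it enforces only the rules stated in Table 1, so the tool itself may apply further checks.

```python
def build_hotspot_args(duration=None, delay=0, interval=None,
                       frequency=None, top=None, output=None, package=False):
    """Build an argument list for `devkit tuner hotspot` (hypothetical helper).

    Only the constraints documented in Table 1 are checked here.
    """
    args = ["devkit", "tuner", "hotspot"]

    if duration is not None:
        if duration < 1:
            raise ValueError("-d: the minimum collection duration is 1 second")
        args += ["-d", str(duration)]

    if delay:
        if delay < 0:
            raise ValueError("-D: the delay cannot be negative")
        if duration is not None and delay >= duration:
            raise ValueError("-D: the delay must be less than the duration")
        args += ["-D", str(delay)]

    if interval is not None:
        if interval < 1:
            raise ValueError("-i: the minimum interval is 1 second")
        if duration is not None and interval > duration:
            raise ValueError("-i: the interval cannot exceed the duration")
        args += ["-i", str(interval)]

    if frequency is not None:
        if frequency < 1:
            raise ValueError("-f: the minimum frequency is 1 sample per second")
        args += ["-f", str(frequency)]

    if top is not None:
        if top < 1:
            raise ValueError("-t: the minimum number of records is 1")
        args += ["-t", str(top)]

    if output is not None:
        if not package:
            raise ValueError("-o must be used together with --package")
        args += ["-o", output]
    if package:
        args.append("--package")

    return args
```

For example, build_hotspot_args(duration=10, delay=2, interval=5) returns the argument list for devkit tuner hotspot -d 10 -D 2 -i 5, while build_hotspot_args(duration=10, delay=10) raises ValueError because the delay is not less than the duration.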

Example

devkit tuner hotspot -c 110-112 -d 10 -r user -g

In this command, -c 110-112 collects data from CPU cores 110 to 112, -d 10 sets the collection duration to 10 seconds, -r user collects user-mode performance data only, and -g displays call stack information and generates a flame graph HTML file. By default, the file is generated in the current directory, which is /home/hotspot in this example.
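To make the -c notation concrete, the following Python sketch expands a core specification into the individual core numbers it covers. The function name expand_cpu_list is hypothetical, and how the tool parses the value internally is not documented. Accepting a mix of ranges and single cores in one value (for example, 0-2,5) is an assumption of this sketch, not documented behavior.

```python
def expand_cpu_list(spec):
    """Expand a -c style value such as "0", "0,1,2", or "0-2" (hypothetical helper)."""
    cores = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "110-112" includes both endpoints.
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("range start exceeds range end: " + part)
            cores.extend(range(first, last + 1))
        else:
            cores.append(int(part))
    return cores


print(expand_cpu_list("110-112"))  # [110, 111, 112]
```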

Command output:

Hotspot Summary Report-ALL                              Time:2024/06/05 17:24:11
================================================================================
1.The current server supports a maximum of 2600 MHz CPU frequency and a minimum of 200 MHz.
2.Only the collected CPU frequency information about the running process is displayed. Considering the large number of cores, only the CPU frequency with a difference of more than 1% from the maximum CPU frequency is displayed.
For example, the collected process runs on CPUs 0-10, and only the difference between the CPU frequency of CPUs 3-6 and the maximum frequency supported by the current server exceeds 1%. In this case, CPU 3-6 is displayed.
─────────────────────────────────────────────────────────────────────────────
  Core       Avg Frequency(MHz)    Max Frequency(MHz)    Min Frequency(MHz)
─────────────────────────────────────────────────────────────────────────────
  CPU110                   1135                  1211                   896
─────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────
  Function                                  cycles    Module                         cycles(%)
─────────────────────────────────────────────────────────────────────────────────
  Hotspot::Mon***riable(int)            28,295,946    libtuner.so                        13.19
  KUNPENG_SYM:***har const*)            23,937,102    libsym.so                          11.16
  _pthread_cle***_push_defer            22,824,809    libpthread-2.28.so                 10.64
  el0_svc_common                        20,437,520    [kernel]                            9.53
  0x79b84                               17,734,958    libc-2.28.so                        8.27
  runtime.(*lfstack).push               17,149,776    dockerd                             7.99
  0x78e34                               16,163,183    libc-2.28.so                        7.53
  0xa2a9c                               15,726,617    libglib-2.0.so.0.6600.8             7.33
  _PyType_Lookup                        14,070,497    libpython3.9.so.1.0                 6.56
  std::_Rb_tre***node_base&)            10,375,159    libstdc++.so.6.0.24                 4.84
  __libc_malloc                         10,287,180    libc-2.28.so                        4.79
  0x260ec                                9,941,742    libsym.so                           4.63
  0x25f9c                                6,144,404    libsym.so                           2.86
  runtime.runqgrab                         320,367    dockerd                             0.15
  runtime.greyobject                       187,025    dockerd                             0.09
  runtime.scanobject                       156,426    dockerd                             0.07
  runtime.notesleep                        118,716    dockerd                             0.06
  runtime.findrunnable                      88,611    dockerd                             0.04
  index                                     76,937    libc-2.28.so                        0.04
  g_closure_ref                             69,065    libgobject-2.0.so.0.6600.8          0.03
  0x70c46c                                  62,180    magent                              0.03
  safe_close                                61,249    libsystemd-shared-243.so            0.03
  0x10540                                   60,090    auditd                              0.03
  fjson_object_put                          59,935    libfastjson.so.4.3.0                0.03
  0x5be40                                   51,490    python3.9                           0.02
  0x11850                                   49,759    libsystemd.so.0.27.0                0.02
  0x6a1c4                                   43,487    libpython3.9.so.1.0                 0.02
  _PyEval_EvalFrameDefault                  38,232    python3.9                           0.02
  PyBuffer_Release                          12,399    python3.9                           0.01
─────────────────────────────────────────────────────────────────────────────────
5080 milliseconds time elapsed
If *** is displayed in Function, use --long-name to show full function name.
Callstack is saved to /home/hotspot/callstack-20240607-141541.log
Flamegraph is saved to /home/hotspot/Flamegraph-20240607-141541.html
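The function table in the report can be post-processed, for example to see which module accounts for the most cycles. The following Python sketch parses rows copied from the output above and sums cycles per module. The helper names are hypothetical, and the sketch assumes the column layout shown above and that module names contain no spaces. Function names may contain spaces, so each row is split from the right.

```python
from collections import defaultdict

# Four rows copied verbatim from the report above.
SAMPLE_ROWS = """
  Hotspot::Mon***riable(int)            28,295,946    libtuner.so                        13.19
  KUNPENG_SYM:***har const*)            23,937,102    libsym.so                          11.16
  0x260ec                                9,941,742    libsym.so                           4.63
  0x25f9c                                6,144,404    libsym.so                           2.86
"""


def parse_row(line):
    """Split one table row into (function, cycles, module, percent)."""
    # Split from the right so spaces inside the function name are preserved.
    function, cycles, module, percent = line.strip().rsplit(None, 3)
    return function, int(cycles.replace(",", "")), module, float(percent)


def cycles_by_module(text):
    """Sum the cycles column per module, skipping blank lines."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue
        _, cycles, module, _ = parse_row(line)
        totals[module] += cycles
    return dict(totals)


print(cycles_by_module(SAMPLE_ROWS))
# {'libtuner.so': 28295946, 'libsym.so': 40023248}
```

In this sample, the three libsym.so rows together exceed the single top function, which a per-function view alone does not show.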

By default, the flame graph HTML file (Flamegraph-20240607-141541.html) is generated in the current directory. You can view the file using your browser.

Figure 1 Flame graph HTML file
  • The flame graph is described as follows:

    The Y axis indicates the call stack. Each layer represents a function. The deeper the call stack, the taller the flame. The function at the top is the one being executed, and its parent functions are below it.

    The X axis indicates the number of samples. The wider a function, the more often it was sampled, that is, the longer it ran. Note that the X axis does not represent the passage of time. Call stacks are merged and arranged in alphabetical order.

  • A flame graph shows function call relationships and execution time, which helps you find hotspot functions and their tuning paths. This flame graph represents CPU usage and is used to locate code with high CPU usage. It is generally drawn in warm colors.
  • You can hover the cursor over a function in the flame graph to view details.
  • You can search for functions in the search box. Select Case sensitive to make the search case-sensitive. This option is deselected by default. After the search, the flame graph highlights the matching functions with a background color and adds a border to the first match.
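The axis description above can be illustrated with a short Python sketch of the general flame graph convention: identical call stacks are counted, the merged stacks are sorted alphabetically, and the width of each frame is the number of samples that pass through it. The sample stacks and the function names fold_stacks and frame_widths are hypothetical, and the sketch does not claim to reproduce how the tool builds its HTML file.

```python
from collections import Counter


def fold_stacks(samples):
    """Merge identical stacks (root first) and sort them alphabetically."""
    counts = Counter(";".join(stack) for stack in samples)
    return sorted(counts.items())


def frame_widths(folded):
    """Width of a frame = number of samples whose stack passes through it."""
    widths = Counter()
    for stack, count in folded:
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[";".join(frames[:depth])] += count
    return dict(widths)


# Hypothetical samples in the order they were taken.
samples = [
    ("main", "parse", "read"),
    ("main", "compute"),
    ("main", "parse", "read"),
    ("main", "compute"),
    ("main", "compute"),
    ("main", "parse"),
]

folded = fold_stacks(samples)
print(folded)
# [('main;compute', 3), ('main;parse', 1), ('main;parse;read', 2)]
print(frame_widths(folded)["main;parse"])  # 3
```

The sampling order is lost after folding: compute appears to the left of parse only because of alphabetical sorting, not because it ran first.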