Miss Event Analysis
When accessing data, a CPU searches the cache hierarchy level by level. If the target data is not in the cache, a cache miss occurs, and performance deteriorates severely when many cache misses occur. The miss event analysis function analyzes miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load, helping you identify where to modify your program to improve its performance.
Command Function
Uses the Arm Statistical Profiling Extension (SPE) to collect and analyze miss events.
- To collect miss events, the server must support Arm SPE collection. For details about how to configure SPE, see Configuring the SPE Environment.
- Miss event analysis is available on openEuler 20.x or later and openEuler-based OS releases. VM or container environments are not supported.
Syntax
    devkit tuner miss [-h] [-c {n | n,m | n-m}] [-d <sec>] [-P n] [-D <sec>] [-t n] [-l {0, 1, 2, 3}] [-m {1, 2, 3, 4}] [-L n] [-i <sec>] [-r {user, kernel, all}] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [--package] [--long-name] [--dwarf] [workload workload...]

The tool can collect data of a specified application. Replace [workload workload...] in the command with the application path and application parameters.
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. This parameter is optional. |
| -c/--cpu | - | Numbers of the CPU cores to be collected, for example, 0, or 0,1,2, or 0-2. This parameter is optional. |
| -d/--duration | - | Collection duration, in seconds. The minimum value is 1 second. By default, collection never ends. You can press Ctrl+\ to cancel the task or press Ctrl+C to stop the collection and start analysis. This parameter is optional. |
| -P/--period | - | Sampling interval, in number of instructions, which defaults to 8092. The value ranges from 1,024 to 4,294,967,295. This parameter is optional. |
| -D/--delay | - | Collection delay, in seconds, which defaults to 0 and must be less than the collection duration. This parameter is optional. |
| -t/--top | - | Number of data records to be displayed in the report, which defaults to 10. The minimum value is 1. This parameter is optional. |
| -l/--log-level | 0/1/2/3 | Log level, which defaults to 1. This parameter is optional. |
| -m/--metric | 1/2/3/4 | Type of miss event to be collected, which defaults to 1 (LLC Miss). This parameter is optional. |
| -L/--latency | - | Minimum latency, in clock cycles, which defaults to 0. This parameter can be set when collecting Long Latency Load data. This parameter is optional. |
| -i/--interval | - | Task collection interval, in seconds, that is, the time taken to collect data for each subreport. The minimum value is 1 second and the maximum value cannot exceed the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated. This parameter is optional. |
| -r/--collection-range | user/kernel/all | Collection range, which defaults to all. This parameter is optional. |
| -o/--output | - | Report package name and output path (no package name extension required). If you enter a name only, the report package is generated in the current directory by default. This option must be used together with --package. This parameter is optional. |
| -s/--src-dir | - | C/C++ source code working directory, which is used to search for and associate source code. You can import the task to the web client for easier viewing. This parameter is optional. |
| -p/--pid | PID/PID1,PID2/ALL | ID of a process to be collected. Separate multiple PIDs with commas (,). By default, all processes are collected. If both the -p and -c parameters are used, the processes with the specified PIDs are preferentially collected. This parameter is optional. |
| --package | - | Indicates whether to generate a report data package. If you do not set the package name or path, the miss-Timestamp.tar package is generated in the current directory by default. This parameter is optional. |
| --long-name | - | Indicates whether to display detailed function and module information. If this parameter is not set, the module or function information is displayed in abbreviated form by default. This parameter is optional. |
| --dwarf | - | Indicates whether to display the associated source file. This parameter is optional. |
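
The value limits in the table can be checked before a collection is launched. The following is a minimal sketch, not part of the tool: the function names valid_cpu_spec and check_miss_args are hypothetical, and the rules encoded are only those stated above (the three -c forms, a duration of at least 1 second, a delay less than the duration, an interval between 1 and the duration, and a period between 1,024 and 4,294,967,295). It assumes that a finite duration is given.

```shell
#!/usr/bin/env bash
# Pre-flight checks for `devkit tuner miss` arguments.
# Illustrative helper only; it is not shipped with DevKit.

# Accept the three documented -c forms: n, n,m[,...], or n-m.
valid_cpu_spec() {
    local spec=$1
    [[ $spec =~ ^[0-9]+$ || $spec =~ ^[0-9]+(,[0-9]+)+$ || $spec =~ ^[0-9]+-[0-9]+$ ]]
}

# Usage: check_miss_args DURATION DELAY INTERVAL PERIOD
# Prints the first violated rule and returns 1, or returns 0 if all rules pass.
check_miss_args() {
    if (( $# != 4 )); then
        echo "usage: check_miss_args DURATION DELAY INTERVAL PERIOD"
        return 1
    fi
    local v
    for v in "$@"; do
        [[ $v =~ ^[0-9]+$ ]] || { echo "not a non-negative integer: $v"; return 1; }
    done
    # Force base 10 so that values with leading zeros are not read as octal.
    local duration=$((10#$1)) delay=$((10#$2)) interval=$((10#$3)) period=$((10#$4))
    (( duration >= 1 )) || { echo "duration must be at least 1 second"; return 1; }
    (( delay < duration )) || { echo "delay must be less than duration"; return 1; }
    (( interval >= 1 && interval <= duration )) || { echo "interval must be between 1 and duration"; return 1; }
    (( period >= 1024 && period <= 4294967295 )) || { echo "period must be between 1024 and 4294967295"; return 1; }
    return 0
}
```

For example, `check_miss_args 5 0 5 8092 && valid_cpu_spec 0-127` succeeds, so a command such as `devkit tuner miss -c 0-127 -d 5 -D 0 -i 5 -P 8092` would pass these checks. Whether the tool applies further restrictions is not covered by this sketch.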
Example
- Collect system data.
    devkit tuner miss -c 0-127 -d 5 -o /home/miss_report -m 1 --package

In this command, -c 0-127 collects data from CPU cores 0 to 127, and -d 5 sets the collection duration to 5 seconds. The -o /home/miss_report and --package parameters generate a report data package named miss_report in the specified path. The -m 1 parameter collects LLC Miss events.
Command output:
    Miss Summary Report-all                                   Time:2024/05/22 17:56:33
    ================================================================================

    ──────────────────────────────────────────────────────────────────
    Function                                             Module                                       LLC Miss
    ──────────────────────────────────────────────────────────────────
    UNKNOWN                                              /home/devkit/lib/libpython3.9.so.1.0         14,196,736 (11.53%)
    _PyEval_EvalFrameDefault                             /home/devkit/lib/libpython3.9.so.1.0         11,321,344 (9.20%)
    UNKNOWN                                              /usr/bin/devkit/tuner/lib/libsym.so          4,702,208 (3.82%)
    _perf_ioctl                                          [kernel]                                     4,587,520 (3.73%)
    UNKNOWN                                              /usr/lib64/libc-2.28.so                      4,046,848 (3.29%)
    std::pair<std::_Rb_tree_***const, elf::sym> const&)  /usr/bin/devkit/tuner/lib/libsym.so          3,694,592 (3.00%)
    UNKNOWN                                              /home/devkit/libsqlite3/libsqlite3.so.0.8.6  3,588,096 (2.91%)
    seq_put_hex_ll                                       [kernel]                                     3,080,192 (2.50%)
    _nohz_idle_balance                                   [kernel]                                     1,941,504 (1.58%)
    __audit_syscall_exit                                 [kernel]                                     1,933,312 (1.57%)
    ──────────────────────────────────────────────────────────────────
    5509 milliseconds time elapsed

    The report /home/miss_report.tar is generated successfully.
    To view summary report. you can run: devkit report -i /home/miss_report.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
- Collect application data.
    devkit tuner miss -d 5 --package /opt/testdemo/cache_miss

The preceding command collects data of the /opt/testdemo/cache_miss application. The -d 5 parameter sets the collection duration to 5 seconds. The --package parameter generates a report data package in the tool directory. By default, the package name consists of miss plus a timestamp.
Command output:
    Miss Summary Report-all                                   Time:2024/06/11 11:16:15
    ================================================================================
    ──────────────────────────────────────────────────────────────────
    Function                   Module                    LLC Miss
    ──────────────────────────────────────────────────────────────────
    main                       /opt/testdemo/cache_miss  74,964,992 (60.74%)
    copy_page                  [kernel]                  33,554,432 (27.19%)
    change_protection_range    [kernel]                  6,815,744 (5.52%)
    UNKNOWN                    [kernel]                  3,784,704 (3.07%)
    handle_percpu_devid_irq    [kernel]                  2,490,368 (2.02%)
    propagate_protected_usage  [kernel]                  917,504 (0.74%)
    page_counter_charge        [kernel]                  720,896 (0.58%)
    queued_spin_lock_slowpath  [kernel]                  32,768 (0.03%)
    account_system_index_time  [kernel]                  24,576 (0.02%)
    trigger_load_balance       [kernel]                  24,576 (0.02%)
    ──────────────────────────────────────────────────────────────────
    6222 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111608.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111608.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
- Collect data of a specified process.
    devkit tuner miss -d 5 --package -p 414192
The -p 414192 parameter collects information about the process whose ID is 414192.
Command output:
    Miss Summary Report-all                                   Time:2024/06/11 11:18:28
    ================================================================================
    ──────────────────────────────────────────────────────────────────
    Function                   Module                            LLC Miss
    ──────────────────────────────────────────────────────────────────
    UNKNOWN                    /usr/lib64/libpthread-2.28.so     32,505,856 (99.42%)
    queued_spin_lock_slowpath  [kernel]                          57,344 (0.18%)
    available_idle_cpu         [kernel]                          16,384 (0.05%)
    cpu_load_update_active     [kernel]                          16,384 (0.05%)
    futex_wait                 [kernel]                          16,384 (0.05%)
    get_futex_value_locked     [kernel]                          16,384 (0.05%)
    trigger_load_balance       [kernel]                          16,384 (0.05%)
    __list_del_entry_valid     [kernel]                          8,192 (0.03%)
    fun2                       /opt/testdemo/pthread_mutex_long  8,192 (0.03%)
    futex_wake                 [kernel]                          8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111822.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
- View the generated report.
    devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
Command output:
    ──────────────────────────────────────────────────────────────────
    Function                   Module                            LLC Miss
    ──────────────────────────────────────────────────────────────────
    UNKNOWN                    /usr/lib64/libpthread-2.28.so     32,505,856 (99.42%)
    queued_spin_lock_slowpath  [kernel]                          57,344 (0.18%)
    available_idle_cpu         [kernel]                          16,384 (0.05%)
    cpu_load_update_active     [kernel]                          16,384 (0.05%)
    futex_wait                 [kernel]                          16,384 (0.05%)
    get_futex_value_locked     [kernel]                          16,384 (0.05%)
    trigger_load_balance       [kernel]                          16,384 (0.05%)
    __list_del_entry_valid     [kernel]                          8,192 (0.03%)
    fun2                       /opt/testdemo/pthread_mutex_long  8,192 (0.03%)
    futex_wake                 [kernel]                          8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
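
For scripted post-processing, the summary rows can be split from the right: in the sample outputs above, the last two fields of each data row are the miss count and its percentage, while the function name may itself contain spaces. The following is a minimal sketch under two assumptions: the report text is available with one row per line, and module names contain no spaces. The function name parse_miss_rows and the tab-separated output layout are illustrative choices, not part of DevKit.

```shell
#!/usr/bin/env bash
# Read summary report text on stdin and print one tab-separated line per
# data row: function<TAB>module<TAB>count<TAB>percent.
# Header, separator, and footer lines are skipped.
parse_miss_rows() {
    awk '
    # A data row ends with a percentage in parentheses, such as (11.53%),
    # preceded by a count that uses commas as thousands separators.
    NF >= 4 && $NF ~ /^\([0-9.]+%\)$/ && $(NF-1) ~ /^[0-9,]+$/ {
        count = $(NF-1); gsub(/,/, "", count)    # 32,505,856 -> 32505856
        pct = $NF;       gsub(/[()%]/, "", pct)  # (99.42%)   -> 99.42
        module = $(NF-2)
        # Everything before the module belongs to the function name.
        fname = $1
        for (i = 2; i <= NF - 3; i++) fname = fname " " $i
        printf "%s\t%s\t%s\t%s\n", fname, module, count, pct
    }'
}
```

Assuming the report command writes the summary to standard output as shown above, the rows could then be filtered with standard tools, for example `devkit report -i /usr/bin/devkit/miss-20240611-111822.tar | parse_miss_rows | awk -F'\t' '$2 != "[kernel]"'` to keep only user-space modules.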