Miss Event Analysis
Command Function
Uses the Statistical Profiling Extension (SPE) to analyze miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load. Based on the results, you can modify your program to reduce the number of miss events and improve its processing performance.
Syntax
devkit tuner miss [-h] [-c {n | n,m | n-m}] [-d <sec>] [-P n] [-D <sec>] [-l {0, 1, 2, 3}] [-m {1, 2, 3, 4}] [-L n] [-i <sec>] [-r {user, kernel, all}] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [--package] [--long-name] [--dwarf] [workload workload...]
[workload workload...] collects data of a specified application. Replace [workload workload...] in the command with the application path followed by the application's parameters.
Parameter Description
| Parameter | Option | Description |
|---|---|---|
| -h/--help | - | Obtains help information. |
| -c/--cpu | - | IDs of the CPU cores to collect. The value can be a single core (0), a comma-separated list (0,1,2), or a range (0-2). |
| -d/--duration | - | Collection duration, in seconds. The minimum value is 1 second. By default, collection never ends. Press Ctrl+\ to cancel the task, or press Ctrl+C to stop the collection and start analysis. |
| -P/--period | - | Sampling interval, in number of instructions, which defaults to 8092. The value ranges from 1024 to 4,294,967,295. |
| -D/--delay | - | Collection delay, in seconds, which defaults to 0 and must be less than the collection duration. |
| -i/--interval | - | Collection interval, in seconds, that is, the time span covered by each subreport. The minimum value is 1 second and the maximum value cannot exceed the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated. |
| -l/--log-level | 0/1/2/3 | Log level, which defaults to 1. |
| -m/--metric | 1/2/3/4 | Data collection level, which defaults to 1 (LLC Miss). |
| -L/--latency | - | Minimum latency, in clock cycles, which defaults to 0. This parameter can be set when collecting Long Latency Load data. |
| -r/--collection-rang | user/kernel/all | Collection mode, which defaults to all. |
| -o/--output | - | Report file name. Reports are generated in the current directory by default. |
| -s/--src-dir | - | C/C++ source code working directory, which is used to search for and associate source code. You can import a task to the web client to facilitate the display. |
| -p/--pid | PID / PID1,PID2 / ALL | ID of a process to be collected. Separate multiple PIDs with commas (,). By default, all processes are collected. If both the -p and -c parameters are used, the processes with the specified PIDs are preferentially collected. |
| --package | - | Indicates whether to import data to the database and generate compressed packages in the specified output path. |
| --long-name | - | Indicates whether to display detailed function and module information. If this parameter is not set, module and function information is displayed in abbreviated form. |
| -t/--top | - | Number of data records to be displayed in the report, which defaults to 10. The minimum value is 1. |
| --dwarf | - | Indicates whether to generate C/C++ source code or assembly code files. |
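Several of the numeric parameters constrain one another (for example, the delay must be shorter than the duration, and the interval cannot exceed it). As a minimal sketch, the limits stated in the table could be checked before starting a collection. The function name check_miss_args and its argument order are assumptions made for illustration; they are not part of devkit.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of devkit): validates the limits
# documented for -d, -D, -i and -P before a collection is started.
# Usage: check_miss_args <duration> <delay> <interval> <period>
check_miss_args() {
    duration=$1 delay=$2 interval=$3 period=$4

    # -d: the minimum value is 1 second
    [ "$duration" -ge 1 ] \
        || { echo "duration must be >= 1" >&2; return 1; }
    # -D: must be less than the collection duration
    [ "$delay" -ge 0 ] && [ "$delay" -lt "$duration" ] \
        || { echo "delay must be in [0, duration)" >&2; return 1; }
    # -i: at least 1 second and no longer than the collection duration
    [ "$interval" -ge 1 ] && [ "$interval" -le "$duration" ] \
        || { echo "interval must be in [1, duration]" >&2; return 1; }
    # -P: 1024 to 4,294,967,295
    [ "$period" -ge 1024 ] && [ "$period" -le 4294967295 ] \
        || { echo "period must be in [1024, 4294967295]" >&2; return 1; }
    return 0
}

# Example: run the collection only if the values pass the check.
# check_miss_args 10 2 5 8092 && devkit tuner miss -d 10 -D 2 -i 5 -P 8092 --package
```

The check only mirrors the ranges listed in the table; the tool itself remains the authority on which combinations it accepts.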
Example
- Collecting system data:
devkit tuner miss -c 0-127 -d 5 -o /home/miss_report -m 1 --package
In this command, -c 0-127 collects data from CPU cores 0 to 127, -d 5 sets the collection duration to 5 seconds, and -m 1 collects LLC Miss events. The -o /home/miss_report and --package parameters generate a report data package named miss_report in the specified path.
Command output:
Miss Summary Report-all                                 Time:2024/05/22 17:56:33
================================================================================
──────────────────────────────────────────────────────────────────
Function                                             Module                                       LLC Miss
──────────────────────────────────────────────────────────────────
UNKNOWN                                              /home/devkit/lib/libpython3.9.so.1.0         14,196,736 (11.53%)
_PyEval_EvalFrameDefault                             /home/devkit/lib/libpython3.9.so.1.0         11,321,344 (9.20%)
UNKNOWN                                              /usr/bin/devkit/tuner/lib/libsym.so          4,702,208 (3.82%)
_perf_ioctl                                          [kernel]                                     4,587,520 (3.73%)
UNKNOWN                                              /usr/lib64/libc-2.28.so                      4,046,848 (3.29%)
std::pair<std::_Rb_tree_***const, elf::sym> const&)  /usr/bin/devkit/tuner/lib/libsym.so          3,694,592 (3.00%)
UNKNOWN                                              /home/devkit/libsqlite3/libsqlite3.so.0.8.6  3,588,096 (2.91%)
seq_put_hex_ll                                       [kernel]                                     3,080,192 (2.50%)
_nohz_idle_balance                                   [kernel]                                     1,941,504 (1.58%)
__audit_syscall_exit                                 [kernel]                                     1,933,312 (1.57%)
──────────────────────────────────────────────────────────────────
5509 milliseconds time elapsed
The report /home/miss_report.tar is generated successfully.
To view summary report. you can run: devkit report -i /home/miss_report.tar
To view detail report. you can import the report to the WebUI or IDE to view details.
- Collecting application data:
devkit tuner miss -d 5 --package /opt/testdemo/cache_miss
The preceding command collects data for the application /opt/testdemo/cache_miss. The -d 5 parameter sets the collection duration to 5 seconds. The --package parameter generates a report data package in the tool directory. By default, the package name consists of miss plus a timestamp, for example miss-20240611-111608.tar.
Command output:
Miss Summary Report-all                                 Time:2024/06/11 11:16:15
================================================================================
──────────────────────────────────────────────────────────────────
Function                     Module                      LLC Miss
──────────────────────────────────────────────────────────────────
main                         /opt/testdemo/cache_miss    74,964,992 (60.74%)
copy_page                    [kernel]                    33,554,432 (27.19%)
change_protection_range      [kernel]                    6,815,744 (5.52%)
UNKNOWN                      [kernel]                    3,784,704 (3.07%)
handle_percpu_devid_irq      [kernel]                    2,490,368 (2.02%)
propagate_protected_usage    [kernel]                    917,504 (0.74%)
page_counter_charge          [kernel]                    720,896 (0.58%)
queued_spin_lock_slowpath    [kernel]                    32,768 (0.03%)
account_system_index_time    [kernel]                    24,576 (0.02%)
trigger_load_balance         [kernel]                    24,576 (0.02%)
──────────────────────────────────────────────────────────────────
6222 milliseconds time elapsed
If *** is displayed in Function or Module, use --long-name to show full name.
The report /usr/bin/devkit/miss-20240611-111608.tar is generated successfully.
To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111608.tar
To view detail report. you can import the report to the WebUI or IDE to view details.
- Collecting based on PIDs:
devkit tuner miss -d 5 --package -p 414192
The -p 414192 parameter collects information about the process whose ID is 414192.
Command output:
Miss Summary Report-all                                 Time:2024/06/11 11:18:28
================================================================================
──────────────────────────────────────────────────────────────────
Function                     Module                              LLC Miss
──────────────────────────────────────────────────────────────────
UNKNOWN                      /usr/lib64/libpthread-2.28.so       32,505,856 (99.42%)
queued_spin_lock_slowpath    [kernel]                            57,344 (0.18%)
available_idle_cpu           [kernel]                            16,384 (0.05%)
cpu_load_update_active       [kernel]                            16,384 (0.05%)
futex_wait                   [kernel]                            16,384 (0.05%)
get_futex_value_locked       [kernel]                            16,384 (0.05%)
trigger_load_balance         [kernel]                            16,384 (0.05%)
__list_del_entry_valid       [kernel]                            8,192 (0.03%)
fun2                         /opt/testdemo/pthread_mutex_long    8,192 (0.03%)
futex_wake                   [kernel]                            8,192 (0.03%)
──────────────────────────────────────────────────────────────────
5976 milliseconds time elapsed
If *** is displayed in Function or Module, use --long-name to show full name.
The report /usr/bin/devkit/miss-20240611-111822.tar is generated successfully.
To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
To view detail report. you can import the report to the WebUI or IDE to view details.
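When several processes are to be collected, -p expects their PIDs separated by commas. A minimal sketch of building that argument is shown here. The helper name join_pids is hypothetical and not part of devkit, and the commented usage line assumes that pgrep is available on the system.

```shell
#!/bin/sh
# Hypothetical helper (not part of devkit): joins the given PIDs with commas,
# the separator that -p expects for multiple processes.
join_pids() {
    # "$*" joins the arguments with the first character of IFS; the subshell
    # keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

# Example (pthread_mutex_long is the sample program name seen in the output above):
# devkit tuner miss -d 5 --package -p "$(join_pids $(pgrep -x pthread_mutex_long))"
```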
- Viewing the report:
devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
Command output:
──────────────────────────────────────────────────────────────────
Function                     Module                              LLC Miss
──────────────────────────────────────────────────────────────────
UNKNOWN                      /usr/lib64/libpthread-2.28.so       32,505,856 (99.42%)
queued_spin_lock_slowpath    [kernel]                            57,344 (0.18%)
available_idle_cpu           [kernel]                            16,384 (0.05%)
cpu_load_update_active       [kernel]                            16,384 (0.05%)
futex_wait                   [kernel]                            16,384 (0.05%)
get_futex_value_locked       [kernel]                            16,384 (0.05%)
trigger_load_balance         [kernel]                            16,384 (0.05%)
__list_del_entry_valid       [kernel]                            8,192 (0.03%)
fun2                         /opt/testdemo/pthread_mutex_long    8,192 (0.03%)
futex_wake                   [kernel]                            8,192 (0.03%)
──────────────────────────────────────────────────────────────────
5976 milliseconds time elapsed
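Each data row of the summary report consists of a function, a module, a miss count, and a percentage. As a rough sketch, saved report output could be filtered for the rows that matter most. The helper name top_miss_rows is hypothetical and not part of devkit, and the parsing assumes that every data row ends with a count followed by a parenthesised percentage, as in the sample output on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of devkit): prints the rows of a saved summary
# report whose share is at least <min_percent>, as tab-separated fields
# (function, module, count, percent).
# Usage: top_miss_rows <min_percent> < summary.txt
top_miss_rows() {
    awk -v min="$1" '
        # Data rows end with a parenthesised percentage such as "(60.74%)".
        NF >= 4 && $NF ~ /^\([0-9.]+%\)$/ {
            pct = $NF
            gsub(/[()%]/, "", pct)          # "(60.74%)" -> "60.74"
            if (pct + 0 >= min + 0) {
                # Function names may contain spaces, so rebuild them from
                # every field left of the module column.
                func_name = $1
                for (i = 2; i <= NF - 3; i++) func_name = func_name " " $i
                printf "%s\t%s\t%s\t%s\n", func_name, $(NF - 2), $(NF - 1), pct
            }
        }
    '
}

# Example: keep only the rows that account for at least 5% of LLC misses.
# devkit report -i /usr/bin/devkit/miss-20240611-111822.tar | top_miss_rows 5
```

Header, separator, and status lines do not end with a percentage and are skipped. A module path that contains spaces would break the assumption about field positions.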