
Miss Event Analysis

When accessing data, a CPU searches the caches level by level. If the target data is not in a cache, a cache miss occurs, and performance can deteriorate severely when many cache misses occur. The miss event analysis function analyzes miss events such as LLC (last-level cache) Miss, TLB (translation lookaside buffer) Miss, Remote Access, and Long Latency Load, helping you modify your program to improve its performance.
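The effect of access patterns on cache misses can be illustrated with a toy model. The following Python sketch does not describe how the tool or any Kunpeng CPU works. It assumes a single direct-mapped cache of 64 lines of 64 bytes (4 KiB) and a row-major matrix of 8-byte elements, and it counts the misses produced by two traversal orders of the same data.

```python
# Toy cache model for illustration only; all sizes below are assumptions.
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # lines in the toy direct-mapped cache (assumed), 4 KiB in total
ELEM_SIZE = 8    # bytes per matrix element, e.g. a double (assumed)


def count_misses(addresses):
    """Replay byte addresses through a direct-mapped cache and return the miss count."""
    tags = [None] * NUM_SETS           # tag currently held by each set
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE       # memory line that contains this byte
        index = line % NUM_SETS        # set the line maps to
        tag = line // NUM_SETS         # distinguishes lines sharing that set
        if tags[index] != tag:         # line not cached: count a miss and fill it
            misses += 1
            tags[index] = tag
    return misses


def row_major(n):
    """Addresses touched when walking an n x n row-major matrix row by row."""
    return [(i * n + j) * ELEM_SIZE for i in range(n) for j in range(n)]


def column_major(n):
    """Addresses touched when walking the same matrix column by column."""
    return [(i * n + j) * ELEM_SIZE for j in range(n) for i in range(n)]


if __name__ == "__main__":
    n = 256  # 256 x 256 doubles = 512 KiB, much larger than the toy cache
    print("row-major misses:   ", count_misses(row_major(n)))
    print("column-major misses:", count_misses(column_major(n)))
```

With n = 256, walking row by row misses once per 64-byte line (8,192 misses for 65,536 accesses). Walking column by column misses on every access (65,536 misses): each step jumps 2 KiB ahead, so every line is evicted before the next column can reuse it. Real LLCs are set-associative and far larger. Strided access of this kind is nevertheless the sort of pattern that can show up as a high LLC Miss count.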

Command Function

Uses the Statistical Profiling Extension (SPE) capability to analyze miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load. Based on the results, you can modify your program to reduce miss events and improve its processing performance.

  • To collect miss events, the server must support Arm SPE collection. For details about how to configure SPE, see Configuring the SPE Environment.
  • Miss event analysis is available on openEuler 20.x or later and openEuler-based OS releases. VM or container environments are not supported.
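Before collecting, you may want to confirm that the kernel exposes an SPE PMU. The following Python sketch assumes that the Linux arm_spe_pmu driver, when loaded, registers a PMU whose name starts with arm_spe (typically arm_spe_0) under /sys/bus/event_source/devices. The helper names are hypothetical. Passing this check does not guarantee that every prerequisite in Configuring the SPE Environment is met.

```python
import os

# Assumed sysfs directory where Linux lists perf event sources (PMUs).
PMU_DIR = "/sys/bus/event_source/devices"


def filter_spe(pmu_names):
    """Keep only PMU names that look like Arm SPE instances (assumed "arm_spe" prefix)."""
    return sorted(name for name in pmu_names if name.startswith("arm_spe"))


def find_spe_pmus(pmu_dir=PMU_DIR):
    """Return SPE PMU names found in pmu_dir, or an empty list if it cannot be read."""
    try:
        return filter_spe(os.listdir(pmu_dir))
    except OSError:  # for example, the directory does not exist on this host
        return []


if __name__ == "__main__":
    found = find_spe_pmus()
    if found:
        print("SPE PMU found:", ", ".join(found))
    else:
        print("No SPE PMU found; see Configuring the SPE Environment.")
```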

Syntax

devkit tuner miss [-h] [-c {n | n,m | n-m}] [-d <sec>] [-P n] [-D <sec>] [-t n] [-l {0, 1, 2, 3}] [-m {1, 2, 3, 4}] [-L n] [-i <sec>] [-r {user, kernel, all}] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [--package] [--long-name] [--dwarf] [workload workload...]

The tool can collect data of a specified application. Replace [workload workload...] in the command with the application path and its parameters.
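When you script collections, it can help to assemble the argument list in one place and reject combinations that the parameter descriptions rule out. The following Python helper, build_miss_command, is hypothetical and not part of the DevKit. It covers only a subset of options and checks the documented limits for -d, -D, -i, -P, -m, and -o. It places the workload path and arguments last, as the syntax requires.

```python
# Hypothetical helper: it only assembles and checks arguments and never runs the tool.
def build_miss_command(workload=None, duration=None, delay=None, interval=None,
                       period=None, metric=None, output=None, package=False):
    """Return an argv list for `devkit tuner miss`, checking documented limits."""
    argv = ["devkit", "tuner", "miss"]
    if duration is not None:
        if duration < 1:
            raise ValueError("-d must be at least 1 second")
        argv += ["-d", str(duration)]
    if delay is not None:
        # Without -d, collection runs until stopped, so no upper bound is checked here.
        if duration is not None and delay >= duration:
            raise ValueError("-D must be less than the collection duration")
        argv += ["-D", str(delay)]
    if interval is not None:
        if interval < 1 or (duration is not None and interval > duration):
            raise ValueError("-i must be between 1 and the collection duration")
        argv += ["-i", str(interval)]
    if period is not None:
        if not 1024 <= period <= 4_294_967_295:
            raise ValueError("-P must be in the range 1024 to 4294967295")
        argv += ["-P", str(period)]
    if metric is not None:
        if metric not in (1, 2, 3, 4):
            raise ValueError("-m must be 1 (LLC), 2 (TLB), 3 (Remote), or 4 (Long Latency)")
        argv += ["-m", str(metric)]
    if output is not None:
        if not package:
            raise ValueError("-o must be used together with --package")
        argv += ["-o", output]
    if package:
        argv.append("--package")
    if workload:
        argv += list(workload)  # application path and its parameters come last
    return argv
```

For example, build_miss_command(["/opt/testdemo/cache_miss"], duration=5, package=True) returns the same arguments as the application example in this section. On a host where the tool is installed, the list could then be passed to Python's subprocess.run(argv, check=True).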

Parameter Description

Table 1 Parameter description

Parameter

Option

Description

-h/--help

-

Obtains help information. This parameter is optional.

-c/--cpu

-

IDs of the CPU cores to collect data from, for example, 0 (a single core), 0,1,2 (a list of cores), or 0-2 (a range of cores). This parameter is optional.

-d/--duration

-

Collection duration, in seconds. The minimum value is 1 second. By default, collection continues until you stop it: press Ctrl+\ to cancel the task, or press Ctrl+C to stop the collection and start analysis. This parameter is optional.

-P/--period

-

Sampling interval, as a number of instructions. The default value is 8192, and the value ranges from 1,024 to 4,294,967,295. This parameter is optional.

-D/--delay

-

Collection delay, in seconds. The default value is 0, and the value must be less than the collection duration. This parameter is optional.

-t/--top

-

Number of data records to be displayed in the report, which defaults to 10. The minimum value is 1. This parameter is optional.

-l/--log-level

0/1/2/3

Log level, which defaults to 1. This parameter is optional.
  • 0: DEBUG
  • 1: INFO
  • 2: WARNING
  • 3: ERROR

-m/--metric

1/2/3/4

Type of miss event to collect. The default value is 1 (LLC Miss). This parameter is optional.
  • 1 (LLC Miss): Number of memory requests that miss in the LLC.
  • 2 (TLB Miss): Number of memory access or address translation operations for which no virtual-to-physical mapping is found in the TLB.
  • 3 (Remote Access): Number of cross-CPU DRAM accesses.
  • 4 (Long Latency Load): Ratio of cross-CPU DRAM accesses whose latency exceeds the minimum latency set by -L/--latency.

-L/--latency

-

Minimum latency, in clock cycles. The default value is 0. This parameter applies when collecting Long Latency Load data (-m 4). This parameter is optional.

-i/--interval

-

Subreport interval, in seconds, that is, the collection time covered by each subreport. The minimum value is 1 second, and the value cannot exceed the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated. This parameter is optional.

-r/--collection-range

user/kernel/all

Collection mode, which defaults to all. This parameter is optional.

  • all: collects user-mode and kernel-mode performance data.
  • user: collects user-mode performance data.
  • kernel: collects kernel-mode performance data.

-o/--output

-

Output path and name of the report package, without a file name extension. If you specify only a name, the package is generated in the current directory. This parameter must be used together with --package. This parameter is optional.

-s/--src-dir

-

Working directory of the C/C++ source code, used to search for source files and associate them with the collected data. The associated source code can be displayed after you import the task into the web client. This parameter is optional.

-p/--pid

PID/PID1,PID2/ALL

IDs of the processes to collect. Separate multiple PIDs with commas (,), or specify ALL to collect all processes. By default, all processes are collected. If both -p and -c are specified, the processes specified by -p take precedence. This parameter is optional.

--package

-

Indicates whether to generate a report data package. If you do not specify the package name or path, a package named miss-<timestamp>.tar (for example, miss-20240611-111608.tar) is generated in the current directory by default. This parameter is optional.

--long-name

-

Indicates whether to display full function and module names. If this parameter is not set, long names are abbreviated and *** is displayed in the Function or Module column. This parameter is optional.

--dwarf

-

Indicates whether to display the associated source file. This parameter is optional.
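The -c value accepts a single core (n), a list (n,m), or a range (n-m). The following Python sketch, expand_cpu_list, is a hypothetical helper that shows how such a value maps to core IDs. You could use it, for example, to check the value against the host's core count before a run. It also accepts mixed items such as 0,4-7, which is an assumption. The syntax shows only the three basic forms, so the tool itself may not accept mixed values.

```python
def expand_cpu_list(spec, num_cpus=None):
    """Expand a -c value such as "0", "0,1,2", or "0-2" into sorted core IDs.

    If num_cpus is given (for example, os.cpu_count()), IDs outside
    0..num_cpus-1 are rejected with ValueError.
    """
    cores = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            # Range form n-m: both ends are included.
            start_text, end_text = item.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range {item!r}: start is greater than end")
            cores.update(range(start, end + 1))
        else:
            # Single-core form n (int() raises ValueError for empty or non-numeric items).
            cores.add(int(item))
    if num_cpus is not None:
        too_high = sorted(core for core in cores if core >= num_cpus)
        if too_high:
            raise ValueError(f"cores {too_high} do not exist on a host with {num_cpus} CPUs")
    return sorted(cores)
```

For instance, expand_cpu_list("0-127", num_cpus=64) raises ValueError, as a call like expand_cpu_list("0-127", num_cpus=os.cpu_count()) would on any host with fewer than 128 logical CPUs. Because os.cpu_count() can return None, the range check is skipped in that case.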

Example

  • Collect system data.
    devkit tuner miss -c 0-127 -d 5 -o /home/miss_report -m 1 --package
    

    In this command, -c 0-127 collects data from CPU cores 0 to 127, and -d 5 sets a collection duration of 5 seconds. The -o /home/miss_report and --package parameters generate a report data package named miss_report.tar in /home. The -m 1 parameter collects LLC Miss events.

    Command output:

    Miss Summary Report-all                                 Time:2024/05/22 17:56:33
    ================================================================================
    
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                             /home/devkit/lib/libpython3.9.so.1.0                        14,196,736 (11.53%)
      _PyEval_EvalFrameDefault                            /home/devkit/lib/libpython3.9.so.1.0                        11,321,344 (9.20%)
      UNKNOWN                                             /usr/bin/devkit/tuner/lib/libsym.so                          4,702,208 (3.82%)
      _perf_ioctl                                         [kernel]                                                     4,587,520 (3.73%)
      UNKNOWN                                             /usr/lib64/libc-2.28.so                                      4,046,848 (3.29%)
      std::pair<std::_Rb_tree_***const, elf::sym> const&) /usr/bin/devkit/tuner/lib/libsym.so                          3,694,592 (3.00%)
      UNKNOWN                                             /home/devkit/libsqlite3/libsqlite3.so.0.8.6                  3,588,096 (2.91%)
      seq_put_hex_ll                                      [kernel]                                                     3,080,192 (2.50%)
      _nohz_idle_balance                                  [kernel]                                                     1,941,504 (1.58%)
      __audit_syscall_exit                                [kernel]                                                     1,933,312 (1.57%)
    ──────────────────────────────────────────────────────────────────
    5509 milliseconds time elapsed
    
    The report /home/miss_report.tar is generated successfully.
    To view summary report. you can run: devkit report -i /home/miss_report.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • Collect application data.
    devkit tuner miss -d 5 --package /opt/testdemo/cache_miss
    

    The preceding command collects data of the /opt/testdemo/cache_miss application. The -d 5 parameter sets a collection duration of 5 seconds, and the --package parameter generates a report data package in the tool directory. By default, the package is named miss-<timestamp>.tar.

    Command output:

    Miss Summary Report-all                                 Time:2024/06/11 11:16:15
    ================================================================================
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      main                                               /opt/testdemo/cache_miss                                    74,964,992 (60.74%)
      copy_page                                          [kernel]                                                    33,554,432 (27.19%)
      change_protection_range                            [kernel]                                                     6,815,744 (5.52%)
      UNKNOWN                                            [kernel]                                                     3,784,704 (3.07%)
      handle_percpu_devid_irq                            [kernel]                                                     2,490,368 (2.02%)
      propagate_protected_usage                          [kernel]                                                       917,504 (0.74%)
      page_counter_charge                                [kernel]                                                       720,896 (0.58%)
      queued_spin_lock_slowpath                          [kernel]                                                        32,768 (0.03%)
      account_system_index_time                          [kernel]                                                        24,576 (0.02%)
      trigger_load_balance                               [kernel]                                                        24,576 (0.02%)
    ──────────────────────────────────────────────────────────────────
    6222 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111608.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111608.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • Collect data of a specified process.
    devkit tuner miss -d 5 --package -p 414192
    

    The -p 414192 parameter collects data of the process whose ID is 414192.

    Command output:

    Miss Summary Report-all                                 Time:2024/06/11 11:18:28
    ================================================================================
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                            /usr/lib64/libpthread-2.28.so                               32,505,856 (99.42%)
      queued_spin_lock_slowpath                          [kernel]                                                        57,344 (0.18%)
      available_idle_cpu                                 [kernel]                                                        16,384 (0.05%)
      cpu_load_update_active                             [kernel]                                                        16,384 (0.05%)
      futex_wait                                         [kernel]                                                        16,384 (0.05%)
      get_futex_value_locked                             [kernel]                                                        16,384 (0.05%)
      trigger_load_balance                               [kernel]                                                        16,384 (0.05%)
      __list_del_entry_valid                             [kernel]                                                         8,192 (0.03%)
      fun2                                               /opt/testdemo/pthread_mutex_long                                 8,192 (0.03%)
      futex_wake                                         [kernel]                                                         8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111822.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • View the generated report.
    devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
    

    Command output:

    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                            /usr/lib64/libpthread-2.28.so                               32,505,856 (99.42%)
      queued_spin_lock_slowpath                          [kernel]                                                        57,344 (0.18%)
      available_idle_cpu                                 [kernel]                                                        16,384 (0.05%)
      cpu_load_update_active                             [kernel]                                                        16,384 (0.05%)
      futex_wait                                         [kernel]                                                        16,384 (0.05%)
      get_futex_value_locked                             [kernel]                                                        16,384 (0.05%)
      trigger_load_balance                               [kernel]                                                        16,384 (0.05%)
      __list_del_entry_valid                             [kernel]                                                         8,192 (0.03%)
      fun2                                               /opt/testdemo/pthread_mutex_long                                 8,192 (0.03%)
      futex_wake                                         [kernel]                                                         8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
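To post-process the summary table, for example to compare runs, you can parse the rows from the command output or from devkit report. The following Python sketch assumes the row layout shown in the sample outputs above and module names without spaces. Function names may contain spaces. Every count in these sample outputs is a multiple of 8,192. This suggests, although this page does not state it, that each count is the number of SPE samples multiplied by the sampling interval (-P). The hypothetical approx_samples helper relies on that inference.

```python
import re
from typing import NamedTuple

# A row ends with "<count> (<percent>%)"; the module is the last space-free
# token before the count (assumed); the function is everything before that.
ROW = re.compile(
    r"^\s*(?P<function>.+?)\s+(?P<module>\S+)\s+"
    r"(?P<count>\d[\d,]*)\s+\((?P<percent>[\d.]+)%\)\s*$"
)


class MissRow(NamedTuple):
    function: str
    module: str
    count: int
    percent: float


def parse_rows(report_text):
    """Return one MissRow per table row; headers, rules, and footers are skipped."""
    rows = []
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(MissRow(
                function=match["function"],
                module=match["module"],
                count=int(match["count"].replace(",", "")),
                percent=float(match["percent"]),
            ))
    return rows


def approx_samples(row, period=8192):
    """Estimate the SPE samples behind a row, assuming count = samples * period."""
    return row.count // period
```

For example, approx_samples gives 9,151 for the main row of the application example and 1 for any row showing 8,192. Rows near the bottom of a table may therefore rest on only a handful of samples and may be too sparse to act on.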