Miss Event Analysis

Command Function

Uses the Arm Statistical Profiling Extension (SPE) to analyze miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load. Based on the results, you can modify your program to reduce the frequency of miss events and improve its performance.

  • To collect miss events, the server must support Arm SPE collection. For details about how to configure SPE, see Configuring the SPE Environment.
  • Miss event analysis is available on openEuler 20.x or later and openEuler-based OS releases. VM or container environments are not supported.
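A quick pre-check can save a failed collection run. The sketch below assumes that the Linux arm_spe driver exposes its device as arm_spe_0 under /sys/bus/event_source/devices, which is the usual location on kernels with SPE enabled. The function name and its optional directory argument are illustrative and are not part of devkit.

```shell
#!/bin/sh
# Hypothetical helper: report whether an Arm SPE PMU device is visible.
# $1 (optional) is the event-source directory; the default path is an
# assumption about where the arm_spe driver registers its device.
spe_available() {
    dev_dir="${1:-/sys/bus/event_source/devices}"
    if [ -d "$dev_dir/arm_spe_0" ]; then
        echo "SPE device found"
        return 0
    fi
    echo "SPE device not found"
    return 1
}
```

If the function returns a non-zero status, see Configuring the SPE Environment before running the collection.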

Syntax

devkit tuner miss [-h] [-c {n | n,m | n-m}] [-d <sec>] [-P n] [-D <sec>] [-l {0, 1, 2, 3}] [-m {1, 2, 3, 4}] [-L n] [-i <sec>] [-r {user, kernel, all}] [-o] [-s] [-p {PID1 | PID1,PID2 | ALL}] [--package] [--long-name] [--dwarf] [workload workload...]

The tool can also collect data for a specified application. To do so, replace [workload workload...] in the command with the application path and its parameters.
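As an illustration of the workload form, the sketch below builds and prints the command line without running it (a dry run). The helper name build_miss_cmd is hypothetical, and the trailing 1024 is a made-up application parameter. /opt/testdemo/cache_miss is the sample application used in the examples in this document.

```shell
#!/bin/sh
# Build a devkit command line for profiling a workload (dry run: print only).
# $1 = collection duration in seconds
# remaining arguments = application path followed by its own parameters
build_miss_cmd() {
    duration="$1"
    shift
    echo "devkit tuner miss -d $duration $*"
}

# "1024" is a hypothetical application parameter used for illustration.
build_miss_cmd 5 /opt/testdemo/cache_miss 1024
```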

Parameter Description

Table 1 Parameter description

Parameter

Option

Description

-h/--help

-

Obtains help information.

-c/--cpu

-

IDs of the CPU cores to collect data on. The value can be a single core (0), a comma-separated list (0,1,2), or a range (0-2).

-d/--duration

-

Collection duration, in seconds. The minimum value is 1. By default, collection runs until you stop it. Press Ctrl+\ to cancel the task, or press Ctrl+C to stop the collection and start the analysis.

-P/--period

-

Sampling interval, in number of instructions, which defaults to 8192. The value ranges from 1024 to 4,294,967,295.

-D/--delay

-

Collection delay, in seconds, which defaults to 0. The value must be less than the collection duration.

-i/--interval

-

Time span, in seconds, covered by each subreport. The minimum value is 1 and the maximum value is the collection duration. The default value is the collection duration. If this parameter is not set, no subreports are generated.

-l/--log-level

0/1/2/3

Log level, which defaults to 1.
  • 0: DEBUG
  • 1: INFO
  • 2: WARNING
  • 3: ERROR

-m/--metric

1/2/3/4

Data collection level, which defaults to 1 (LLC Miss).
  • 1 (LLC Miss): Number of memory request misses in the LLC.
  • 2 (TLB Miss): Number of memory accesses or address translations for which no virtual-to-physical mapping is found in the TLB.
  • 3 (Remote Access): Number of cross-CPU DRAM accesses.
  • 4 (Long Latency Load): Ratio of cross-CPU DRAM accesses where the access latency exceeds the preset minimum latency.

-L/--latency

-

Minimum latency, in clock cycles, which defaults to 0. This parameter can be set when collecting Long Latency Load data.

-r/--collection-range

user/kernel/all

Collection mode, which defaults to all.

  • all: collects user-mode and kernel-mode performance data.
  • user: collects user-mode performance data.
  • kernel: collects kernel-mode performance data.

-o/--output

-

Report package name and output path. If you enter a name only, the report package is generated in the current directory by default. This option must be used together with --package.

-s/--src-dir

-

C/C++ source code working directory, which is used to locate source files and associate them with the collected data. To view the associated source code, you can import the task into the web client.

-p/--pid

PID/PID1, PID2/ALL

IDs of the processes to collect data on. Separate multiple PIDs with commas (,). By default, all processes are collected. If both -p and -c are specified, the processes with the specified PIDs take priority.

--package

-

Indicates whether to generate a report data package. If you do not set the package name or path, the miss-timestamp.tar package is generated in the current directory by default.

--long-name

-

Indicates whether to display full function and module names. If this parameter is not set, long function or module names are abbreviated by default.

-t/--top

-

Number of data records to be displayed in the report, which defaults to 10. The minimum value is 1.

--dwarf

-

Indicates whether to generate C/C++ source code or assembly code files.
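The numeric limits in Table 1 can be checked before a long collection is started. The following is a minimal sketch of such a pre-flight check. The function check_miss_args is hypothetical and is not part of devkit. It only encodes the limits stated in the table for -d, -D, -i, and -P, and expects all four values as non-negative integers.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the numeric limits listed in Table 1.
# Arguments: duration delay interval period (non-negative integers).
# Prints "ok" on success, or the first violated limit and returns 1.
check_miss_args() {
    duration="$1"; delay="$2"; interval="$3"; period="$4"
    # -d: minimum collection duration is 1 second
    [ "$duration" -ge 1 ] || { echo "duration must be >= 1"; return 1; }
    # -D: delay must be less than the collection duration
    [ "$delay" -lt "$duration" ] || { echo "delay must be < duration"; return 1; }
    # -i: interval is at least 1 and at most the collection duration
    [ "$interval" -ge 1 ] && [ "$interval" -le "$duration" ] \
        || { echo "interval must be within 1..duration"; return 1; }
    # -P: sampling period range
    [ "$period" -ge 1024 ] && [ "$period" -le 4294967295 ] \
        || { echo "period must be within 1024..4294967295"; return 1; }
    echo "ok"
}
```

For example, check_miss_args 5 0 5 4096 prints ok, while check_miss_args 5 5 1 4096 reports that the delay must be less than the duration.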

Example

  • Collecting system data:
    devkit tuner miss -c 0-127 -d 5 -o /home/miss_report -m 1 --package
    

    The -c 0-127 parameter collects data on CPU cores 0 to 127, and -d 5 sets the collection duration to 5 seconds. The -o /home/miss_report and --package parameters generate a report data package named miss_report in the specified path. The -m 1 parameter collects LLC Miss events.

    Command output:

    Miss Summary Report-all                                 Time:2024/05/22 17:56:33
    ================================================================================
    
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                             /home/devkit/lib/libpython3.9.so.1.0                        14,196,736 (11.53%)
      _PyEval_EvalFrameDefault                            /home/devkit/lib/libpython3.9.so.1.0                        11,321,344 (9.20%)
      UNKNOWN                                             /usr/bin/devkit/tuner/lib/libsym.so                          4,702,208 (3.82%)
      _perf_ioctl                                         [kernel]                                                     4,587,520 (3.73%)
      UNKNOWN                                             /usr/lib64/libc-2.28.so                                      4,046,848 (3.29%)
      std::pair<std::_Rb_tree_***const, elf::sym> const&) /usr/bin/devkit/tuner/lib/libsym.so                          3,694,592 (3.00%)
      UNKNOWN                                             /home/devkit/libsqlite3/libsqlite3.so.0.8.6                  3,588,096 (2.91%)
      seq_put_hex_ll                                      [kernel]                                                     3,080,192 (2.50%)
      _nohz_idle_balance                                  [kernel]                                                     1,941,504 (1.58%)
      __audit_syscall_exit                                [kernel]                                                     1,933,312 (1.57%)
    ──────────────────────────────────────────────────────────────────
    5509 milliseconds time elapsed
    
    The report /home/miss_report.tar is generated successfully.
    To view summary report. you can run: devkit report -i /home/miss_report.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • Collecting application data:
    devkit tuner miss -d 5 --package /opt/testdemo/cache_miss
    

    The preceding command collects data for the application /opt/testdemo/cache_miss. The -d 5 parameter sets the collection duration to 5 seconds. The --package parameter generates a report data package in the tool directory. By default, the package name consists of miss plus a timestamp.

    Command output:

    Miss Summary Report-all                                 Time:2024/06/11 11:16:15
    ================================================================================
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      main                                               /opt/testdemo/cache_miss                                    74,964,992 (60.74%)
      copy_page                                          [kernel]                                                    33,554,432 (27.19%)
      change_protection_range                            [kernel]                                                     6,815,744 (5.52%)
      UNKNOWN                                            [kernel]                                                     3,784,704 (3.07%)
      handle_percpu_devid_irq                            [kernel]                                                     2,490,368 (2.02%)
      propagate_protected_usage                          [kernel]                                                       917,504 (0.74%)
      page_counter_charge                                [kernel]                                                       720,896 (0.58%)
      queued_spin_lock_slowpath                          [kernel]                                                        32,768 (0.03%)
      account_system_index_time                          [kernel]                                                        24,576 (0.02%)
      trigger_load_balance                               [kernel]                                                        24,576 (0.02%)
    ──────────────────────────────────────────────────────────────────
    6222 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111608.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111608.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • Collecting based on PIDs:
    devkit tuner miss -d 5 --package -p 414192
    

    The -p 414192 parameter collects information about the process whose ID is 414192.

    Command output:

    Miss Summary Report-all                                 Time:2024/06/11 11:18:28
    ================================================================================
    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                            /usr/lib64/libpthread-2.28.so                               32,505,856 (99.42%)
      queued_spin_lock_slowpath                          [kernel]                                                        57,344 (0.18%)
      available_idle_cpu                                 [kernel]                                                        16,384 (0.05%)
      cpu_load_update_active                             [kernel]                                                        16,384 (0.05%)
      futex_wait                                         [kernel]                                                        16,384 (0.05%)
      get_futex_value_locked                             [kernel]                                                        16,384 (0.05%)
      trigger_load_balance                               [kernel]                                                        16,384 (0.05%)
      __list_del_entry_valid                             [kernel]                                                         8,192 (0.03%)
      fun2                                               /opt/testdemo/pthread_mutex_long                                 8,192 (0.03%)
      futex_wake                                         [kernel]                                                         8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
    If *** is displayed in Function or Module, use --long-name to show full name.
    The report /usr/bin/devkit/miss-20240611-111822.tar is generated successfully.
    To view summary report. you can run: devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
    To view detail report. you can import the report to the WebUI or IDE to view details.
    
  • Viewing the report:
    devkit report -i /usr/bin/devkit/miss-20240611-111822.tar
    

    Command output:

    ──────────────────────────────────────────────────────────────────
      Function                                           Module                                                                LLC Miss
    ──────────────────────────────────────────────────────────────────
      UNKNOWN                                            /usr/lib64/libpthread-2.28.so                               32,505,856 (99.42%)
      queued_spin_lock_slowpath                          [kernel]                                                        57,344 (0.18%)
      available_idle_cpu                                 [kernel]                                                        16,384 (0.05%)
      cpu_load_update_active                             [kernel]                                                        16,384 (0.05%)
      futex_wait                                         [kernel]                                                        16,384 (0.05%)
      get_futex_value_locked                             [kernel]                                                        16,384 (0.05%)
      trigger_load_balance                               [kernel]                                                        16,384 (0.05%)
      __list_del_entry_valid                             [kernel]                                                         8,192 (0.03%)
      fun2                                               /opt/testdemo/pthread_mutex_long                                 8,192 (0.03%)
      futex_wake                                         [kernel]                                                         8,192 (0.03%)
    ──────────────────────────────────────────────────────────────────
    5976 milliseconds time elapsed
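The summary rows can also be post-processed with standard text tools. The sketch below sums the LLC Miss percentage of all rows whose Module column is [kernel]. It assumes the report is plain text on standard input in the layout shown above, where each data row ends with a count and a percentage in parentheses. The function kernel_share is hypothetical and is not part of devkit.

```shell
#!/bin/sh
# Sum the LLC Miss share of rows whose Module column is [kernel].
# Reads a summary report on stdin. Rows are parsed from the right because
# function names may contain spaces:
#   last field = "(NN.NN%)", second to last = count, third to last = module.
kernel_share() {
    awk '
        NF >= 3 && $NF ~ /^\([0-9.]+%\)$/ && $(NF-2) == "[kernel]" {
            pct = $NF
            gsub(/[()%]/, "", pct)   # "(0.18%)" -> "0.18"
            total += pct
        }
        END { printf "%.2f\n", total }
    '
}
```

For example, devkit report -i /usr/bin/devkit/miss-20240611-111822.tar | kernel_share would print 0.49 for the rows shown above, assuming the report is written to standard output as in the example.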