Microarchitecture Analysis

Command Function

Analyzes how instructions run on the CPU pipeline, based on Arm performance monitor unit (PMU) events, to help you quickly locate the performance bottlenecks of the current application on the CPU. You can then modify the program to make full use of the available hardware resources.

Syntax

devkit tuner topdown [-h | --help] [-d <DURATION> | --duration=DURATION] [-p <PID> | --pid=PID] [-r <RANGE> | --collection-range=RANGE] [-c <CPU> | --cpu=CPU] [-L <METRIC> | --profile-level=METRIC] [-D <DELAY> | --delay=DELAY] [-i <INTERVAL> | --interval=INTERVAL] COMMAND [appDir] [appArgs]

To collect metrics for a specific application, append the application path and the application parameters to the end of the devkit tuner topdown command. If the -c/--cpu or -p/--pid parameter is also specified, the specified CPU cores or processes are collected preferentially.
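As a rough illustration of this argument order, the following Python sketch assembles an application-based command line. The helper name build_topdown_command and its parameters are hypothetical and are not part of devkit; only the flags themselves come from this document.

```python
import shlex


def build_topdown_command(app_path, app_args=(), duration=None,
                          profile_level=None, output=None, package=False):
    """Assemble a `devkit tuner topdown` argument list (hypothetical helper)."""
    argv = ["devkit", "tuner", "topdown"]
    if duration is not None:
        argv += ["-d", str(duration)]
    if output is not None:
        argv += ["-o", output]
    if profile_level is not None:
        argv += ["-L", str(profile_level)]
    if package:
        argv.append("--package")
    # The application path and its own arguments always go last.
    argv.append(app_path)
    argv += list(app_args)
    return argv


if __name__ == "__main__":
    cmd = build_topdown_command("/opt/testdemo/falsesharinglong", duration=10,
                                profile_level=2, output="/home/topdown_app",
                                package=True)
    # Print a shell-quoted form that can be reviewed before it is run.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the application arguments separate from the tool options, so the list can be passed to subprocess.run without extra quoting.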

Parameter Description

Table 1 Parameter description

Parameter

Option

Description

-h/--help

None

Obtains help information.

-c/--cpu

core_num

Specifies the CPU cores to collect data from. The value can be a single core (for example, 0), a comma-separated list (for example, 0,1,2), or a range (for example, 0-2).

-d/--duration

Num

Specifies the collection duration, in seconds. If this parameter is not set, continuous collection is performed by default. You can press Ctrl+\ to cancel the task or press Ctrl+C to stop the collection and start analysis.

-D/--delay

Num

Specifies the sampling delay. The default value is 0 if this parameter is not specified.

-i/--interval

Num

Specifies the sampling interval. If this parameter is not set, the default sampling interval is 1 second. If the collection duration -d is set, the sampling interval must be less than or equal to the set collection duration.

-l/--log-level

0,1,2,3

Configures the log level. The default value is 1 (info).

  • 0: debug
  • 1: info
  • 2: warning
  • 3: error

-L/--profile-level

1,2,3,4,5,6

Specifies the analysis metric. If this parameter is not set, the default value 1 is used.

  • 1: Collects Back-End Bound, Bad Speculation, Front-End Bound, and Retiring. This is the default.
  • 2: Collects Back-End Bound->Core Bound. The back end is the part of the processor that performs out-of-order dispatch and execution of micro-ops (uOps) and returns the results. Core Bound is a subclass of Back-End Bound. It reflects the proportion of performance bottlenecks caused by insufficient CPU execution unit resources.
  • 3: Collects Back-End Bound->Memory Bound. The back end is the part of the processor that performs out-of-order dispatch and execution of uOps and returns the results. Memory Bound is a subclass of Back-End Bound. It reflects pipeline stalls caused by waiting for data reads and writes.
  • 4: Collects Back-End Bound->Resource Bound (Kunpeng 920 processors only). The back end is the part of the processor that performs out-of-order dispatch and execution of uOps and returns the results. Resource Bound is a subclass of Back-End Bound. It reflects pipeline stalls that occur, due to insufficient resources, when uOps are dispatched to the out-of-order execution scheduler.
  • 5: Collects Bad Speculation. It reflects pipeline resources wasted due to incorrect instruction speculation.
  • 6: Collects Front-End Bound. The front end is the part of the processor where instructions are fetched and decoded into uOps for execution by the back-end pipeline. This metric reflects the proportion of processor front-end resources that are under-utilized.

-o/--output

file

Sets the name of the report data file. If this parameter is not set, the topdown-xxxx-xxxx.tar file is generated in the current path by default. If a file with the same name exists, the file_name--xxxx-xxxx.tar file is generated.

-r/--collection-range

  • user
  • kernel
  • all

Sets the collection range. When -p/--pid is set to ALL, you can select user or kernel to collect only user-mode processes or only kernel-mode processes. If this parameter is not specified, processes of both modes are collected by default.

  • user: user mode
  • kernel: kernel mode
  • all: both modes

-p/--pid

  • PID
  • PID1, PID2
  • ALL

Specifies the PID of a process to be collected. You can separate multiple PIDs with commas (,). If both -p and -c are used, processes with specified PIDs are preferentially collected. If this parameter is not set, all processes are collected by default.

--package

None

Sets whether to import data to the database and generate compressed packages in the specified output path.
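The value formats accepted by -c/--cpu and the constraint between -i and -d can be checked before a collection is started. The sketch below is a minimal illustration: expand_cores and check_interval are hypothetical helper names, and accepting mixed values such as 0-2,5 is an assumption that the tool itself may not share.

```python
def expand_cores(spec):
    """Expand a -c/--cpu value such as "0", "0,1,2" or "0-2" into core numbers."""
    cores = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "0-2" covers both end points.
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid core range: {part!r}")
            cores.extend(range(first, last + 1))
        else:
            cores.append(int(part))
    return cores


def check_interval(interval, duration=None):
    """Apply the documented rule: when -d is set, -i must not exceed it."""
    if duration is not None and interval > duration:
        raise ValueError("sampling interval must be <= collection duration")
    return interval


if __name__ == "__main__":
    print(expand_cores("0-2"))
    print(check_interval(1, duration=3))
```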

Example

  • Collection based on CPUs:
    devkit tuner topdown -c 0-127 -d 3 -o /home/topdown_cpu -L 2 --package

    In this command, -c 0-127 collects data from CPU cores 0 to 127, and -d 3 sets the collection duration to 3 seconds. -o /home/topdown_cpu and --package generate a report data package named topdown_cpu in the specified path. -L 2 collects the Back-End Bound->Core Bound instruction data.

    Command output:

       Topdown metrics of 'CPU(s) 0-127':
    
         Instructions                                 564372224
         Cycles                                       814631616
         IPC                                          0.69
         Backend Bound                                30.16%
         |-- Core Bound                               |-- 20.40%
             |-- Divider Stall                            |-- 0.00%
             |-- FSU Stall                                |-- 0.00%
             |-- Exe Ports Stall                          |-- 20.40%
         Bad Speculation                              12.29%
         Frontend Bound                               40.23%
         Retiring                                     17.32%
    
       3421 milliseconds time elapsed
    
    info: The report /home/topdown_cpu.tar is generated successfully.
    info: To view summary report. you can run: devkit report -i /home/topdown_cpu.tar
    info: To view detail report. you can import the report to the WebUI or IDE to view details.
  • Collection based on process IDs:
    devkit tuner topdown -p 1891015 -d 3 -o /home/topdown_pid --package

    In this command, -p 1891015 collects data from the process whose PID is 1891015, and -d 3 sets the collection duration to 3 seconds. -o /home/topdown_pid and --package generate a report data package named topdown_pid in the specified path. Because -L is not specified, the Back-End Bound, Bad Speculation, Front-End Bound, and Retiring instruction data are collected.

    Command output:

       Topdown metrics of process id '1891015':
    
         Instructions                                 0
         Cycles                                       0
         IPC                                          0.00
         Backend Bound                                0.00%
         Bad Speculation                              0.00%
         Frontend Bound                               0.00%
         Retiring                                     0.00%
    
       3362 milliseconds time elapsed
    
    info: The report /home/topdown_pid.tar is generated successfully.
    info: To view summary report. you can run: devkit report -i /home/topdown_pid.tar
    info: To view detail report. you can import the report to the WebUI or IDE to view details.
  • Collection based on applications:
    devkit tuner topdown -d 10 -o /home/topdown_app -L 2 --package /opt/testdemo/falsesharinglong

    In this command, -d 10 sets the collection duration to 10 seconds. -o /home/topdown_app and --package generate a report data package named topdown_app in the specified path. -L 2 collects the Back-End Bound->Core Bound instruction data. The application path /opt/testdemo/falsesharinglong is appended to the end of the command.

    Command output:

       Topdown metrics of '/opt/testdemo/falsesharinglong':
    
         Instructions                                 59373846528
         Cycles                                       52865875968
         IPC                                          1.12
         Backend Bound                                67.58%
         |-- Core Bound                               |-- 63.31%
             |-- Divider Stall                            |-- 0.00%
             |-- FSU Stall                                |-- 0.00%
             |-- Exe Ports Stall                          |-- 63.31%
         Bad Speculation                              3.56%
         Frontend Bound                               0.78%
         Retiring                                     28.08%
    
       11153 milliseconds time elapsed
    
    info: The report /home/topdown_app.tar is generated successfully.
    info: To view summary report. you can run: devkit report -i /home/topdown_app.tar
    info: To view detail report. you can import the report to the WebUI or IDE to view details.

The command output is an overview of the microarchitecture analysis task. To view time sequence information, use the --package parameter to generate a TAR package and import the package to the WebUI, which presents the data visually. For details, see the content about importing tasks in Task Management.
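The overview text can also be post-processed in a script. The sketch below is one possible way to do so, assuming the output keeps the layout shown in the examples above; parse_topdown and summarize are hypothetical helpers, not part of devkit. It recomputes IPC as Instructions divided by Cycles and reports the largest of the four top-level categories.

```python
import re

# Sample text copied from the application-based example above.
SAMPLE = """\
Topdown metrics of '/opt/testdemo/falsesharinglong':

  Instructions                                 59373846528
  Cycles                                       52865875968
  IPC                                          1.12
  Backend Bound                                67.58%
  |-- Core Bound                               |-- 63.31%
      |-- Divider Stall                            |-- 0.00%
      |-- FSU Stall                                |-- 0.00%
      |-- Exe Ports Stall                          |-- 63.31%
  Bad Speculation                              3.56%
  Frontend Bound                               0.78%
  Retiring                                     28.08%
"""

TOP_LEVEL = ("Backend Bound", "Bad Speculation", "Frontend Bound", "Retiring")

# Optional "|-- ", metric name, 2+ spaces, optional "|-- ", number, optional "%".
LINE = re.compile(r"\s*(?:\|--\s*)?(.+?)\s{2,}(?:\|--\s*)?([\d.]+)%?\s*$")


def parse_topdown(text):
    """Map each metric name in the overview text to its numeric value."""
    metrics = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics


def summarize(metrics):
    """Return (IPC, largest top-level category); IPC is None when Cycles is 0."""
    cycles = metrics["Cycles"]
    ipc = round(metrics["Instructions"] / cycles, 2) if cycles else None
    dominant = max(TOP_LEVEL, key=lambda name: metrics[name])
    return ipc, dominant


if __name__ == "__main__":
    ipc, dominant = summarize(parse_topdown(SAMPLE))
    print(f"IPC={ipc}, largest top-level category: {dominant}")
```

The zero-cycle guard covers output such as the process-based example, where no events were counted and every value is 0.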