Viewing Process/Thread Performance Analysis Results

Prerequisites

A tuning analysis task has been created and completed.

Procedure

  1. In the Tuning Assistant area, click the icon on the left of the target analysis task to expand the node list.
  2. Click the name of the target node to view the analysis results.
    Figure 1 Analysis result page
  3. (Optional) Select a service type and suggestion type.
    • Select service types as required. The options are CPU-intensive, Network I/O-intensive, and Storage I/O-intensive. You can select one or more service types. By default, all three options are selected.
      Figure 2 Selecting service types
    • Select a suggestion type based on the actual situation. You can adjust the topology tree by selecting Tuning path–based suggestions or Threshold-filtered suggestions.
      Figure 3 Selecting a suggestion scope
  4. Click Process/Thread Performance to view the process/thread performance analysis result and perform tuning accordingly.
    Figure 4 Process/Thread Performance page

    On the Process/Thread Performance analysis result page, you can perform the following operations:

    • Select a CPU metric from the CPU Metrics drop-down list. Table 1 describes the available CPU metrics.
      Table 1 CPU metrics

      | CPU Metric | Description |
      | --- | --- |
      | %user | Percentage of CPU time spent running in user mode. |
      | %sys | Percentage of CPU time spent running in kernel mode. This metric does not include the time spent servicing hardware and software interrupts. |
      | %cpu | CPU usage in the non-idle state. |
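
As a rough illustration of how the three metrics in Table 1 relate to each other, the sketch below derives them from two snapshots of a per-core line in /proc/stat. This is not the tool's documented implementation. The function names are hypothetical, and treating iowait as idle time is an assumption.

```python
def parse_cpu_line(line):
    """Parse one 'cpuN ...' line from /proc/stat into (name, jiffy counters)."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    values = [int(v) for v in fields[1:1 + len(names)]]
    return fields[0], dict(zip(names, values))


def cpu_metrics(before, after):
    """Return %user, %sys, and %cpu for the interval between two snapshots."""
    delta = {key: after[key] - before[key] for key in before}
    total = sum(delta.values())
    if total == 0:
        return {"%user": 0.0, "%sys": 0.0, "%cpu": 0.0}
    # Assumption: iowait counts as idle time for the purpose of %cpu.
    idle = delta["idle"] + delta["iowait"]
    return {
        "%user": 100.0 * delta["user"] / total,
        # The system counter excludes irq and softirq time, matching the
        # note that %sys does not include interrupt servicing.
        "%sys": 100.0 * delta["system"] / total,
        "%cpu": 100.0 * (total - idle) / total,
    }
```

On a Linux host, the two snapshots could be taken by reading the same `cpuN` line from /proc/stat a few seconds apart and passing both parsed dictionaries to `cpu_metrics`.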

    • Click to set the utilization range.
      In the displayed Utilization Range window, set the start value and end value of the range, and click OK.
      Figure 5 Setting the utilization range
    • Move the mouse pointer to a block to view the performance data of the corresponding CPU core. Click View related threads to view the threads related to the CPU core (the threads are represented by triangles). In the thread view, you can click PID in the upper left corner to return to the previous page.
      Figure 6 Viewing the performance data of a CPU core
      Figure 7 Viewing related threads
    • Click a block. In the details area on the right, view the detailed process and thread information about the CPU core. If there is an Nginx thread, the displayed information includes the microarchitecture metrics, memory access metrics, CPU affinity, memory affinity, operation functions, operation files, operation network ports, and system calls.
      Figure 8 Viewing the process/thread details of a CPU core
  5. View the topology of the process/thread performance tuning suggestions for CPU cores in the Busy state.
    • If the CPU metric is set to %user and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of process/thread performance tuning suggestions is displayed in the lower part of the page.
      Figure 9 Topology of process/thread performance tuning suggestions (%user)

      Table 2 describes the parameters in the Threshold Setting area.

      Table 2 Parameter description

      | Parameter | Description |
      | --- | --- |
      | CPU Affinity | A scheduling attribute that binds a process to one CPU or a group of CPUs. |
      | Memory Affinity | A scheduling attribute that allocates memory to local NUMA nodes. |
      | Concurrency | Number of threads in a process. |
      | branch miss rate (%) | Percentage of branch instructions that the CPU mispredicts. The value is an integer ranging from 1 to 100. |
      | L1-dcache miss rate (%) | L1 data cache miss rate. The value is an integer ranging from 1 to 100. |
      | L1-icache miss rate (%) | L1 instruction cache miss rate. The value is an integer ranging from 1 to 100. |
      | dTLB cache miss rate (%) | Data TLB miss rate. The value is an integer ranging from 1 to 100. |
      | iTLB cache miss rate (%) | Instruction TLB miss rate. The value is an integer ranging from 1 to 100. |
      | fault/s | Number of page faults per second. The value can be any positive integer. |
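
The value ranges in Table 2 can be expressed as a small validation routine. The sketch below is illustrative only: `validate_threshold` is a hypothetical helper, not part of the tool, and it only encodes the ranges stated in the table (integers from 1 to 100 for the miss rates, any positive integer for fault/s).

```python
# Parameters from Table 2 whose threshold is a percentage (integer, 1 to 100).
RATE_PARAMS = {
    "branch miss rate (%)",
    "L1-dcache miss rate (%)",
    "L1-icache miss rate (%)",
    "dTLB cache miss rate (%)",
    "iTLB cache miss rate (%)",
}


def validate_threshold(name, value):
    """Return True if value is an acceptable threshold for the named parameter."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if name in RATE_PARAMS:
        return 1 <= value <= 100
    if name == "fault/s":
        return value >= 1
    raise KeyError(f"unknown threshold parameter: {name}")
```

For example, `validate_threshold("branch miss rate (%)", 0)` is rejected because the range starts at 1.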

      Table 3 describes the main nodes in the topology of process/thread tuning suggestions.

      Table 3 Level-1 nodes in the topology of process/thread performance tuning suggestions

      | Node | Description |
      | --- | --- |
      | Switching of a process and its threads between different NUMA nodes | During execution, the process is switched between different NUMA nodes. |
      | Cross-die or cross-chip memory access of processes/threads | The cross-chip or cross-die memory access rate of the CPU is high. |
      | Low concurrency | The process has fewer than 6 threads. |
      | Branch: high branch miss rate | A large number of branch instructions are mispredicted. |
      | Cache: high cache miss rate | When the ALU needs data from memory, it first searches the first-level (L1) cache and then the second-level (L2) cache. If the data is found in a cache, it is a hit. Otherwise, it is a miss. |
      | TLB: high TLB miss rate | A TLB miss occurs when the translation for the virtual address being accessed is not in the TLB. |
      | Memory: high page fault rate | The page fault rate of the memory is high. |
      | JVM: high CPU usage by the JVM | The CPU usage of the JVM is too high. |
      | Compilation: compilers and compilation options | Compilers and compilation options. |
      | Use the memory access analysis function for analysis. | Analyze the memory access capability. |
      | Hotspot function analysis: Analyze top hotspot functions. | Analyze the hotspot functions of the system or process. |

      For each problem listed in Table 3, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
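
Two of the checks in Table 3 are simple enough to sketch by hand: the low-concurrency rule (fewer than 6 threads) and a miss rate expressed as a percentage. The example below is a minimal sketch, assuming the thread count is taken from the `Threads:` field of /proc/<pid>/status. The function names are hypothetical, and the tool may collect these values differently.

```python
LOW_CONCURRENCY_LIMIT = 6  # Table 3: a process with fewer than 6 threads


def thread_count(status_text):
    """Extract the thread count from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: field found")


def is_low_concurrency(status_text):
    """Apply the 'fewer than 6 threads' rule from Table 3."""
    return thread_count(status_text) < LOW_CONCURRENCY_LIMIT


def miss_rate(misses, accesses):
    """Miss rate as a percentage, for branch, cache, or TLB counters."""
    return 100.0 * misses / accesses if accesses else 0.0
```

The resulting percentage is the kind of value that would be compared against a miss rate threshold set in the Threshold Setting area.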

    • If the CPU metric is set to %sys and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of process/thread performance tuning suggestions is displayed in the lower part of the page.
      Figure 10 Topology of process/thread performance tuning suggestions (%sys)

      Table 4 describes the parameters in the Threshold Setting area.

      Table 4 Parameter description

      | Parameter | Description |
      | --- | --- |
      | Application Context Switchover | A context switch is the switching of processes or threads that the kernel (the core of the OS) performs on a CPU. |
      | Total System CPU Time Called by Application System (ms) | Two CPU context switches are performed during a system call (user mode - kernel mode - user mode). |
      | Process majflt/s | Number of major page faults per second. |
      | Memory Usage (%) | The memory usage exceeds 30%. |
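
To make the Table 4 counters more concrete, the sketch below reads the context switch counts and the major page fault count for one process from procfs text and converts them to per-second rates. It assumes the standard Linux layouts of /proc/<pid>/status and /proc/<pid>/stat (majflt is field 12). The function names are hypothetical and this is not how the tool is documented to collect data.

```python
def context_switches(status_text):
    """Sum voluntary and nonvoluntary switches from /proc/<pid>/status."""
    total = 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            total += int(value)
    return total


def major_faults(stat_text):
    """Return majflt (field 12) from the contents of /proc/<pid>/stat."""
    # The command name (field 2) may contain spaces or parentheses, so split
    # after the last ')'. The remaining fields start at field 3 (state).
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[12 - 3])


def per_second(before, after, interval_s):
    """Convert two cumulative counter samples into a per-second rate."""
    return (after - before) / interval_s
```

Sampling `major_faults` twice, `interval_s` seconds apart, and passing both values to `per_second` gives a figure comparable to Process majflt/s.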

      Table 5 describes the main nodes in the topology of process/thread tuning suggestions.

      Table 5 Level-1 nodes in the topology of process/thread performance tuning suggestions

      | Node | Description |
      | --- | --- |
      | Scheduling overhead: frequent context switches | If there are frequent context switches, a large number of CPU resources are consumed and the performance deteriorates. |
      | SWAP: high majflt/s | The number of major page faults per second is too high. |
      | Hotspot function analysis: Analyze top hotspot functions. | Analyze the hotspot functions of the system or process. |

      For each problem listed in Table 5, you can click the node to view problem details and click the next-level node to view the tuning suggestions.

    • If the CPU metric is set to %cpu and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of process/thread performance tuning suggestions is displayed in the lower part of the page.
      Figure 11 Topology of process/thread performance tuning suggestions (%cpu)

      Table 6 describes the parameters in the Threshold Setting area.

      Table 6 Parameter description

      | Parameter | Description |
      | --- | --- |
      | System Block Function Calls from Application | Number of times that an application calls system block functions per second. |
      | Network TX/RX Throughput | Data volume that passes through a network (channel or interface) per unit of time. |
      | Application Read Size/s | Data volume read by a process from a drive per second. |
      | Application Write Size/s | Data volume written by a process to a drive per second. |
      | Application I/O Latency | Block I/O delay, including the time spent waiting for synchronous block I/O and swap-in block I/O to complete. The unit is clock cycles. |
      | rngd service | The rngd service checks whether there are qualified random sources to generate random numbers and then feeds the random numbers to the entropy pool of the kernel. |
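
The read and write sizes per second in Table 6 can be approximated for a single process from /proc/<pid>/io. The sketch below is an illustration under that assumption. The helper names are hypothetical, and the tool may use a different data source.

```python
def parse_proc_io(io_text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in io_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_rates(before, after, interval_s):
    """Bytes read from and written to storage per second between two samples."""
    return {
        "read_bytes/s": (after["read_bytes"] - before["read_bytes"]) / interval_s,
        "write_bytes/s": (after["write_bytes"] - before["write_bytes"]) / interval_s,
    }
```

Reading /proc/<pid>/io for another user's process usually requires elevated privileges, so a real collector would need to handle permission errors.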

      Table 7 describes the main nodes in the topology of process/thread tuning suggestions.

      Table 7 Level-1 nodes in the topology of process/thread performance tuning suggestions

      | Node | Description |
      | --- | --- |
      | Network I/O: low network sending and receiving throughput and number of received and sent packets | The network sending and receiving throughput is low and the number of received and sent packets is small. |
      | Task blocking: analyzing using the lock and wait analysis function | Access the system performance analysis page and use lock and wait analysis to locate the performance bottlenecks. |
      | Storage I/O: large process read and write data volume or high latency | The volume of data read and written by the process is large, or the I/O latency is high. |
      | The system uses the rngd service. | The rngd service checks whether there are qualified random sources to generate random numbers and then feeds the random numbers to the entropy pool of the kernel. |

      For each problem listed in Table 7, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
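
For the Network I/O node, throughput and packet rates can be cross-checked outside the tool using /proc/net/dev. The following is a minimal sketch, assuming the standard Linux column layout of that file (receive bytes and packets in columns 1 and 2, transmit bytes and packets in columns 9 and 10). The function names are hypothetical.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev into {interface: byte and packet counters}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, data = line.partition(":")
        if not sep:
            continue
        fields = data.split()
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats


def throughput(before, after, interval_s):
    """Per-second rates for one interface between two counter samples."""
    return {key: (after[key] - before[key]) / interval_s for key in before}
```

Comparing these rates with the bandwidth of the interface gives a rough sense of whether the throughput is actually low for the workload.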

  6. Configure tuning based on the tuning suggestion topology.
    1. View the tuning suggestion topology tree on the analysis result page. Click the corresponding icons to select the tuning suggestions that match the configuration conditions.
      Figure 12 Tuning suggestions
    2. View the Relevant Configurations, Indicator Description, Optimization Suggestion, and Optimization Guide on the right. Click the corresponding icon to adopt the tuning suggestion or to cancel the adoption of the tuning suggestion.
      Figure 13 Tuning suggestion page
    3. The adopted tuning suggestions are saved in the associated report. Click associated report in the lower right corner of the page to access the associated report page.

      All adopted tuning suggestions are displayed on the associated report page. You can click the task name to view the details. Click Valid or Invalid in the lower left corner to report whether the tuning suggestion meets the expectation.

      Figure 14 Associated report
  7. Click the icon on the right of the Process/Thread Performance analysis result tab page to view detailed process/thread performance data.
    Figure 15 Viewing process/thread performance data