Viewing System Performance Analysis Results
Prerequisites
A tuning analysis task has been created and completed.
Procedure
- In the Tuning Assistant area, click the expand icon on the left of the target analysis task to expand the node list.
- Click the name of the target node to view the analysis results.
Figure 1 Analysis result page
- (Optional) Select a service type and suggestion type.
- Select service types as required. The options are CPU-intensive, Network I/O-intensive, and Storage I/O-intensive. You can select one or more service types. By default, all three options are selected.
Figure 2 Selecting service types
- Select a suggestion type based on the actual situation. You can adjust the topology tree by selecting Tuning path–based suggestions or Threshold-filtered suggestions.
Figure 3 Selecting a suggestion type
- Click System Performance to view the corresponding analysis result and perform tuning accordingly.
Figure 4 System performance analysis results
On the System Performance tab page, you can perform the following operations:
- Select a CPU metric from the CPU Metrics drop-down list. Table 1 describes the available CPU metrics.
Table 1 CPU metrics
| CPU Metric | Description |
| --- | --- |
| %sys | Percentage of CPU time spent in kernel mode. This metric does not include the time spent servicing hardware and software interrupts. |
| %user | Percentage of CPU time spent in user mode. |
| %iowait | Percentage of CPU time during which the CPU is idle and waiting for storage I/O operations. |
| %irq | Percentage of CPU time spent servicing hardware interrupts. |
| %soft | Percentage of CPU time spent servicing software interrupts. |
| %idle | Percentage of CPU time during which the CPU is idle and the system has no unfinished storage I/O request. |
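The metrics in Table 1 resemble the per-CPU time counters that Linux exposes in /proc/stat. The following Python sketch shows one way such percentages could be derived from two snapshots of a cpu line. It is an illustration only, not the Tuning Assistant's implementation. The function names are hypothetical, and treating nice and steal time as part of the total only is an assumption.

```python
from typing import Dict

# Order of the first eight counters on a "cpuN" line in /proc/stat.
# All values are cumulative ticks since boot.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse one 'cpuN ...' line from /proc/stat into named counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Convert the counter deltas between two snapshots into percentages."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two snapshots must be taken at different times")

    def pct(name: str) -> float:
        return 100.0 * delta[name] / total

    # Assumption: nice and steal time count toward the total only.
    return {
        "%user": pct("user"),
        "%sys": pct("system"),
        "%iowait": pct("iowait"),
        "%irq": pct("irq"),
        "%soft": pct("softirq"),
        "%idle": pct("idle"),
    }
```

On a Linux host, the two snapshots could be taken by reading the same cpuN line from /proc/stat a few seconds apart.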
- Click the corresponding icon to set the utilization range.
In the displayed Utilization Range window, set the start value and end value of the range, and click OK.
Figure 5 Setting the utilization range
- Click the corresponding icon to switch the view.
The Tuning Assistant provides the following two types of system performance views:
- Default view
On the default view of the System Performance analysis result tab page, CPU cores are divided into the Idle, Normal, and Busy areas based on CPU usage. You can click an area to zoom in on it. The upper part of each area displays the CPU usage range, the middle part displays circles that indicate the usage of each CPU core in the area, and the lower part displays the distribution of those CPU cores across NUMA nodes.
Figure 6 Default view
- NUMA view
In the NUMA view, the CPU cores are divided into four areas: NUMA 0, NUMA 1, NUMA 2, and NUMA 3 based on the bound NUMA nodes. The upper part of each area displays the name of the bound NUMA node and the ratio of the number of CPU cores bound to the NUMA node to the total number of CPU cores.
Figure 7 NUMA view
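The two views group the same CPU cores in different ways: by usage range in the default view and by bound NUMA node in the NUMA view. The sketch below illustrates both groupings in Python. The boundary values IDLE_MAX and BUSY_MIN are hypothetical placeholders for the range set in the Utilization Range window, and the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical range boundaries in percent; the real values are
# whatever is configured in the Utilization Range window.
IDLE_MAX = 20.0
BUSY_MIN = 80.0


def classify_core(usage: float, idle_max: float = IDLE_MAX,
                  busy_min: float = BUSY_MIN) -> str:
    """Map a core's usage percentage to the Idle, Normal, or Busy area."""
    if usage < idle_max:
        return "Idle"
    if usage >= busy_min:
        return "Busy"
    return "Normal"


def group_by_area(core_usage: Dict[int, float]) -> Dict[str, List[int]]:
    """Default view: list the core IDs that fall into each area."""
    areas: Dict[str, List[int]] = {"Idle": [], "Normal": [], "Busy": []}
    for core_id in sorted(core_usage):
        areas[classify_core(core_usage[core_id])].append(core_id)
    return areas


def numa_share(core_to_node: Dict[int, int]) -> Dict[int, float]:
    """NUMA view: ratio of cores bound to each node to the total core count."""
    counts: Dict[int, int] = defaultdict(int)
    for node in core_to_node.values():
        counts[node] += 1
    total = len(core_to_node)
    return {node: count / total for node, count in sorted(counts.items())}
```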
- Move the mouse pointer over a circle to view the NUMA node and core ID of the CPU core and its performance data.
Figure 8 Viewing the system performance data of a CPU core
- Click a circle to view details about the CPU core on the right, including information about the processes and threads running on the CPU core, hardware interrupts, and software interrupts.
Figure 9 Viewing CPU core details
- Click the expand icon to expand the details area.
- Click the collapse icon to collapse the CPU Metrics area. By default, this area is displayed.
- View the topology of the system performance tuning suggestions for the CPU cores in the Busy state.
- If the CPU metric is set to %sys and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 10 Topology of system performance tuning suggestions (%sys)
Table 2 describes the parameters in the Threshold Setting area.
Table 2 Parameter description
| Parameter | Description |
| --- | --- |
| High Ratio of Network IOPS to Total Bandwidth (%) | IOPS is the number of I/O requests the system can process per unit of time, usually per second. I/O requests are typically data read or write requests. If Network IOPS/Total bandwidth x 100% ≥ Threshold, a tuning suggestion for this parameter is provided. |
| Received/Transmitted Packets Lost per Second | Number of received/transmitted packets that are discarded per second because the buffer is full. |
| Low Ratio of Network IOPS to Total Bandwidth (%) | IOPS is the number of I/O requests the system can process per unit of time, usually per second. I/O requests are typically data read or write requests. If Network IOPS/Total bandwidth x 100% < Threshold, a tuning suggestion for this parameter is provided. |
| Pages Swapped per Second | Total number of pages swapped in or swapped out per second. The corresponding metrics are pswpin/s and pswpout/s. |
| cswch/s | Number of context switches of active tasks per second. |
| majflt/s | Number of major page faults per second. A major page fault occurs when the page for an accessed virtual address is not in physical memory (for example, it is in the swap partition) and must be loaded from the drive. Major page faults are typically generated when memory is insufficient. |
| pswpin/s | Number of pages swapped in from the swap partition per second. |
| pswpout/s | Number of pages swapped out to the swap partition per second. |
| System Memory Usage (%) | Memory usage of the system. |
Table 3 describes the main nodes in the tuning suggestion topology.
Table 3 Node description
| Faulty Node | Description |
| --- | --- |
| Network I/O: high network IOPS or traffic | IOPS is the number of I/O requests the system can process per unit of time, usually per second. I/O requests are typically data read or write requests. |
| Network I/O: low network IOPS or traffic | IOPS is the number of I/O requests the system can process per unit of time, usually per second. I/O requests are typically data read or write requests. |
| Scheduling overhead: frequent context switches | A context switch saves the CPU context (CPU registers and program counter) of the previous task, loads the context of the new task into the registers and program counter, and then jumps to the position specified by the program counter to run the new task. The saved context is stored in the system kernel and loaded again when the task is rescheduled. In this way, the original state of the task is not affected, and the task appears to run continuously. NOTICE: Frequent context switches may deteriorate system performance. |
| SWAP: high majflt/s. pswpin/s and pswpout/s exist. | The number of major page faults generated per second is high, and pages are being swapped in and out in the system. |
| Process analysis: Analyze the top %system processes. | Percentage of CPU idle time. |
For each problem listed in Table 3, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
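The threshold rules in Table 2 can be read as simple comparisons. The following Python sketch expresses the two network ratio rules and the Pages Swapped per Second sum. It is a hedged illustration: the function names are hypothetical, and the units of the IOPS and bandwidth inputs are assumed to be whatever the tool reports.

```python
def network_iops_ratio(network_iops: float, total_bandwidth: float) -> float:
    """Network IOPS / Total bandwidth x 100%, as written in Table 2."""
    if total_bandwidth <= 0:
        raise ValueError("total bandwidth must be positive")
    return network_iops * 100.0 / total_bandwidth


def high_ratio_suggested(ratio: float, threshold: float) -> bool:
    """High-ratio rule: a suggestion is provided when ratio >= threshold."""
    return ratio >= threshold


def low_ratio_suggested(ratio: float, threshold: float) -> bool:
    """Low-ratio rule: a suggestion is provided when ratio < threshold."""
    return ratio < threshold


def pages_swapped_per_second(pswpin: float, pswpout: float) -> float:
    """Pages Swapped per Second is the sum of pswpin/s and pswpout/s."""
    return pswpin + pswpout
```

Note that a ratio equal to the threshold triggers the high-ratio rule but not the low-ratio rule.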
- If the CPU metric is set to %user and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 11 Topology of system performance tuning suggestions (%user)
Table 4 describes the parameters in the Threshold Setting area.
Table 4 Parameter description
| Parameter | Description |
| --- | --- |
| branch miss rate (%) | Rate of mispredicted branch instructions. The value is an integer ranging from 1 to 100. |
| dTLB cache miss rate (%) | Data translation lookaside buffer (TLB) miss rate. The value is an integer ranging from 1 to 100. |
| iTLB cache miss rate (%) | Instruction TLB miss rate. The value is an integer ranging from 1 to 100. |
| L1-dcache miss rate (%) | L1 data cache miss rate. The value is an integer ranging from 1 to 100. |
| L1-icache miss rate (%) | L1 instruction cache miss rate. The value is an integer ranging from 1 to 100. |
| fault/s | Number of page faults per second. The value can be any positive integer. |
| Cross-chip or cross-die access rate (%) | Cross-chip or cross-die memory access rate of the CPU. The value is an integer ranging from 1 to 100. |
| Die Memory Usage (%) | Ratio of the actual die memory usage to the maximum die memory usage. The value is an integer ranging from 1 to 100. |
Table 5 describes the main nodes in the tuning suggestion topology.
Table 5 Node description
| Faulty Node | Description |
| --- | --- |
| High branch miss rate | A large number of branch instructions are mispredicted. |
| High TLB miss rate | A TLB miss occurs when the translation for the virtual memory address being accessed is not in the TLB. |
| High cache miss rate | When the arithmetic logic unit (ALU) needs to fetch data from memory, it first searches the first-level cache and then the next-level cache. If the data is found in a cache, it is a hit. Otherwise, it is a miss. |
| High page fault rate | A large number of page faults occur in the system. |
| High cross-chip or cross-die access rate | The cross-chip or cross-die memory access rate of the CPU is high. |
| Unbalanced memory usage between dies | The memory usage between dies is unbalanced. |
| Memory access analysis function used | Switch to the System Profiler home page to perform a memory access analysis. |
| Analyze top %user processes. | Switch to the Process/Thread page to view the detailed data of the top 50 %user processes. |
For each problem listed in Table 5, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
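The rate parameters in Table 4 are percentages compared against integer thresholds from 1 to 100. The sketch below shows how a miss rate could be computed from a pair of event counts and checked against such a threshold. The function names, the counter layout, and the use of "greater than or equal to" as the comparison are assumptions made for illustration, not documented tool behavior.

```python
from typing import Dict, List, Tuple


def miss_rate(misses: int, accesses: int) -> float:
    """Return misses as a percentage of accesses (0.0 when nothing was counted)."""
    if accesses <= 0:
        return 0.0
    return 100.0 * misses / accesses


def validate_threshold(value: int) -> int:
    """Table 4 requires rate thresholds to be integers from 1 to 100."""
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 100:
        raise ValueError("threshold must be an integer from 1 to 100")
    return value


def exceeded(counters: Dict[str, Tuple[int, int]],
             thresholds: Dict[str, int]) -> List[str]:
    """List the metrics whose miss rate reaches the configured threshold."""
    flagged = []
    for name, (misses, accesses) in counters.items():
        limit = validate_threshold(thresholds[name])
        # Assumption: a rate equal to the threshold counts as exceeded.
        if miss_rate(misses, accesses) >= limit:
            flagged.append(name)
    return flagged
```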
- If the CPU metric is set to %iowait and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 12 Topology of system performance tuning suggestions (%iowait)
Table 6 describes the parameters in the Threshold Setting area.
Table 6 Parameter description
| Parameter | Description |
| --- | --- |
| iowait | Percentage of CPU time during which the CPU is idle and waiting for storage I/O operations. |
| %util | Percentage of elapsed time during which I/O requests were issued to the drive, that is, the drive utilization. |
| Read rate | Amount of data read per second. |
| Write rate | Amount of data written per second. |
| Throughput | Amount of data (measured in bits, bytes, groups, or the like) successfully transmitted per unit time to a network, device, port, virtual circuit, or other facilities. |
| Latency | Time it takes for a packet or group to be transmitted from one end of a network to another. |
| Storage IOPS | IOPS is a critical drive performance metric. It indicates the number of I/O requests that the system can process per unit time. I/O requests are usually data read or write requests. |
| Percentage of the free space in the memory (%) | When the memory usage is too high, the server performance deteriorates. In this case, you need to increase the percentage of the free space in the memory. |
Table 7 describes the main nodes in the tuning suggestion topology.
Table 7 Node description
| Faulty Node | Description |
| --- | --- |
| Storage I/O: high %util | Percentage of CPU time during which the CPU is idle and waiting for drive I/O operations. If the I/O wait usage is too high, the I/O operation efficiency of some programs is low, or the performance of the device corresponding to the I/Os is low. As a result, the read and write operations take a long time. |
| Low throughput and high latency | Drive I/O traffic per second, that is, the size of data written to and read from the drive. |
| Low storage IOPS | IOPS is a critical drive performance metric. It indicates the number of I/O requests that the system can process per unit time. I/O requests are usually data read or write requests. |
| In the code, the asynchronous read/write I/O interface (for example, the libaio interface) is called. | For drive files, file reading is synchronous. As a result, when a thread reads files, the thread is blocked. To improve performance and drive throughput, the program creates several independent drive read/write threads and uses mechanisms such as semaphores to implement inter-thread communication (with locks). Too many threads and locks lead to more resource preemption, which deteriorates system performance. |
| Percentage of the free space in the memory | When the memory usage is too high, the server performance deteriorates. In this case, you need to increase the percentage of the free space in the memory. |
| In the process/thread performance, check the processes with high storage I/O operations and reduce the read and write operations. | Analyze processes with frequent storage I/O operations and reduce read and write operations. |
For each problem listed in Table 7, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
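Storage metrics such as IOPS, read rate, write rate, and drive utilization are commonly derived on Linux from two snapshots of a device line in /proc/diskstats. The Python sketch below shows that calculation. It is an illustration under stated assumptions: the function names are hypothetical, and it assumes the tool's metrics match the iostat-style definitions, which this document does not confirm.

```python
from typing import Dict

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line: str) -> Dict[str, int]:
    """Extract the cumulative counters used below from one /proc/diskstats line."""
    f = line.split()  # f[0]=major, f[1]=minor, f[2]=device name
    return {
        "reads": int(f[3]),            # reads completed
        "sectors_read": int(f[5]),
        "writes": int(f[7]),           # writes completed
        "sectors_written": int(f[9]),
        "io_ms": int(f[12]),           # milliseconds spent doing I/O
    }


def drive_metrics(before: Dict[str, int], after: Dict[str, int],
                  interval_s: float) -> Dict[str, float]:
    """Turn counter deltas over interval_s seconds into per-second metrics."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    d = {key: after[key] - before[key] for key in before}
    return {
        "iops": (d["reads"] + d["writes"]) / interval_s,
        "read_bytes_per_s": d["sectors_read"] * SECTOR_BYTES / interval_s,
        "write_bytes_per_s": d["sectors_written"] * SECTOR_BYTES / interval_s,
        # Share of the interval during which the drive was busy, capped at 100.
        "util_pct": min(100.0, d["io_ms"] / (interval_s * 1000.0) * 100.0),
    }
```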
- If the CPU metric is set to %irq and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 13 Topology of system performance tuning suggestions (%irq)
Table 8 describes the parameters in the Threshold Setting area.
Table 8 Parameter description
| Parameter | Description |
| --- | --- |
| irq | Percentage of CPU time spent servicing hardware interrupts. |
| IOPS | IOPS is a critical drive performance metric. It indicates the number of I/O requests that the system can process per unit time. I/O requests are usually data read or write requests. |
Table 9 describes the main nodes in the tuning suggestion topology.
Table 9 Node description
| Faulty Node | Description |
| --- | --- |
| Network I/O: high network IOPS or traffic | IOPS is the number of I/O requests the system can process per unit of time, usually per second. I/O requests are typically data read or write requests. |
For each problem listed in Table 9, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
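When %irq is high on a core, it can help to see which hardware interrupt sources that core services. On Linux, /proc/interrupts lists cumulative per-CPU counts for each interrupt line. The following Python sketch parses that text and ranks the sources for one CPU. The function names are hypothetical, and the sketch is not part of the Tuning Assistant.

```python
from typing import Dict, List, Tuple


def parse_interrupts(text: str) -> Dict[str, List[int]]:
    """Parse /proc/interrupts content into {IRQ label: per-CPU counts}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts: List[int] = []
        for token in rest.split()[:cpu_count]:
            if not token.isdigit():
                break  # reached the controller or device description
            counts.append(int(token))
        # Rows such as ERR carry a single total; pad the remaining CPUs with 0.
        counts += [0] * (cpu_count - len(counts))
        table[label.strip()] = counts
    return table


def top_sources(table: Dict[str, List[int]], cpu: int,
                k: int = 3) -> List[Tuple[str, int]]:
    """Return the k interrupt labels with the highest count on the given CPU."""
    ranked = sorted(((label, counts[cpu]) for label, counts in table.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Because the counts are cumulative since boot, comparing two snapshots gives a better picture of current activity than a single reading.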
- If the CPU metric is set to %soft and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 14 Topology of system performance tuning suggestions (%soft)
Table 10 describes the parameters in the Threshold Setting area.
Table 10 Parameter description
| Parameter | Description |
| --- | --- |
| soft | Percentage of CPU time spent servicing software interrupts. |
| ksoftirq/<cpu> usage | CPU usage of the ksoftirqd process. |
| NET_TX/NET_RX | Network transmit software interrupt/network receive software interrupt. |
| IOPS | IOPS is a critical drive performance metric. It indicates the number of I/O requests that the system can process per unit time. I/O requests are usually data read or write requests. |
Table 11 describes the main nodes in the tuning suggestion topology.
Table 11 Node description
| Faulty Node | Description |
| --- | --- |
| Software interrupt: high usage of the ksoftirq/<cpu> | The ksoftirqd process in the kernel is responsible for processing software interrupts. After receiving a software interrupt, the ksoftirqd process calls the processing functions corresponding to the software interrupt. For a software interrupt raised by the NIC driver module, ksoftirqd converts the data packets that the NIC wrote to memory into the SKB format that the kernel network module can identify, and then sends the data to the protocol stack for processing. |
| High network IOPS or traffic | The volume of data sent or received through a network, channel, or interface in a unit time is too high. |
For each problem listed in Table 11, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
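The NET_TX and NET_RX counters in Table 10 are visible on Linux in /proc/softirqs, which lists cumulative per-CPU counts for each software interrupt type. The Python sketch below parses two snapshots and computes a per-CPU rate, which can show whether network receive processing is concentrated on a few cores. The function names are hypothetical and the sketch is only an illustration.

```python
from typing import Dict, List


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Parse /proc/softirqs content into {softirq name: per-CPU counts}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        table[name.strip()] = [int(v) for v in rest.split()[:cpu_count]]
    return table


def softirq_rate(before: Dict[str, List[int]], after: Dict[str, List[int]],
                 name: str, interval_s: float) -> List[float]:
    """Per-CPU softirq events per second between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return [(a - b) / interval_s for b, a in zip(before[name], after[name])]
```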
- If the CPU metric is set to %idle and there are CPU cores in the Busy state, you can set the thresholds of related parameters and view details on the right of the page. The topology of system performance tuning suggestions is displayed in the lower part of the page.
Figure 15 Topology of system performance tuning suggestions (%idle)
Table 12 describes the parameters in the Threshold Setting area.
Table 12 Parameter description
| Parameter | Description |
| --- | --- |
| idle | Percentage of CPU time during which the CPU is idle and the system has no unfinished storage I/O request. |
| Network sending and receiving throughput (%) | Volume of data sent or received through a network, channel, or interface per unit time. |
| net.ipv4.tcp_wmem | Size (in bytes) of the TCP write buffer. The three values are the minimum value, default value, and maximum value. |
| net.ipv4.tcp_rmem | Size (in bytes) of the TCP read buffer. The three values are the minimum value, default value, and maximum value. |
| net.core.wmem_max | Maximum size (in bytes) of the socket transmit buffer. |
| net.core.rmem_max | Maximum size (in bytes) of the socket receive buffer. |
| net.core.somaxconn | Maximum length of the queue of connections waiting to be accepted on a listening socket, that is, the upper limit of the listen backlog. |
| net.ipv4.tcp_max_syn_backlog | Maximum number of queued connection requests for which a SYN (synchronization) packet has been received but the client has not yet acknowledged the connection. |
Table 13 describes the main nodes in the tuning suggestion topology.
Table 13 Node description
| Faulty Node | Description |
| --- | --- |
| Low network sending and receiving throughput | The network sending and receiving throughput is low, and the number of received and sent packets is small. |
| Low concurrency | The number of CPU cores that work at the same time is small, and the efficiency is low. Increasing the number of running cores can improve the application execution efficiency. |
For each problem listed in Table 13, you can click the node to view problem details and click the next-level node to view the tuning suggestions.
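The kernel parameters in Table 12 can be inspected on Linux under /proc/sys, where each dot in the parameter name corresponds to a directory level. The Python sketch below maps a parameter name to its file and parses the three-value format used by net.ipv4.tcp_wmem and net.ipv4.tcp_rmem. The helper names are hypothetical, and the simple dot-to-slash mapping is an assumption that holds for the parameters listed here.

```python
from pathlib import Path
from typing import Dict


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a parameter such as net.core.somaxconn to its /proc/sys file."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a kernel parameter as text."""
    return sysctl_path(name, root).read_text().strip()


def parse_triplet(text: str) -> Dict[str, int]:
    """Parse the 'min default max' format of tcp_wmem and tcp_rmem."""
    minimum, default, maximum = (int(v) for v in text.split())
    return {"min": minimum, "default": default, "max": maximum}
```

For example, on a Linux host, parse_triplet(read_sysctl("net.ipv4.tcp_wmem")) would return the minimum, default, and maximum write buffer sizes currently in effect.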
- Apply tuning settings based on the tuning suggestion topology.
- View the tuning suggestion topology tree on the analysis result page. Click the corresponding icons to select the tuning suggestions that match the configuration conditions.
Figure 16 Tuning suggestions
- View the Relevant Configurations, Indicator Description, Optimization Suggestion, and Optimization Guide on the right. Click the corresponding icon to adopt the tuning suggestion or to cancel its adoption.
Figure 17 Tuning suggestion page
- The adopted tuning suggestions are saved in the associated report. Click associated report in the lower right corner of the page to access the associated report page.
All adopted tuning suggestions are displayed on the associated report page. You can click the task name to view the details. Click Valid or Invalid in the lower left corner to report whether the tuning suggestion meets the expectation.
Figure 18 Associated report
- Click the corresponding icon on the right of the System Performance analysis result tab page to view system performance data.
Figure 19 Viewing system performance data