Introduction

The System Profiler is a performance analysis tool for Kunpeng-powered servers. It collects performance data on the processor hardware, operating system (OS), processes, threads, and functions; analyzes system performance metrics; locates system bottlenecks and hotspot functions; and provides tuning suggestions. The tool helps you quickly locate and resolve software performance problems.

Table 1 Task description

| Task Type | Task Subtype | Description | Supported Platform |
| --- | --- | --- | --- |
| General analysis | Hotspot function analysis | The tool analyzes C/C++ programs, identifies performance bottlenecks, and displays hotspot functions. It also shows function call relationships as flame graphs and suggests a tuning path. | Kunpeng |
| System component analysis | NUMA refined analysis | This analysis is based on the Arm Statistical Profiling Extension (SPE). SPE samples instructions and records information about the events they trigger, including accurate program counter (PC) information. The tool uses SPE to collect the NUMA performance of all processes in the system, finds the top N (for example, N = 10) processes with the poorest NUMA performance and the hotspot memory areas of these processes, produces a matrix of memory access statistics between NUMA nodes, and identifies inter-node memory access imbalances. It then provides related tuning suggestions. | Kunpeng |
| System component analysis | I/O analysis | The tool analyzes storage I/O performance. By analyzing block storage devices, it obtains performance data such as the number of I/O operations, I/O data size, I/O queue depth, and I/O operation latency, and identifies the specific I/O operations, processes, threads, call stacks, and application-layer I/O APIs involved. Based on the I/O performance data, the tool provides tuning suggestions. | Kunpeng |
| Dedicated analysis | Lock and wait analysis | The tool analyzes the lock and wait functions (including sleep, usleep, mutex, cond, spinlock, rwlock, and semaphore) of glibc and of open source software such as MySQL and OpenMP, associates each lock and wait function with the process and call site it belongs to, and provides tuning suggestions based on existing experience. | Kunpeng |
| Dedicated analysis | HPC application analysis | The tool collects Performance Monitor Unit (PMU) events of the system and the key metrics of MPI and MPI+OpenMP applications. This helps you accurately obtain the serial and parallel time of parallel regions and barrier-to-barrier intervals, calibrated 2-layer microarchitecture metrics, instruction distribution, L3 cache usage, and memory bandwidth. | Kunpeng |
| Comparison analysis | - | For analysis tasks of the same type, you can compare results from the same node or from different nodes. This lets you quickly see the differences between analysis results, locate changes in performance metrics, and evaluate the effect of optimization methods. | Kunpeng |
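
The inter-NUMA-node memory access matrix and the "top N processes" ranking described for NUMA refined analysis can be illustrated with a small sketch. This is not the tool's implementation: the sample format `(pid, cpu_node, mem_node)`, the function names, and the use of the remote-access ratio as the measure of "poorest NUMA performance" are all assumptions made for illustration.

```python
from collections import defaultdict


def numa_access_matrix(samples, num_nodes):
    """Return matrix[src][dst]: sampled accesses from CPUs on node src to memory on node dst.

    `samples` is a hypothetical list of (pid, cpu_node, mem_node) tuples.
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for _pid, cpu_node, mem_node in samples:
        matrix[cpu_node][mem_node] += 1
    return matrix


def remote_access_ratio(matrix):
    """Fraction of sampled accesses that crossed NUMA nodes (off-diagonal entries)."""
    total = sum(sum(row) for row in matrix)
    local = sum(matrix[i][i] for i in range(len(matrix)))
    return 0.0 if total == 0 else (total - local) / total


def worst_processes(samples, n):
    """Rank processes by remote-access ratio (an assumed metric), highest first."""
    totals = defaultdict(int)
    remote = defaultdict(int)
    for pid, cpu_node, mem_node in samples:
        totals[pid] += 1
        if cpu_node != mem_node:
            remote[pid] += 1
    ratios = [(pid, remote[pid] / totals[pid]) for pid in totals]
    return sorted(ratios, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up samples for two processes on a two-node system.
    demo = [(100, 0, 0), (100, 0, 1), (100, 0, 1),
            (200, 1, 1), (200, 1, 1), (200, 1, 0)]
    matrix = numa_access_matrix(demo, num_nodes=2)
    print(matrix)                       # [[1, 2], [1, 2]]
    print(remote_access_ratio(matrix))  # 0.5
    print(worst_processes(demo, n=1))   # process 100 ranks first
```

A matrix whose off-diagonal entries dominate suggests that threads frequently access memory on a remote node, which is the kind of imbalance the analysis is meant to surface.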

Use Restrictions

Table 2 Use restrictions

| Task Type | Task Subtype | Description |
| --- | --- | --- |
| System component analysis | NUMA refined analysis | This function is available on openEuler and CentOS 7.6 systems that have the Statistical Profiling Extension (SPE) feature. Supported openEuler kernel versions: 4.19 and later. Supported CentOS 7.6 kernel versions: 4.14.0-115.el7a.0.1, 4.14.0-115.2.2.el7a, 4.14.0-115.5.1.el7a, 4.14.0-115.6.1.el7a, 4.14.0-115.7.1.el7a, 4.14.0-115.8.2.el7a, and 4.14.0-115.10.1.el7a. This function is unavailable on VMs. |
| System component analysis | I/O analysis | The system kernel must support ftrace collection. |
| Dedicated analysis | HPC application analysis | During OpenMP data collection, the tool temporarily changes the kernel parameters /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid so that call graph data and PMU events can be collected. After the collection is complete, both parameters are restored to their original values. |
| Dedicated analysis | Lock and wait analysis | The environment must support the extended Berkeley Packet Filter (eBPF) configuration. |
| Comparison analysis | - | Hotspot function analysis is supported. |
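
Some of the restrictions in Table 2 can be checked before a task is created. The following is a minimal sketch, not part of the tool: the function names are hypothetical, it assumes the kernel release string comes from `uname -r` and may carry a trailing `.aarch64` suffix, and it only reads the two kernel parameters to show their current values (it does not change them).

```python
from pathlib import Path

# CentOS 7.6 kernel versions listed in Table 2 for NUMA refined analysis.
SUPPORTED_CENTOS_KERNELS = {
    "4.14.0-115.el7a.0.1",
    "4.14.0-115.2.2.el7a",
    "4.14.0-115.5.1.el7a",
    "4.14.0-115.6.1.el7a",
    "4.14.0-115.7.1.el7a",
    "4.14.0-115.8.2.el7a",
    "4.14.0-115.10.1.el7a",
}


def centos_kernel_supported(release):
    """Check a `uname -r` style string against the CentOS 7.6 list."""
    suffix = ".aarch64"  # assumption: the release may end with the architecture
    if release.endswith(suffix):
        release = release[: -len(suffix)]
    return release in SUPPORTED_CENTOS_KERNELS


def openeuler_kernel_supported(release):
    """Return True if the kernel's major.minor version is 4.19 or later."""
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (4, 19)


def read_kernel_param(name, root="/proc/sys/kernel"):
    """Return the current value of a kernel parameter, or None if unreadable."""
    try:
        return (Path(root) / name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Show the values that HPC application analysis changes temporarily.
    for param in ("kptr_restrict", "perf_event_paranoid"):
        print(param, "=", read_kernel_param(param))
```

On a target server, the release string could be obtained with `platform.release()` from the Python standard library and passed to the matching check for the installed OS.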