Introduction

The best practices described in this document demonstrate the functions of each tool in the Kunpeng DevKit. You can refer to them when analyzing and optimizing your development projects with the Kunpeng DevKit. See Table 1.

Table 1 Best practices

| Tool | Feature | Working Mode | Best Practice | Description |
| --- | --- | --- | --- | --- |
| System Profiler, Kunpeng Performance Boundary Analyzer | Microarchitecture analysis | CLI | Practice 1: Microarchitecture Analysis | In this practice, the Kunpeng Performance Boundary Analyzer is used to quickly identify performance issues. Preliminary analysis shows a high branch misprediction rate among the microarchitecture metrics, which points to a performance bottleneck. The System Profiler is then used to analyze the microarchitecture, and it finds a high branch misprediction rate at conditional statements. The source code shows that the data is not preprocessed before these conditional statements run. Sorting the data in the source code makes the branches more predictable, which raises the branch prediction success rate and improves overall application performance. |
| System Profiler, Kunpeng Performance Boundary Analyzer | Hotspot function analysis | CLI | Practice 2: Hotspot Function Analysis | In this practice, the Kunpeng Performance Boundary Analyzer is used to quickly identify performance issues. Preliminary analysis shows that hotspot system functions are invoked frequently, which points to a performance bottleneck. The System Profiler is then used to analyze the hotspot functions, and a flame graph is used to examine the call stack. The analysis shows that I/O system calls account for a high proportion of execution time, and the source code shows that the cause is the overhead of frequent read system calls. Memory mapping (mmap) is then used to reduce data copies and system calls in the large-file read logic, which lowers I/O latency and improves program performance. |
| System Profiler, Kunpeng Performance Boundary Analyzer | Memory access statistics analysis | CLI | Practice 3: Memory Access Statistics Analysis | In this practice, the Kunpeng Performance Boundary Analyzer is used to quickly identify performance issues. Preliminary analysis shows that the DDRC read bandwidth is extremely high, which points to a performance bottleneck. The System Profiler is then used to examine memory access statistics and cache miss events, which reveal a low cache hit ratio in the application. Further analysis of the source code shows that the cause is extensive copying of data in memory. Block processing is applied to increase the cache hit ratio and improve program performance. |
| System Profiler | Hotspot function analysis | WebUI | Practice 1: Tuning Python String Concatenation | Use the System Profiler to perform hotspot function analysis on Python string concatenation APIs, locate performance bottlenecks, and tune Python string concatenation based on the analysis results. |
| System Profiler | Hotspot function analysis, Lock and wait analysis | WebUI | Practice 2: Tuning Lock Performance | Use the hotspot function analysis and the lock and wait analysis of the System Profiler to sample and analyze multithreaded applications in the target environment, locate performance bottlenecks, and tune application performance based on the analysis results. |
| Java Profiler | Real-time profiling, Sampling profiling | WebUI | Practice 1: Tuning Memory Leaks | Use the Java Profiler to perform real-time profiling and sampling profiling on a running Java program, locate problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling, Sampling profiling | WebUI | Practice 2: Locating and Tuning Hotspot Functions | Use the Java Profiler to perform hotspot function analysis on a running Java program, locate hotspot problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling | WebUI | Practice 3: Tuning GC Logs | Use the Java Profiler to perform garbage collection (GC) analysis on a running Java program, locate GC problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling | WebUI | Practice 4: Locating and Tuning Lock Contention | Use the Java Profiler to perform a thread dump (with CPU analysis) on a running Java program, locate lock problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling | WebUI | Practice 5: Tuning GC Triggered by G1 Humongous Objects | Use the Java Profiler to perform GC analysis on a running Java program, locate GC problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling | WebUI | Practice 6: Tuning Abnormal GC | Use the Java Profiler to perform GC analysis on a running Java program, locate GC problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| Java Profiler | Real-time profiling | WebUI | Practice 7: Tuning Deadlocks | Use the Java Profiler to perform a thread dump (with CPU analysis) on a running Java program, locate lock problems in the program, and tune the program based on the analysis results so that it runs optimally. |
| System Diagnosis | Memory usage | WebUI | Practice 1: Tuning Memory Usage | Use the Kunpeng DevKit System Diagnosis tool to analyze the memory usage of executable programs, locate memory leaks based on the call stack information, and optimize memory usage accordingly. |
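The hotspot function analysis practice (CLI, Practice 2) replaces frequent read system calls with memory mapping. The following is a minimal sketch of that idea, not the code used in the practice: the function names, the 64-byte chunk size, and the newline-counting workload are all hypothetical, chosen only to contrast many small reads with a single mapping.

```python
import mmap
import os
import tempfile


def make_sample_file(n_lines):
    """Write n_lines short lines to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        for i in range(n_lines):
            f.write(b"record %d\n" % i)
        return f.name


def count_newlines_read(path, chunk_size=64):
    """Count newline bytes using many small unbuffered reads.

    Each os.read() call is a separate read system call, so a small
    chunk_size (an assumed value) produces a large number of calls.
    """
    count = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    finally:
        os.close(fd)
    return count


def count_newlines_mmap(path):
    """Count newline bytes through a read-only memory mapping.

    The file is mapped once; the scan then reads from the mapping
    instead of issuing one read call per chunk.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count


if __name__ == "__main__":
    sample = make_sample_file(10_000)
    try:
        print(count_newlines_read(sample), count_newlines_mmap(sample))
    finally:
        os.remove(sample)
```

Both functions return the same count. Whether the mapped version is faster depends on file size, access pattern, and the platform, so it is worth profiling both before and after the change.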
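The memory access statistics practice (CLI, Practice 3) applies block processing to raise the cache hit ratio. The sketch below shows the loop structure of block processing on a hypothetical workload, copying a row-major square matrix into its transpose. The workload, the function names, and the block size of 16 are assumptions, not details from the practice. In pure Python the interpreter overhead hides most cache effects, so this illustrates the access pattern rather than the speedup, which would be expected mainly in compiled code.

```python
def transpose_naive(src, n):
    """Copy an n x n row-major matrix into its transpose.

    Reads walk src row by row, but writes jump n elements at a time,
    so consecutive writes land far apart in memory.
    """
    dst = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            dst[j * n + i] = src[i * n + j]
    return dst


def transpose_blocked(src, n, block=16):
    """Same copy, processed one block x block tile at a time.

    Within a tile, both the reads and the writes stay inside a small
    region, which is the property that block processing relies on.
    """
    dst = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # min() handles edge tiles when n is not a multiple of block
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst


if __name__ == "__main__":
    size = 64
    matrix = list(range(size * size))
    print(transpose_naive(matrix, size) == transpose_blocked(matrix, size))
```

The block size is normally chosen so that one source tile and one destination tile fit in the target cache level together, which depends on the element size and the processor.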
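The Python string concatenation practice (WebUI, Practice 1) does not state here which concatenation APIs it compares. As an assumed illustration, the sketch below contrasts repeated `+=` in a loop with a single `str.join` call, a pair that hotspot analysis of string-building code often singles out. The function names are hypothetical.

```python
def concat_with_plus(parts):
    """Build a string by appending one piece at a time.

    Each += may allocate a new string and copy the accumulated text,
    so total work can grow faster than the output length.
    """
    result = ""
    for part in parts:
        result += part
    return result


def concat_with_join(parts):
    """Build the same string with a single str.join call.

    join computes the final length first and copies each piece once.
    """
    return "".join(parts)


if __name__ == "__main__":
    pieces = [str(i) for i in range(100_000)]
    print(concat_with_plus(pieces) == concat_with_join(pieces))
```

Both functions return identical strings. CPython can sometimes resize the string in place for `+=`, so the measured gap varies by interpreter and workload.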
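The lock performance practice (WebUI, Practice 2) tunes a multithreaded application after lock and wait analysis. The practice's actual fix is not described here, so the sketch below shows one common remedy as an assumption: shortening the critical section so that each thread holds the lock only for the final update. The `expensive` function is a hypothetical stand-in for real per-item work.

```python
import threading


def expensive(x):
    """Hypothetical stand-in for per-item work that needs no shared state."""
    return sum(i * i for i in range(x % 50 + 1))


def worker_coarse(items, totals, lock):
    """Hold the lock while computing, so threads wait on each other."""
    for x in items:
        with lock:
            totals[0] += expensive(x)


def worker_fine(items, totals, lock):
    """Compute outside the lock and take it once for the shared update."""
    local = 0
    for x in items:
        local += expensive(x)
    with lock:
        totals[0] += local


def run(worker, n_threads=4, n_items=2000):
    """Split n_items across n_threads and return the shared total."""
    totals = [0]
    lock = threading.Lock()
    chunks = [range(t, n_items, n_threads) for t in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(chunk, totals, lock))
        for chunk in chunks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals[0]


if __name__ == "__main__":
    print(run(worker_coarse) == run(worker_fine))
```

Both workers produce the same total. The second one acquires the lock once per thread instead of once per item, which is the kind of change that lock and wait analysis can help justify.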