Introduction

The best practices in this document demonstrate how to use each tool in the Kunpeng DevKit CLI. Refer to them when you analyze and optimize your own projects with the Kunpeng DevKit CLI. See Table 1.

Table 1 Best practices

| Tool | Feature | Best Practice | Description |
| --- | --- | --- | --- |
| Porting Advisor | Source Code Porting | Practice 1: Porting the Open Source Software smartdenovo-master | SMARTdenovo is a de novo assembler for PacBio and Oxford Nanopore sequencing data. It is open source software written in C. The Kunpeng DevKit Porting Advisor is used to analyze the SMARTdenovo source package to help port the application. |
| Porting Advisor | Software Porting Assessment | Practice 1: Scanning and Analyzing netty-all-4.1.34-Final | Netty is an NIO-based client and server programming framework. The Kunpeng DevKit Porting Advisor is used to assess the Netty software package before it is ported. |
| System Profiler | Microarchitecture Analysis | Practice 1: Microarchitecture Analysis | The Kunpeng Performance Boundary Analyzer is first used to quickly locate performance issues. Its preliminary analysis shows a high branch misprediction rate among the microarchitecture metrics, which points to a performance bottleneck. The System Profiler is then used for microarchitecture analysis, which shows a high branch misprediction rate for conditional statements. The source code shows that the data is not sorted before these conditional statements evaluate it. Sorting the data in the source code makes the branch outcomes predictable, which increases the branch prediction success rate and improves overall application performance. |
| System Profiler | Hotspot Function Analysis | Practice 1: Hotspot Function Analysis | The Kunpeng Performance Boundary Analyzer is first used to quickly locate performance issues. Its preliminary analysis shows that system functions are hotspots and are invoked frequently, which points to a performance bottleneck. The System Profiler is then used to analyze the hotspot functions, and a flame graph is used to examine the call stacks. The analysis shows that I/O system calls account for a high proportion of execution time, and the source code shows that the cause is the overhead of frequent read system calls. The large-file read logic is then rewritten to use memory mapping (mmap), which reduces data copies and system calls, lowering I/O latency and improving program performance. |
| System Profiler | Memory Access Statistics Analysis | Practice 1: Memory Access Statistics Analysis | The Kunpeng Performance Boundary Analyzer is first used to quickly locate performance issues. Its preliminary analysis shows extremely high DDR controller (DDRC) read bandwidth, which points to a performance bottleneck. The System Profiler is then used to examine memory access statistics and cache miss events, which reveal a low cache hit ratio in the application. The source code shows that the cause is extensive copying of data in memory. Processing the data in blocks increases the cache hit ratio and improves program performance. |