Overview

The code samples listed in Table 1 demonstrate the functions of each tool in the Kunpeng DevKit. You can refer to these samples when using the Kunpeng DevKit to analyze and optimize your own development projects.

Table 1 Introduction to code samples

Tool

Scenario

Description

Sample Code

Porting Advisor

Sample 1: Source Code Scan

The Kunpeng Porting Advisor scans the C/C++/Fortran/assembly source code of software built for the x86 platform. It identifies the shared object (SO) dependencies of the source code, locates the code lines that need to be modified, and provides modification suggestions. It also estimates the porting workload based on the code modification efficiency configured in the system, so that decision-makers can plan the project based on the estimate. This function is available under the Source Code Porting first-level menu and can be used in both x86 and Kunpeng environments.

NOTE:

Do not rescan assembly source code after it has been ported and modified. A rescan may produce inaccurate analysis results.

Makefile, file_lock.c, file_lock.h, ksw.c, ksw.h, interface.s

Sample 2: Inline Assembly Translation (single-instruction and multi-instruction conversions)

The assembly translation module of the tool supports inline assembly. This sample shows how to scan the C/C++ source code of x86-based software, identify the inline assembly code in it, and obtain suggestions for adapting that code to the Kunpeng platform.

swap.c, gcd.c

Sample 3: Full Assembly Translation

The assembly translation module of the tool supports full assembly. This sample shows how to scan the source code of x86-based software, identify the full assembly code in it, and obtain suggestions for adapting that code to the Kunpeng platform.

test.s, Makefile, main.c

System Profiler

Sample 1: Matrix Analysis

This sample uses the Kunpeng DevKit System Profiler to tune a program that performs a one-dimensional matrix calculation in a for loop. Hotspot function analysis identifies the matrix calculation function multiply as the hotspot. The program is then tuned with NEON instructions, and the performance before and after tuning is compared.

multiply.c, multiply_simd.c, multiply_start.sh

Sample 2: Detecting and Tuning Column-wise Access Loops

Based on a program that traverses a two-dimensional array in a loop, this sample uses the hotspot function analysis of the Kunpeng DevKit System Profiler to compare the cache miss events of row-wise and column-wise access. The results show that row-wise access improves the CPU cache hit rate.

cache_hit.c, cache_miss.c, miss_start.sh, hit_start.sh

Sample 3: Frequent Lock Preemption

Multithreaded programs frequently experience lock preemption and contention, which wastes CPU resources. Contention for shared resources can generally be addressed by analyzing and simplifying the service logic. In this sample, the resource scheduling analysis and lock & wait analysis functions of the Kunpeng DevKit System Profiler are used to analyze the service logic. You can then reduce lock contention by shrinking the scope of locks and reducing the number of concurrent threads.

pthread_mutex.c, pthread_atomic.c

Sample 4: MPI Application Analysis

The HPC application analysis function of the Kunpeng DevKit System Profiler helps you understand the communication status of an MPI application in each rank.

ring.c

Sample 5: Long Application Execution Time Caused by Blocking MPI Communication Functions

In an MPI/OpenMP hybrid scenario, you can use the HPC application analysis function of the Kunpeng DevKit System Profiler to find out how blocking MPI communication functions prolong execution and how to tune application performance in each scenario.

send_recv.cpp

Sample 6: NUMA Refined Analysis

On systems with a non-uniform memory access (NUMA) architecture, the Kunpeng DevKit System Profiler can perform NUMA refined analysis. It collects NUMA performance data for all processes in the system and identifies the top N (for example, top 10) processes with the poorest NUMA performance. It also generates a statistics matrix of memory accesses between NUMA nodes, identifies unbalanced memory access between nodes, and provides tuning suggestions based on these findings.

None