Overview

The code samples listed in Table 1 demonstrate the functions of each Kunpeng DevKit tool. Refer to them when analyzing and optimizing your own development projects with the Kunpeng DevKit.

Table 1 Introduction to code samples

Tool

Working Mode

Scenario

Description

Sample Code

System Profiler

CLI

Sample 1: Using Roofline Analysis to Tune Applications

This sample uses the roofline analysis function of the Kunpeng DevKit System Profiler to tune applications of the same type level by level across multiple dimensions, showing how to perform a roofline analysis task.

matrix.h, matrix.c, matmult.h, main.c, intrinsic_matmult.c, block_matmult.c, base_matmult.c
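The file names in this sample suggest a baseline, a blocked, and an intrinsics-based matrix multiplication. The sketch below is not the shipped sample code; the function names `base_matmult_sketch` and `block_matmult_sketch` and the block size parameter `bs` are hypothetical. It only illustrates the kind of change that roofline-guided tuning typically compares: loop blocking reuses data while it is still in cache, which tends to raise arithmetic intensity (the horizontal axis of a roofline chart).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: naive triple loop computing C = A * B
 * for n x n row-major matrices. */
void base_matmult_sketch(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j]; /* strided reads of b */
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Hypothetical blocked variant: works on bs x bs tiles so that the tiles
 * of a, b, and c are reused before they are evicted from cache. */
void block_matmult_sketch(size_t n, size_t bs,
                          const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                for (size_t i = ii; i < min_size(ii + bs, n); i++)
                    for (size_t k = kk; k < min_size(kk + bs, n); k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < min_size(jj + bs, n); j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

Both functions compute the same product; a suitable value for `bs` depends on the cache sizes of the target machine and would need to be found by measurement.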

WebUI

Sample 1: Matrix Analysis

This sample uses the Kunpeng DevKit System Profiler to tune a program that computes a one-dimensional matrix in a for loop. Hotspot function analysis identifies multiply as the hotspot function of the matrix calculation. The program is then tuned with NEON instructions, and the performance before and after tuning is compared.

multiply.c, multiply_simd.c, multiply_start.sh
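As a rough illustration of the scalar-to-NEON change described above, the following sketch multiplies two float arrays element by element. It is an assumption about the general shape of such a tuning step, not the content of multiply.c or multiply_simd.c; the function names are hypothetical. The NEON path is compiled only when the compiler defines `__ARM_NEON`, so the code also builds on other architectures, where it falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical scalar version: one multiplication per loop iteration. */
void multiply_scalar_sketch(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Hypothetical SIMD version: four multiplications per iteration with
 * 128-bit NEON registers, plus a scalar loop for the remaining elements. */
void multiply_simd_sketch(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);     /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);     /* load 4 floats from b */
        vst1q_f32(out + i, vmulq_f32(va, vb)); /* multiply and store   */
    }
#endif
    for (; i < n; i++) /* tail elements, or all elements without NEON */
        out[i] = a[i] * b[i];
}
```

Profiling both versions with the same input is what makes the before-and-after comparison meaningful.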

WebUI

Sample 2: Detecting and Tuning Column-wise Access Loops

This sample uses the hotspot function analysis of the Kunpeng DevKit System Profiler to compare the cache miss events of row-wise and column-wise access in a program that traverses a two-dimensional array in a loop. The results indicate that row-wise access improves the CPU cache hit rate.

cache_hit.c, cache_miss.c, miss_start.sh, hit_start.sh
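The difference between the two access orders can be sketched as follows. This is a minimal illustration, assuming a row-major array stored in one contiguous buffer; the function names are hypothetical and the code is not taken from cache_hit.c or cache_miss.c.

```c
#include <assert.h>
#include <stddef.h>

/* Row-wise traversal: the inner loop walks consecutive addresses, so each
 * cache line that is fetched is fully used before moving on. */
long sum_row_wise(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Column-wise traversal: the inner loop jumps by cols elements each step,
 * so for large arrays consecutive accesses tend to land on different
 * cache lines. */
long sum_column_wise(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}
```

Both functions return the same sum; only the memory access pattern differs, which is why a profiler is needed to see the effect.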

WebUI

Sample 3: Frequent Lock Preemption

Lock preemption and contention occur frequently in multi-threaded programs and waste CPU resources. Contention for shared resources can generally be addressed by analyzing and simplifying the service logic. This sample uses the resource scheduling analysis and lock & wait analysis functions of the Kunpeng DevKit System Profiler to analyze the service logic. Reducing the lock granularity and the number of concurrent threads reduces lock contention.

pthread_mutex.c, pthread_atomic.c
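The file names suggest a comparison between a mutex-protected counter and an atomic one. The sketch below is written under that assumption and is not the shipped sample; `NUM_THREADS`, `ITERATIONS`, and all function names are hypothetical. It shows one way to shrink a critical section to nothing: replacing a lock around a single increment with a C11 atomic operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4      /* hypothetical thread count */
#define ITERATIONS 100000  /* hypothetical increments per thread */

static long mutex_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long atomic_counter;

/* Every increment takes the lock, so the threads contend for it. */
static void *mutex_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        mutex_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Every increment is a single atomic read-modify-write, with no lock. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
    return NULL;
}

/* Start NUM_THREADS threads running worker and wait for all of them. */
static void run_threads(void *(*worker)(void *))
{
    pthread_t threads[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
}

long count_with_mutex(void)
{
    mutex_counter = 0;
    run_threads(mutex_worker);
    return mutex_counter;
}

long count_with_atomic(void)
{
    atomic_store(&atomic_counter, 0);
    run_threads(atomic_worker);
    return atomic_load(&atomic_counter);
}
```

Both functions return the same count. Build with a flag such as `-pthread` where the toolchain requires it, then profile each variant to compare the time spent waiting on the lock.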

WebUI

Sample 4: MPI Application Analysis

The HPC application analysis function of the Kunpeng DevKit System Profiler helps you learn about the communication status of each rank of the application.

ring.c

WebUI

Sample 5: Long Application Execution Caused by MPI Blocking Communication Functions

In an MPI/OpenMP hybrid scenario, you can use the HPC application analysis function of the Kunpeng DevKit System Profiler to understand how to tune application performance in each scenario.

send_recv.cpp

WebUI

Sample 6: NUMA Refined Analysis

In the non-uniform memory access (NUMA) architecture, the Kunpeng DevKit System Profiler can be used to perform NUMA refined analysis. It collects the NUMA performance of all processes in the system and identifies the top N (for example, top 10) processes with the poorest NUMA performance. It also generates a statistics matrix of memory access between NUMA nodes and identifies unbalanced memory access between nodes, on which its tuning suggestions are based.

None