Introduction

Overview

The Kunpeng DevKit command line tool is a toolset that includes the System Migration, Porting Advisor, Affinity Analyzer, Compiler and Debugger, and System Diagnosis tools. This document describes how to obtain, install, and use the Kunpeng DevKit command line tool. The following table lists the supported functions:

Table 1 Functions supported by the Kunpeng DevKit command line tool

Tool

Description

System Migration

Collects information about installed software including software packages, middleware, and databases in an application system, and analyzes the dependency compatibility in the Project Object Model (POM) file of a Maven project.

Porting Advisor

Ports software from x86 servers running Linux to Kunpeng servers running Linux, with necessary software scan and analysis capabilities.

Affinity Analyzer

Checks software code on the Kunpeng 920 platform to improve code quality and memory access performance.

Tracer

Parses the user-defined configuration file (.ini) and generates a header file. Users can write header file APIs into program code to record log information. In addition, the Tracer integrates the CTF deserialization function to parse and output log files.

HPC Debugger

Debugs MPI and MPI+OpenMP applications in Launch mode or Attach mode. Launch mode must be run using the mpirun command. Attach mode must use the srun method of the Slurm scheduler.

System Profiler

Collects and analyzes performance data in multiple scenarios, and provides tuning suggestions based on the tuning system.

Python/C Profiler

Samples Python programs and mixed programs of Python and C/C++ and analyzes call stacks.

Java Profiler

Analyzes and optimizes the performance of Java applications running on Kunpeng servers.

Kunpeng AutoTuner

Automatically tunes application and system parameters to improve application performance metrics in different scenarios.

System Methodology Profiler

Enables one-click collection of multidimensional performance statistics, covering cache misses, memory access, NUMA, microarchitecture, miss latencies, hotspot functions, CPU usage, NIC bandwidth, I/O, memory usage, softirqs, PCIe, PA2Ring, and Ring2PA. It also supports instruction type analysis. The collected data is time-aligned, and resource usage is visually presented from the service layer down to the chip layer.

System Diagnosis

Analyzes exceptions that occur in applications.

Kunpeng Health Inspector

Quickly collects static Kunpeng hardware information to help learn about the overall hardware status before tuning.

NOTE:

The Kunpeng Health Inspector was once named "Kunpeng Health Check."

JVM Jitter Detector

Monitors the code cache and JIT compiler metrics of Java applications, and generates alarms when detecting any abnormal metrics that could cause performance jitter.

  • The System Migration, Porting Advisor, Affinity Analyzer, Tracer, and System Methodology Profiler tools are available on x86 and Kunpeng 920 servers.
  • The HPC Debugger, System Profiler, Python/C Profiler, Java Profiler, Kunpeng AutoTuner, System Diagnosis, JVM Jitter Detector, and Kunpeng Health Inspector are available only on Kunpeng 920 servers.

System Migration

The System Migration tool collects information about installed software including software packages, middleware, and databases in an application system, and analyzes the dependency compatibility in the POM file of a Maven project.
Table 2 Function description

Function

Description

Application information collection for system migration

Collects ledger and component information about the software installed in an application system, such as software packages, middleware, and databases.

Maven project source code analysis

Executes the mvn command to invoke the Maven plugin, parses the dependency compatibility in the POM file, and generates an HTML report.

Porting Advisor

The Porting Advisor simplifies the application porting process and supports scanning, analysis, and porting of software from x86 Linux to Kunpeng Linux. This tool can automatically analyze applications and generate guide reports, greatly improving code porting efficiency.
Table 3 Function description

Function

Description

Source code porting

Analyzes the portability of software written in C/C++/ASM/Fortran/Go/Java/Python/Scala.

  • Checks the C/C++/ASM/Fortran/Go software build project files and provides porting suggestions.
  • Checks the link libraries used by C/C++/Fortran/Go/interpreted language software build project files and provides porting suggestions.
  • Checks the C/C++/ASM/Fortran/Go/interpreted language software source code and provides porting suggestions. The tool supports porting of Fortran source code from the Intel Fortran compiler to the GCC Fortran compiler, and checks of compiler features and syntax extensions.
  • Checks the compatibility of the SO files loaded by the Python/Java/Scala program through the ctypes module.
  • Analyzes some x86 assembly instructions and converts them into equivalent Kunpeng assembly instructions.

Software porting assessment

Analyzes the SO library files in the software installation path in the x86 environment and checks whether these files are compatible with the Kunpeng platform.
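
Architecture-specific constructs such as x86 assembly instructions are typical of what source code porting has to deal with. The following is a minimal sketch of guarding such an instruction with compiler-defined architecture macros. It is not output of the Porting Advisor. The helper names cpu_relax and spin_until_set are hypothetical, the inline assembly syntax assumes GCC or Clang, and pairing the x86 pause instruction with the Arm64 yield instruction is a common convention rather than something this document specifies.

```c
#include <stdatomic.h>

/* Hypothetical helper: emit a spin-wait hint suited to the target CPU.
 * Assumes GCC or Clang (inline __asm__ and predefined architecture macros). */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");  /* x86 spin-wait hint */
#elif defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");  /* commonly used Arm64 counterpart */
#else
    /* Unknown architecture: no hint is emitted. */
#endif
}

/* Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * iterations have passed. Returns the number of iterations spent waiting. */
static unsigned spin_until_set(atomic_int *flag, unsigned max_spins)
{
    unsigned spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
           spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

Code that contains only the x86 branch fails to build on a Kunpeng server, which is why constructs like this are worth locating before porting starts.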

Affinity Analyzer

The Affinity Analyzer checks software code to improve code quality and memory access performance. On the x86 platform, it supports only calculation precision analysis.
Table 4 Function description

Function

Description

64-bit running mode check

Identifies the 32-bit applications to be ported to the 64-bit platform and provides modification suggestions. It supports GCC 4.8.5 to GCC 10.3.0.

Byte alignment check

Checks the byte alignment of structure variables in the source code.

BC file generation

Generates a BC file, which is used for the memory consistency check and vectorization check.

Static memory consistency check

Checks for any memory inconsistency problems in static mode when C/C++ source code is running on the Kunpeng platform, and also provides check reports and memory barrier insertion suggestions. This function is available only on Kunpeng 920 series processors.

Vectorization check

Checks vectorizable code snippets and provides modification suggestions.

Matricization check

Checks matricizable code snippets and provides modification suggestions.

Build affinity

Analyzes the content in Makefile and CMakeLists.txt that can be replaced with content in the Kunpeng library, and provides replacement suggestions and function repair.

Cache line alignment check

Checks the 128-byte alignment of structure variables in the C/C++ source code to improve memory access performance.

Dynamic memory consistency check

Checks for any memory inconsistency problems in dynamic mode when C/C++ source code is running on the Kunpeng platform, and also provides check reports and memory barrier insertion suggestions. This function is available only on Kunpeng 920 series processors.

Calculation precision analysis

Locates precision differences in Fortran, C, and C++ programs that are caused by differences between x86 and Kunpeng instructions.

Link latency detection

Provides the Kunpeng network detection and analysis functions, collects latency data of network protocol stacks, and analyzes network performance bottlenecks.
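
The byte alignment and cache line alignment checks in Table 4 concern how structures are laid out in memory. The following is a minimal C11 sketch of the two ideas, not a reproduction of what the Affinity Analyzer reports. All type and function names are hypothetical. The padding amounts in the comments assume a typical 64-bit Linux ABI, and avoiding cache line sharing between threads is a common motivation assumed here for the 128-byte alignment.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* 128-byte alignment, as named by the cache line alignment check. */
#define CACHE_LINE_SIZE 128

/* Byte alignment: member order influences how much padding the compiler
 * inserts. On a typical 64-bit Linux ABI this layout pads after both
 * single-byte members. */
struct loosely_packed {
    char     tag;
    uint64_t value;
    char     flag;
};

/* Same members, widest first: the two single-byte members share one
 * padded tail, so the structure is never larger than the one above. */
struct tightly_packed {
    uint64_t value;
    char     tag;
    char     flag;
};

/* Cache line alignment: each counter starts on its own 128-byte boundary,
 * so adjacent array elements do not share a cache line. */
struct per_thread_counter {
    alignas(CACHE_LINE_SIZE) uint64_t hits;
};

/* Hypothetical helper: returns 1 if p lies on a 128-byte boundary. */
static int is_cache_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}
```

Comparing sizeof(struct loosely_packed) with sizeof(struct tightly_packed) on the target platform shows the padding cost of member order, and is_cache_line_aligned can be applied to the address of any per_thread_counter element to confirm its placement.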

Tracer

The Tracer parses the user-defined configuration file (.ini) and generates a header file. Users can write header file APIs into program code to record log information. In addition, the Tracer integrates the CTF deserialization function to parse and output log files.

HPC Debugger

The HPC Debugger debugs MPI and MPI+OpenMP applications in Launch mode or Attach mode. Launch mode must be run using the mpirun command. Attach mode must use the srun method of the Slurm scheduler.

System Profiler

The System Profiler is a performance analysis tool for Kunpeng-powered servers. It collects performance data of processor hardware, operating system (OS), processes/threads, and functions, analyzes system performance metrics, locates system bottlenecks and hotspot functions, and provides tuning suggestions.
Table 5 Function description

Function

Description

Microarchitecture analysis

Obtains the running status of instructions on the CPU pipeline based on Arm performance monitor unit (PMU) events, helping quickly locate performance bottlenecks of the current application on the CPU. You can modify the target program to make full use of hardware resources.

HPC application analysis

Collects PMU events of the system and the key metrics of OpenMP and MPI applications to help you accurately obtain the serial and parallel times of the parallel region and barrier-to-barrier, calibrated L2 microarchitecture metrics, instruction distribution, L3 usage, and memory bandwidth.

Memory access analysis

Collects the PMU events of the cache and memory and analyzes the number of memory accesses, hit rate, and bandwidth.

NUMA refined analysis

Obtains the refined DDR access, NUMA access bandwidth matrix, and processes' memory access information based on Arm SPE capabilities.

Roofline analysis

Helps pinpoint application bottlenecks on a given hardware platform and optimize the application accordingly.

Hotspot function analysis

Analyzes C/C++ program code, identifies performance bottlenecks, and provides details about the top hotspot functions and call stacks. The tool also displays the function call relationship in flame graphs and provides the tuning path.

Miss event analysis

Uses the Statistical Profiling Extension (SPE) capability to analyze miss events such as LLC Miss, TLB Miss, Remote Access, and Long Latency Load. You can modify the target program to reduce the probability of miss events and improve the program processing performance.

Hotspot function analysis (Python/C)

Uses ptrace to sample Python programs and Python & C/C++ hybrid programs, analyzes call stacks, obtains top 20 hotspot functions, and draws flame graphs.

Python/C Profiler

Table 6 Function description

Function

Description

Hotspot function analysis

Uses ptrace to sample Python programs and Python & C/C++ hybrid programs, analyzes call stacks, obtains top 20 hotspot functions, and draws flame graphs.

Java Profiler

The Java Profiler analyzes and optimizes the performance of Java applications running on Kunpeng servers. The tool identifies hotspot functions, locates performance bottlenecks, and provides tuning suggestions.
Table 7 Function description

Function

Description

Hotspot analysis

Collects stack information about CPU, CYCLES, LOCK, CACHE_MISSES, and ALLOC events at certain points of time, collects statistics on hotspot methods in the current JVM, and displays the information in a flame graph and an inverted flame graph.

Kunpeng AutoTuner

The Kunpeng AutoTuner tool automatically tunes application parameters to improve application performance metrics in different scenarios.

Table 8 Function description

Function

Description

Generating a template file

Generates a parameter file based on the application and task parameters users set on the interactive user interface.

Enabling automatic tuning

Automatically tunes performance based on the given parameters and performance test result.

Using the tuning result

Applies the tuning result.

System Methodology Profiler

The System Methodology Profiler enables one-click collection of multidimensional performance statistics, covering cache misses, memory access, NUMA, microarchitecture, miss latencies, hotspot functions, CPU usage, NIC bandwidth, I/O, memory usage, softirqs, PCIe, PA2Ring, and Ring2PA. It also supports instruction type analysis. The collected data is time-aligned, and resource usage is visually presented from the service layer down to the chip layer.

System Diagnosis

The System Diagnosis tool analyzes system running metrics of Kunpeng servers to identify abnormalities, such as memory leaks and memory overwrites.
Table 9 Function description

Function

Description

Memory usage

Collects performance data about memory allocation and release, and checks whether any allocated memory space has not been released.
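
To make "allocated memory space that has not been released" concrete, the following is a minimal C sketch of a leak on an error path together with one possible fix. It only illustrates the class of defect and does not show how the System Diagnosis tool detects it. The function names checksum_leaky and checksum_fixed are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Leaky version: the early return on the error path skips free(). */
static int checksum_leaky(const char *text, size_t max_len)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    int sum = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, text, len + 1);

    if (len > max_len)
        return -1;                 /* BUG: copy is never released */

    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)copy[i];
    free(copy);
    return sum;
}

/* Fixed version: every path after a successful malloc() reaches free(). */
static int checksum_fixed(const char *text, size_t max_len)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    int sum = -1;                  /* -1 signals an error */

    if (copy == NULL)
        return -1;
    memcpy(copy, text, len + 1);

    if (len <= max_len) {
        sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += (unsigned char)copy[i];
    }
    free(copy);
    return sum;
}
```

Both functions return the same values, so the leak is invisible to functional tests and shows up only as memory that grows each time an over-long input is rejected.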

Kunpeng Health Inspector

The Kunpeng Health Inspector (KSPECT) is a lightweight and precise tool for collecting static Kunpeng hardware information. It swiftly collects data on server hardware, such as CPUs, memory, network, storage, PCIe, virtual machines (VMs), sensors, software, and module dependencies, and offers performance tuning suggestions based on the collected data.

JVM Jitter Detector

The JVM Jitter Detector tool monitors the code cache and JIT compiler metrics of Java applications, and generates alarms when detecting any abnormal metrics that could cause performance jitter.

Intended Audience

This document is intended for:

  • Kunpeng developers
  • Kunpeng software users
  • Independent software vendor (ISV) developers

Command Format Conventions

Format

Description

Boldface

The keywords (the part that must be kept unchanged) of a command are in boldface.

Italic

Command arguments (replaced by specific values in an actual command) are in italics.

[ ]

Optional items (keywords or arguments) are grouped in square brackets ([]).

{ x | y | ... }

Alternative items are grouped in braces ({}) and separated by vertical bars (|). Exactly one item must be selected.

[ x | y | ... ]

Optional items are grouped in brackets ([]) and separated by vertical bars (|). One item or no item can be selected.

{ x | y | ... }*

Alternative items are grouped in braces ({}) and separated by vertical bars (|). At least one item must be selected, and at most all items can be selected.

[ x | y | ... ]*

Optional items are grouped in brackets ([]) and separated by vertical bars (|). Several items or no item can be selected.

&<1-n>

The parameter before the ampersand (&) can be repeated 1 to n times.

#

A line starting with the # sign is a comment.