Basic Acceleration with the Kunpeng BoostKit
The Kunpeng BoostKit library is optimized for Arm instructions and built on the Kunpeng Accelerator Engine (KAE). It covers system, math, compression, encryption and decryption, media, storage, network, and AI libraries, and provides high-performance acceleration for application scenarios such as big data encryption and decryption, distributed storage and compression, and video transcoding.
System Library
The Kunpeng BoostKit system library consists of the following components:
- Kpglibc: The Kunpeng GNU C Library (Kpglibc) uses the Kunpeng vectorized instruction set to optimize the performance of string, memory, and time operation functions. It provides optimized operators, including memcmp, memset, memcpy, memrchr, strcpy, strcmp, gettimeofday, and clock_gettime.
- HyperScan: It is a high-performance regular expression matching library with an additional Kunpeng platform branch that is fully compatible with Armv8-A. It boosts application performance on the Kunpeng platform through NEON instructions, inline assembly, data alignment, instruction alignment, memory prefetch, static branch prediction, and code structure adjustments.
- AVX2KI: It is an interface collection library that uses Kunpeng instructions to re-implement the intrinsic interfaces of a conventional platform and packages them as an independent interface module (delivered as C header files), reducing repeated development work in porting projects.
- KQMalloc: The Kunpeng Quick Malloc library (KQMalloc) is a memory allocator designed for the Kunpeng 920 processor. The allocator has two versions, one for single-threaded applications and the other for multi-threaded applications. It minimizes internal cache usage and cache misses to dramatically improve application performance.
- HTL: The Hyper Thread Library (HTL) is a user-level thread library built on kernel-mode threads. It addresses the performance deterioration and system resource shortage that occur when a large number of kernel-mode threads are used, in particular the severe performance drop in nested parallel scenarios. The library improves concurrency and performance while reducing resource usage.
- KSL_ASN1: Abstract Syntax Notation One (ASN.1) defines a formalism for the specification of abstract data types. This notation is used to flexibly describe data representation, encoding, transmission, and decoding. The KSL_ASN1 library is optimized based on open source ASN.1 software for the Kunpeng platform and has higher performance compared with asn1c. KSL_ASN1 supports the Distinguished Encoding Rules (DER), XML Encoding Rules (XER), Packed Encoding Rules (PER), and Basic Encoding Rules (BER).
- HAF: The Homogeneous Acceleration Framework (HAF) provides user-friendly programming methods and application programming interfaces (APIs) to quickly, effectively, and securely offload and push specified acceleration segments of your service programs to offload nodes, optimizing the offload effect.
- BiSheng JDK acceleration library: BiSheng JDK is an open source Huawei JDK distribution developed on OpenJDK. It runs on Kunpeng processors to offer acceleration features, including enhanced heap dump, JBooster, and JBolt.
  - Enhanced heap dump masks sensitive information in heap dump files to protect data security and privacy.
  - JBooster accelerates application startup, reduces CPU usage, speeds up elastic scaling, and cuts cloud application deployment costs.
  - JBolt optimizes the code cache layout to reduce the iCache/iTLB miss rate and improve application performance.
  The enhanced heap dump feature supports BiSheng JDK 8 and 17, and the JBooster and JBolt features support only BiSheng JDK 17.
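To make the encoding rules listed for KSL_ASN1 more concrete, the following minimal sketch hand-encodes an ASN.1 INTEGER and SEQUENCE using DER. It is plain Python written for illustration and does not use the KSL_ASN1 API; the function names der_length, der_integer, and der_sequence are hypothetical.

```python
def der_length(n: int) -> bytes:
    """Encode a content length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """Encode an INTEGER (tag 0x02) as minimal two's complement, big-endian."""
    # For negative numbers, ~value has the same magnitude bits to account for.
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1  # always leaves room for the sign bit
    body = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(body)) + body


def der_sequence(*items: bytes) -> bytes:
    """Encode a SEQUENCE (tag 0x30) whose content is already-encoded items."""
    body = b"".join(items)
    return b"\x30" + der_length(len(body)) + body


if __name__ == "__main__":
    encoded = der_sequence(der_integer(1), der_integer(128))
    print(encoded.hex())
```

Each value is a tag-length-value triple. Because DER requires the shortest possible length and integer encodings, equal values always serialize to identical bytes.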
Compression
The Kunpeng BoostKit compression library consists of the following components:
- Gzip: It uses data prefetch, loop unrolling, and CRC instruction replacement to improve the compression and decompression rates on the Kunpeng platform, especially the compression and decompression of text files.
- Zstandard (zstd): It is a fast compression algorithm. It uses NEON instructions, inline assembly, memory prefetch, adjusted code structure, optimized instruction pipeline layout, and the zstar software optimization library to increase the compression and decompression rates on the Kunpeng platform.
- Snappy: It uses inline assembly, high-bit instructions, optimized CPU pipeline, and memory prefetch to increase the compression and decompression rates on the Kunpeng platform.
- KAEZlib: It is the compression module of the KAE. It uses the Kunpeng hardware acceleration module to implement the Deflate algorithm and works with the lossless user-mode driver framework to provide an interface for high-performance compression in gzip or zlib format.
- LZ4: It is a library for ultrafast data compression and decompression that suits many types of data. It is widely used in applications such as data storage, network transmission, and real-time data processing. Based on the LZ4 1.9.3 official release, the Kunpeng version uses NEON instructions, inline assembly, code structure adjustment, memory prefetch, and optimized instruction pipeline layout to improve LZ4 performance on the Kunpeng computing platform.
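The gzip and zlib formats mentioned for KAEZlib are two containers around the same Deflate stream. The sketch below shows the difference using the software zlib module from the Python standard library. It does not use KAE hardware or the KAEZlib interface, and the helper names deflate and inflate are hypothetical.

```python
import zlib

# zlib's window-bits argument selects the container around the Deflate stream.
WBITS = {
    "zlib": 15,   # 2-byte zlib header + Adler-32 trailer
    "gzip": 31,   # gzip header + CRC-32 trailer (15 + 16)
    "raw": -15,   # bare Deflate stream, no header or checksum
}


def deflate(data: bytes, container: str = "zlib", level: int = 6) -> bytes:
    """Compress data with Deflate inside the chosen container."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, WBITS[container])
    return compressor.compress(data) + compressor.flush()


def inflate(blob: bytes, container: str = "zlib") -> bytes:
    """Decompress a blob produced by deflate() with the same container."""
    return zlib.decompress(blob, wbits=WBITS[container])


if __name__ == "__main__":
    sample = b"Kunpeng BoostKit " * 1000
    for name in WBITS:
        packed = deflate(sample, name)
        assert inflate(packed, name) == sample
        print(name, len(sample), "->", len(packed), "bytes")
```

The compressed payload is the same algorithm in all three cases; only the framing bytes and checksum differ, which is why one Deflate engine can serve both formats.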
Encryption and Decryption
The KAE encryption and decryption module uses the Kunpeng hardware acceleration engine to implement the
Media
The Kunpeng BoostKit media library consists of the following components:
- HMPP: Kunpeng Hyper Media Performance Primitives (HMPP) provides functions for allocating and releasing vector buffers, vector initialization functions, vector mathematical operation functions, vector statistics operation functions, vector sampling functions, vector conversion functions, filtering functions, and transform (such as fast Fourier transform) functions. It complies with IEEE 754, which is a technical standard for floating-point arithmetic.
- x265: It uses Kunpeng vector instructions to accelerate the underlying x265 transcoding operators, improving the overall application performance in FFmpeg video transcoding scenarios.
- x264: It is free video encoding software licensed under the GPL. It is mainly used for H.264/MPEG-4 AVC video encoding.
- HW265Enc: It is a video encoder developed by Huawei based on the H.265/HEVC (High Efficiency Video Coding) standard for video on demand and live streaming. It encodes YUV pixel files to generate H.265/HEVC video bitstream files, and supports 8-bit color depth and 420p format. In addition, the encoder is optimized for the new Kunpeng 920 processor model. It has distinct performance advantages over the open source x265 encoder, according to comparison tests with the x265 (commit ce8642f) version.
- KVSIP: The Kunpeng vector signal processing library (KVSIP) provides high-performance computing interfaces, including basic vector computing, basic matrix computing, and fast Fourier transform.
- KPCV: The Kunpeng Computer Vision Library (KPCV) leverages Kunpeng vector instructions to optimize OpenCV and PyTorch image operators.
  - CV image operator library: Kunpeng vector instructions are used to optimize image processing operators in this library, including merge, resize (it supports multiple interpolation modes, such as cv2.INTER_LINEAR, cv2.INTER_CUBIC, and cv2.INTER_NEAREST_EXACT), remap, dilate, GaussianBlur, and cvtColor.
  - PyTorch image operator library: Kunpeng vector instructions are used to optimize data preprocessing operators in this library, including normalize, resize, and permute.
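For readers unfamiliar with the preprocessing operators named above, the following sketch gives a plain Python reference for what permute and normalize compute on a small image. It illustrates the semantics only and uses neither KPCV nor PyTorch; the function names and the nested-list image layout are assumptions made for this example.

```python
def permute_hwc_to_chw(image):
    """Reorder a height x width x channel image into channel x height x width."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def normalize_chw(image, mean, std):
    """Apply (value - mean[c]) / std[c] to every pixel of channel c."""
    return [
        [[(value - mean[c]) / std[c] for value in row] for row in plane]
        for c, plane in enumerate(image)
    ]


if __name__ == "__main__":
    # One row, two pixels, three channels (HWC layout), values in [0, 1].
    hwc = [[[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]]
    chw = permute_hwc_to_chw(hwc)
    print(normalize_chw(chw, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]))
```

Both operators are elementwise or pure data movement, which is what makes them good candidates for vector instructions.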
Math
The Kunpeng Math Library (KML) provides high-performance mathematical functions optimized for the Kunpeng platform. It mainly performs scalar, vector, and matrix computation, covering the four basic arithmetic operations as well as trigonometric, hyperbolic, exponential, and logarithmic functions. All KML APIs are implemented in C/C++ and assembly language. Some of the APIs are compatible with Fortran, and some are wrapped for use from Java.
Storage
The Kunpeng BoostKit storage library consists of the following components:
- Smart prefetch: It uses high-speed cache drives and efficient prefetch algorithms to improve system storage I/O performance and the overall performance of the solution in storage I/O-intensive scenarios.
- SPDK: The Storage Performance Development Kit (SPDK) uses network, processing, and storage technologies to improve application efficiency and performance. By running software designed for the hardware, SPDK has proven capable of millions of I/O operations per second using multiple processor cores and NVMe drives for storage, without the need for offload hardware.
- ISA-L: The Intelligent Storage Acceleration Library (ISA-L) is a collection of highly optimized functions that provide RAID, erasure coding (EC), cyclic redundancy check (CRC), cryptographic hash, and compression capabilities.
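Two of the function families listed for ISA-L, RAID parity and CRC, can be illustrated in a few lines. The sketch below computes XOR parity over equal-sized data blocks, rebuilds a lost block from the survivors, and checks integrity with CRC-32. It is a reference written in plain Python with the standard zlib module, not the ISA-L API, and the function names are hypothetical.

```python
import zlib
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block by XOR-ing the survivors with parity."""
    return xor_parity([*surviving_blocks, parity])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(data)
    checksum = zlib.crc32(data[1])

    # Simulate losing the second block and rebuilding it.
    recovered = rebuild_missing([data[0], data[2]], parity)
    assert recovered == data[1]
    assert zlib.crc32(recovered) == checksum
    print("recovered:", recovered)
```

XOR parity tolerates the loss of one block. Erasure codes generalize the idea to multiple lost blocks at the cost of heavier arithmetic, which is where optimized implementations matter most.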
Network
The Kunpeng BoostKit network library consists of the following components:
- XPF: The Extensible Packet Framework (XPF) library is a Huawei-developed function module that implements an intelligent offload engine in the Open vSwitch (OVS) software. The module traces all flow tables and conntrack tables that a data packet passes through in OVS. It combines the executed conntrack actions and all flow table actions into a single composite action, and generates an integrated flow table entry based on the unified match entry. When subsequent data packets enter OVS and match the integrated flow table, the composite action is executed directly. Compared with the open source processing flow, this reduces the number of lookups and improves performance.
- DPDK: The Data Plane Development Kit (DPDK) is a data-plane development tool set, including library functions and drivers, for efficient data packet processing in the user space.
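The flow-merging idea described for XPF can be pictured with a toy model. In the sketch below, the first packet of a flow walks every table, the collected actions are merged into one cached entry, and later packets of the same flow need a single lookup. This is a conceptual illustration only: the class, its dictionaries, and the action strings are invented for the example and do not reflect the actual OVS or XPF data structures.

```python
class ToyPipeline:
    """Toy multi-table pipeline with a cache of merged (integrated) entries."""

    def __init__(self, tables):
        self.tables = tables      # list of dicts: flow key -> action string
        self.integrated = {}      # flow key -> tuple of merged actions
        self.lookups = 0          # number of table or cache queries performed

    def process(self, flow_key):
        # Fast path: one query against the integrated table.
        self.lookups += 1
        cached = self.integrated.get(flow_key)
        if cached is not None:
            return cached

        # Slow path: query each table in order and collect its action.
        actions = []
        for table in self.tables:
            self.lookups += 1
            action = table.get(flow_key)
            if action is not None:
                actions.append(action)

        merged = tuple(actions)
        self.integrated[flow_key] = merged
        return merged


if __name__ == "__main__":
    key = ("10.0.0.1", "10.0.0.2", 80)
    pipeline = ToyPipeline([
        {key: "conntrack:commit"},
        {key: "set_vlan:100"},
        {key: "output:port2"},
    ])
    first = pipeline.process(key)
    after_first = pipeline.lookups
    second = pipeline.process(key)
    print(first, after_first, pipeline.lookups - after_first)
```

In this model the first packet costs one cache miss plus one query per table, while each following packet of the flow costs a single query.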
AI Library
- KAIL_DNN: Based on the microarchitecture features of the Kunpeng processor, KAIL_DNN improves the performance of core DNN operators through vectorization, assembly, and algorithm optimization, and can be integrated into open source oneDNN as a plugin to provide complete capabilities.
- KAIL_DNN_EXT: It serves as the extension library of KAIL_DNN. KAIL_DNN_EXT optimizes operators such as softmax and random_choice, and encapsulates them into a Python interface library for specific AI scenarios.
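As a reference for one of the operators named above, the following sketch shows a numerically stable softmax in plain Python. It illustrates what the operator computes and is not the KAIL_DNN_EXT interface; the function name and list-based signature are assumptions made for this example.

```python
import math


def softmax(logits):
    """Return softmax probabilities for a non-empty list of numbers."""
    if not logits:
        raise ValueError("logits must not be empty")
    # Subtracting the maximum keeps exp() from overflowing on large inputs
    # and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(softmax([1000.0, 1000.0]))  # would overflow without the max shift
```

The exponentials and the reduction over the vector dominate the cost, so they are the natural targets for vectorized implementations.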