Kunpeng BoostKit for SRA
Getting Started
- What's new
Lists the latest updates to the Kunpeng BoostKit for SRA documentation.
- Technical white paper
Describes the solution architecture, advantages, and key features of Kunpeng BoostKit for SRA.
- List of Fixed Vulnerabilities
Provides the list of fixed vulnerabilities in open-source and third-party software involved in the Kunpeng BoostKit software packages.
Acceleration
- Kunpeng Recall Algorithm Library
This library is optimized at a low level for the instruction set architecture and memory access characteristics of the Kunpeng processor, which improves the computing efficiency and throughput of recall algorithms. It is especially suited to high-concurrency recall scenarios.
- Kunpeng Inference Acceleration Kit
The Kunpeng Inference Acceleration Kit includes the Kunpeng TensorFlow operator library (KTFOP) and the Kunpeng ONNX Runtime operator library (KONNX).
- Kunpeng AI Library
The Kunpeng Artificial Intelligence Library (KAIL) is a high-performance AI operator library optimized for the Kunpeng platform. It includes a deep neural network operator library (KDNN) and an extension operator library (KDNN_EXT).
- Kunpeng Retrieval Library
This library accelerates vector retrieval on the Kunpeng platform. It is optimized at a low level for the instruction set architecture and memory access characteristics of the Kunpeng processor. By combining low-precision quantization with high-precision reranking, it significantly improves the computational efficiency and throughput of recall algorithms without compromising accuracy, which makes it suitable for high-concurrency recall scenarios.
- TensorFlow Serving Thread Scheduling Optimization
Kunpeng BoostKit provides a thread scheduling optimization solution that improves TensorFlow Serving inference performance.
- TensorFlow Serving ANNC Feature
An extended acceleration suite built on open-source OpenXLA and hosted in the ANNC repository maintained by the openEuler community. It includes optimizations tailored for the Kunpeng platform, such as TensorFlow graph fusion, Accelerated Linear Algebra (XLA) graph fusion, and operator optimization.
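The Kunpeng Retrieval Library entry above mentions combining low-precision quantization with high-precision reranking. As a rough illustration of that general idea only, the following pure-Python sketch scores every candidate on int8 codes, then rescores a short list with the original float vectors. It is not the library's API or its actual algorithm: the function names, the shared `scale`, and the `shortlist` parameter are all hypothetical choices made for this example.

```python
import random

def quantize(vec, scale):
    """Map floats to int8-range integers in [-127, 127] using one shared scale (an assumption)."""
    return [max(-127, min(127, round(x / scale))) for x in vec]

def dot(a, b):
    """Plain inner product, used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def search(query, database, codes, scale, k=3, shortlist=10):
    """Two-stage search: coarse scoring on quantized codes, exact reranking on floats."""
    q_code = quantize(query, scale)
    # Stage 1: cheap approximate scores on the low-precision codes.
    coarse = sorted(range(len(codes)),
                    key=lambda i: dot(q_code, codes[i]),
                    reverse=True)[:shortlist]
    # Stage 2: rerank only the shortlisted candidates at full precision.
    reranked = sorted(coarse, key=lambda i: dot(query, database[i]), reverse=True)
    return reranked[:k]

# Hypothetical toy data: 200 random 16-dimensional vectors in [-1, 1].
random.seed(0)
dim, n = 16, 200
database = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
scale = 1.0 / 127
codes = [quantize(v, scale) for v in database]
query = [random.uniform(-1, 1) for _ in range(dim)]

top = search(query, database, codes, scale)
exact = sorted(range(n), key=lambda i: dot(query, database[i]), reverse=True)[:3]
```

In this sketch, `shortlist` controls the trade-off: a larger value means more full-precision work but less chance of the coarse stage discarding a true top result. Comparing `top` against `exact` on representative data is one simple way to check recall.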
Open Source Enablement
- oneDNN
Guide for porting the oneDNN deep neural network library.
- PyTorch
Guide for porting the PyTorch open-source deep learning framework.
- TensorFlow
Guide for porting the TensorFlow deep learning framework.
- TensorFlow Serving
Guide for porting TensorFlow Serving, a high-performance system for serving machine learning models.
- ScaNN
Guide for porting ScaNN, an open-source vector similarity search library.
- DLRM
Guide for porting the DLRM deep learning recommendation model.
- TVM
Guide for porting TVM, an open-source deep learning compiler stack.
- ONNX Runtime
Guide for porting ONNX Runtime, a high-performance cross-platform engine for running inference on models in the ONNX format.
Performance Evaluation
- Inference Performance Benchmark Testing for Search and Recommendation Ranking Models
Describes how to deploy a benchmarking system that measures the inference performance of search and recommendation ranking models. It covers server and client environment setup and performance evaluation during the inference phase.