Introduction
This document provides the installation guide, interface definitions, and sample code for SRA_Inference to help you get started quickly.
SRA_Inference is an inference acceleration kit provided by Huawei and optimized for the Kunpeng platform. It adapts mainstream frameworks such as TensorFlow and ONNX Runtime to the Kunpeng platform and accelerates their inference.
SRA_Inference Overview
Table 1 describes the composition of SRA_Inference.
| Component | Description | Application Scenario |
|---|---|---|
| KTFOP | Kunpeng TensorFlow Operator (KTFOP) is an efficient, Huawei-developed TensorFlow operator library. It uses single instruction, multiple data (SIMD) instructions and multi-core scheduling to accelerate operator execution on the CPU and reduce CPU resource consumption, thereby increasing the overall end-to-end throughput of online inference. | Inference computing tasks on TensorFlow |
| KONNX | Kunpeng ONNX Runtime (KONNX) is an efficient, Huawei-developed ONNX Runtime operator library. It uses optimization techniques such as packed matrix multiplication and vectorization to improve CPU-side operator performance while minimizing computing resource consumption, ultimately reducing end-to-end inference latency. | Inference computing tasks based on ONNX Runtime |
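The SIMD vectorization and packed matrix multiplication that KTFOP and KONNX rely on can be illustrated with a toy NumPy comparison. This is a conceptual sketch only; it does not use any SRA_Inference API, and simply contrasts scalar loops with a vectorized, BLAS-backed multiply:

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Naive triple-loop matrix multiply: one scalar operation at a time."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)

t0 = time.perf_counter()
slow = matmul_loops(a, b)
t_loops = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # BLAS-backed: SIMD-vectorized and cache-blocked ("packed")
t_vec = time.perf_counter() - t0

assert np.allclose(slow, fast)  # same result, very different cost
print(f"loops: {t_loops:.4f}s  vectorized: {t_vec:.6f}s")
```

On any modern CPU the vectorized path is orders of magnitude faster for the same arithmetic; operator libraries such as KTFOP and KONNX apply the same class of optimization to inference operators.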
SRA_Inference supports only the following Kunpeng series processors:
- Kunpeng 920 7260 (128 cores), supporting NEON instructions (128-bit width)
- New Kunpeng 920 processor model, supporting NEON instructions (128-bit width) and Scalable Vector Extension (SVE) instructions (256-bit width)
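On AArch64 Linux, the instruction extensions above are reported in the Features line of /proc/cpuinfo (NEON appears as the asimd flag). The helper below is an illustrative sketch for checking them, not part of SRA_Inference:

```python
def supported_simd_extensions(cpuinfo_text):
    """Return the subset of {'asimd', 'sve'} listed in a /proc/cpuinfo dump.

    On AArch64 Linux, NEON support is reported by the 'asimd' feature flag.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            flags = line.split(":", 1)[1].split()
            found |= {"asimd", "sve"} & set(flags)
    return found

# Illustrative excerpt of a Features line from an SVE-capable AArch64 CPU.
sample = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid sve"
print(sorted(supported_simd_extensions(sample)))  # → ['asimd', 'sve']

# To check the running machine (AArch64 Linux only):
# with open("/proc/cpuinfo") as f:
#     print(sorted(supported_simd_extensions(f.read())))
```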
Application Scenarios
SRA_Inference is suited for the following scenarios:
- Recommendation: online inference for recommendation systems
- Advertising: online inference for advertisement placement