Introduction
This document provides the installation guide, interface definitions, and sample code for SRA_Inference to help you get started quickly.
SRA_Inference is an inference acceleration kit provided by Huawei and optimized for the Kunpeng platform. It adapts mainstream frameworks such as TensorFlow and ONNX Runtime to the Kunpeng platform and accelerates their inference.
SRA_Inference Overview
Table 1 describes the composition of SRA_Inference.
| Component | Description | Application Scenario |
|---|---|---|
| KTFOP | Kunpeng TensorFlow Operator (KTFOP) is an efficient, Huawei-developed TensorFlow operator library. It uses single instruction, multiple data (SIMD) instructions and multi-core scheduling to accelerate operator execution on the CPU and reduce CPU resource consumption, thereby increasing the overall end-to-end throughput of online inference. | Inference computing tasks on TensorFlow |
| KONNX | Kunpeng ONNX Runtime (KONNX) is an efficient, Huawei-developed ONNX Runtime operator library. It uses optimization techniques such as packed matrix multiplication and vectorization to improve CPU-side operator performance while minimizing computing resource consumption, ultimately reducing end-to-end inference latency. | Inference computing tasks based on ONNX Runtime |
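The SIMD vectorization and packed matrix multiplication that KTFOP and KONNX rely on can be illustrated with a toy NumPy comparison. This is a conceptual sketch only; it does not use any SRA_Inference API, and simply contrasts scalar loops with a vectorized, BLAS-backed multiply:

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Naive triple-loop matrix multiply: one scalar operation at a time."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)

t0 = time.perf_counter()
slow = matmul_loops(a, b)
t_loops = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # BLAS-backed: SIMD-vectorized and cache-blocked ("packed")
t_vec = time.perf_counter() - t0

assert np.allclose(slow, fast)  # same result, very different cost
print(f"loops: {t_loops:.4f}s  vectorized: {t_vec:.6f}s")
```

On any modern CPU the vectorized path is orders of magnitude faster for the same arithmetic; operator libraries such as KTFOP and KONNX apply the same class of optimization to inference operators.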
SRA_Inference supports only the following Kunpeng series processors:
- Kunpeng 920 7260 (128 cores), supporting NEON instructions (128-bit width)
- New Kunpeng 920 processor model, supporting NEON instructions (128-bit width) and Scalable Vector Extension (SVE) instructions (256-bit width)
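On AArch64 Linux, the instruction extensions above are reported in the Features line of /proc/cpuinfo (NEON appears as the asimd flag). The helper below is an illustrative sketch for checking them, not part of SRA_Inference:

```python
def supported_simd_extensions(cpuinfo_text):
    """Return the subset of {'asimd', 'sve'} listed in a /proc/cpuinfo dump.

    On AArch64 Linux, NEON support is reported by the 'asimd' feature flag.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            flags = line.split(":", 1)[1].split()
            found |= {"asimd", "sve"} & set(flags)
    return found

# Illustrative excerpt of a Features line from an SVE-capable AArch64 CPU.
sample = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid sve"
print(sorted(supported_simd_extensions(sample)))  # → ['asimd', 'sve']

# To check the running machine (AArch64 Linux only):
# with open("/proc/cpuinfo") as f:
#     print(sorted(supported_simd_extensions(f.read())))
```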
Application Scenarios
SRA_Inference is suited for the following scenarios:
- Recommendation: online inference for recommendation systems
- Advertising: online inference for advertisement placement