Solution Architecture

Figure 1 shows where the Kunpeng BoostKit for SRA components sit in the software stack. Each component has its own acceleration approach and modification policy. In general, the system architecture of the original base software is left unchanged, although the degree of modification depends on the actual requirements. Table 1 describes each component.

Figure 1 Architecture of Kunpeng BoostKit for SRA

**Table 1** Kunpeng BoostKit for SRA components
| Algorithm Type | Component | Description |
| --- | --- | --- |
| Recall algorithm | KScaNN | Kunpeng Scalable Nearest Neighbors (KScaNN) is an inverted-index-based vector retrieval algorithm. It deeply optimizes the index layout, algorithm logic, and computation flow to fully exploit the performance potential of the chip. |
|  | KBest | Kunpeng Blazing-fast embedding similarity search thruster (KBest) is an efficient graph search algorithm developed by Huawei for approximate nearest neighbor (ANN) search over high-dimensional vectors. It uses techniques such as quantization and NUMA-aware scheduling to improve both search performance and precision. |
|  | KVecTurbo | Kunpeng Vector Turbo (KVecTurbo) is a Kunpeng vector retrieval acceleration component that can be used with the openGauss vector database. It quantizes and compresses high-dimensional vectors to quickly find the nearest neighbors of a query, and uses single instruction, multiple data (SIMD) instructions to accelerate distance calculation in nearest neighbor search. |
|  | KRL | Kunpeng Retrieval Library (KRL) is an operator library optimized for the Kunpeng platform to accelerate vector retrieval. It accelerates Faiss-supported algorithms such as HNSW, PQFS, IVFPQ, and IVFPQFS by replacing their operators with optimized implementations. |
|  | KNewPfordelta | Kunpeng New PForDelta (KNewPfordelta) is a vectorized decompression algorithm designed for the recall pipeline. It speeds up inverted index processing to improve search performance. |
|  | hnswlib | The open-source hnswlib library has been deeply optimized for the Kunpeng platform. It provides efficient FP16 retrieval through vectorization and applies optimizations such as prefetching and instruction rescheduling. |
|  | Faiss | The open-source Faiss library has been deeply optimized using key techniques such as vectorization, dimension-interleaved table lookup and accumulation, and vector filtering and compression. These enhancements significantly improve similarity search and clustering efficiency for the IVFFlat, IVFPQ, HNSW, PQFS, and IVFPQFS index types. FP16 support has also been added for the HNSW index type. |
|  | RaBitQ | The open-source RaBitQ code has been optimized for the Kunpeng platform to improve retrieval performance. The optimizations include FP16 precision optimization, lookup table (LUT) acceleration, Spilling with Orthogonality-Amplified Residuals (SOAR) vector assignment, and machine learning (ML)-based adaptive nprobe selection. |
|  | EmbeddingLookup | The Embedding Lookup module of Monolith, an open-source large-scale real-time recommendation system, has been deeply adapted and optimized. Techniques such as compiler option tuning, spinlock optimization, memory alignment optimization, and Arm SIMD vectorization reduce table lookup latency and improve online inference performance on the Kunpeng Arm platform. |
| AI inference operator library for ranking | KDNN | Kunpeng Deep Neural Network Library (KDNN) improves AI operator performance by combining software optimization methods with the microarchitectural features of the Kunpeng processor. It is integrated into the open-source oneDNN library as a plugin and supports TensorFlow operators such as MatMul and Softmax. |
|  | KDNN_EXT | KDNN_EXT is an extension library of KDNN. It optimizes operators such as softmax and random_choice and exposes them through a Python interface. |
|  | KTFOP | Kunpeng TensorFlow Operator (KTFOP) is an efficient TensorFlow operator library developed by Huawei. It uses SIMD instructions and multi-core scheduling to accelerate operator execution on CPUs and reduce CPU resource usage, increasing the end-to-end throughput of online inference. |
|  | ANNC | The Accelerated Neural Network Compiler (ANNC) performs graph-level optimizations for TensorFlow to improve inference performance in recommendation systems. Its techniques include computational graph optimization and the generation and integration of high-performance fused operators. |
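KScaNN is described in Table 1 as an inverted index-based retrieval algorithm, and the Faiss IVFFlat index follows the same general idea. The following Python sketch shows only the generic inverted file (IVF) search flow: each vector is placed in the list of its nearest centroid, and a query scans just the `nprobe` closest lists. It is not the KScaNN or Faiss implementation; all function names are hypothetical, and the centroids are sampled from the data as a stand-in for k-means training.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(v, centroids, count):
    """Ids of the `count` centroids closest to v."""
    return sorted(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))[:count]


def build_ivf(vectors, centroids):
    """Inverted lists: lists[c] holds the ids of vectors whose nearest centroid is c."""
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        lists[nearest_centroids(v, centroids, 1)[0]].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe closest lists and rank their members exactly."""
    candidates = [vid for c in nearest_centroids(query, centroids, nprobe) for vid in lists[c]]
    return sorted(candidates, key=lambda vid: sq_dist(query, vectors[vid]))[:k]


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = random.sample(vectors, 16)  # stand-in for k-means training
lists = build_ivf(vectors, centroids)
query = vectors[42]
print(ivf_search(query, vectors, centroids, lists, nprobe=4, k=5))
```

Raising `nprobe` scans more lists and trades speed for recall; when `nprobe` equals the number of lists, the search becomes exhaustive.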
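KBest and hnswlib both perform approximate nearest neighbor search on a proximity graph. As a minimal sketch under stated assumptions, the code below shows the best-first beam search that HNSW-style methods run on a single graph layer, with beam width `ef`. The 10 x 10 grid graph, in which each node links to its four neighbors, is a hypothetical toy; real indexes build multi-layer graphs, and none of the quantization, NUMA, FP16, or prefetching optimizations listed in Table 1 are shown.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_search(query, vectors, graph, entry, ef, k):
    """Best-first search on one proximity-graph layer; returns up to k ids."""
    d0 = sq_dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nearest unexpanded node first
    best = [(-d0, entry)]       # max-heap via negation: the ef closest nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # every remaining candidate is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest kept node
    return [i for _, i in sorted((-nd, i) for nd, i in best)][:k]


# Hypothetical toy graph: 10 x 10 grid, node id = 10 * x + y, linked to its 4 neighbors.
vectors = [(x, y) for x in range(10) for y in range(10)]
graph = [
    [10 * nx + ny
     for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
     if 0 <= nx < 10 and 0 <= ny < 10]
    for x, y in vectors
]
query = (3.3, 6.7)
print(beam_search(query, vectors, graph, entry=0, ef=4, k=3))
```

A larger `ef` keeps more candidates alive, which raises recall at the cost of more distance computations.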
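Several entries in Table 1 (the IVFPQ and PQFS indexes in Faiss and KRL, and the LUT acceleration added to RaBitQ) rely on lookup tables to compute approximate distances. The sketch below illustrates that idea with plain product quantization (PQ): each vector is stored as one codeword index per subspace, a table of subspace distances is built once per query, and each database distance then costs M lookups and additions. RaBitQ uses a different quantizer, and the random codebooks and function names here are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split v into m contiguous subvectors of equal length."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def pq_encode(v, codebooks):
    """Store v as one codeword index per subspace (the nearest sub-centroid)."""
    return [min(range(len(cb)), key=lambda j: sq_dist(s, cb[j]))
            for s, cb in zip(split(v, len(codebooks)), codebooks)]


def build_lut(query, codebooks):
    """Per-query table: lut[i][j] = distance from query subvector i to codeword j."""
    return [[sq_dist(s, c) for c in cb]
            for s, cb in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(code, lut):
    """Approximate squared distance: one table lookup and one addition per subspace."""
    return sum(lut[i][j] for i, j in enumerate(code))


random.seed(1)
dim, m, ksub = 8, 4, 16
# Hypothetical codebooks drawn at random; real PQ trains one k-means codebook per subspace.
codebooks = [[[random.random() for _ in range(dim // m)] for _ in range(ksub)]
             for _ in range(m)]
database = [[random.random() for _ in range(dim)] for _ in range(200)]
codes = [pq_encode(v, codebooks) for v in database]
query = [random.random() for _ in range(dim)]
lut = build_lut(query, codebooks)  # built once per query
top5 = sorted(range(len(codes)), key=lambda i: adc_distance(codes[i], lut))[:5]
print(top5)
```

The table-based distance equals the exact distance from the query to the reconstructed (quantized) vector, which is why accumulating table entries quickly is the hot loop that SIMD lookup techniques target.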
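KNewPfordelta decompresses inverted indexes. For readers unfamiliar with this format family, the following scalar sketch shows the basic PForDelta idea under simplifying assumptions: a sorted posting list is converted into gaps, most gaps are bit-packed at a small fixed width `b`, and the few gaps that do not fit are stored separately as exceptions and patched in during decoding. Real PForDelta variants lay out exceptions differently and decode many slots per SIMD instruction; the Python integer used as a bit buffer and the function names are illustrative only.

```python
def pfor_encode(sorted_ids, coverage_pct=90):
    """Encode a non-empty, ascending posting list as (b, n, packed, exceptions)."""
    gaps = [sorted_ids[0]] + [cur - prev for prev, cur in zip(sorted_ids, sorted_ids[1:])]
    widths = sorted(g.bit_length() for g in gaps)
    n = len(gaps)
    idx = -(-coverage_pct * n // 100) - 1  # ceil(pct * n / 100) - 1, integer math
    b = max(1, widths[idx])  # smallest width that fits at least coverage_pct % of gaps
    mask = (1 << b) - 1
    packed, exceptions = 0, []
    for i, g in enumerate(gaps):
        if g > mask:
            exceptions.append((i, g))  # too wide: stored verbatim, slot left as 0
        else:
            packed |= g << (i * b)  # slot i occupies bits [i*b, (i+1)*b)
    return b, n, packed, exceptions


def pfor_decode(b, n, packed, exceptions):
    """Unpack n b-bit slots, patch the exceptions, then prefix-sum the gaps."""
    mask = (1 << b) - 1
    gaps = [(packed >> (i * b)) & mask for i in range(n)]
    for i, g in exceptions:
        gaps[i] = g
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


postings = [3, 7, 8, 15, 16, 20, 1000, 1003, 1004, 1010]
encoded = pfor_encode(postings)
print(encoded[0], encoded[3])  # -> 3 [(6, 980)]
print(pfor_decode(*encoded) == postings)  # -> True
```

Choosing `b` from a percentile rather than the maximum gap keeps one large jump (here 980) from inflating the width of every slot.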
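The EmbeddingLookup row concerns table lookup in the embedding layer of Monolith. The sketch below shows only the basic operation: each example carries a list of sparse feature IDs, each ID is looked up in a hash table of embedding vectors, and the vectors are pooled by sum or mean. The plain `dict` table, the zero contribution for unknown IDs, and the function name are assumptions for illustration; Monolith's own table structures and APIs are not shown.

```python
def embedding_lookup(table, batch_ids, dim, combiner="sum"):
    """Look up each feature ID and pool the vectors of every example.

    table: dict mapping feature ID -> list of `dim` floats (hypothetical layout).
    batch_ids: one list of feature IDs per example.
    combiner: "sum" or "mean" pooling.
    """
    if combiner not in ("sum", "mean"):
        raise ValueError("combiner must be 'sum' or 'mean'")
    pooled = []
    for ids in batch_ids:
        acc = [0.0] * dim
        for fid in ids:
            vec = table.get(fid)
            if vec is None:
                continue  # assumption: unknown IDs contribute a zero vector
            for j in range(dim):
                acc[j] += vec[j]
        if combiner == "mean" and ids:
            acc = [a / len(ids) for a in acc]
        pooled.append(acc)
    return pooled


table = {1: [1.0, 2.0], 7: [0.5, 0.5], 9: [3.0, 0.0]}
batch = [[1, 7], [9], [42]]
print(embedding_lookup(table, batch, dim=2))  # -> [[1.5, 2.5], [3.0, 0.0], [0.0, 0.0]]
print(embedding_lookup(table, batch, dim=2, combiner="mean"))  # -> [[0.75, 1.25], [3.0, 0.0], [0.0, 0.0]]
```

Each lookup is a hash probe followed by a short vector add, so the per-ID cost is dominated by memory access, which is the latency that the optimizations listed for this component aim to reduce.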