Solution Architecture
Figure 1 shows where the Kunpeng BoostKit for SRA components are positioned. Each component has its own acceleration solution and modification policy. In general, the system architecture of the original base software is left unchanged, but the exact modification policy depends on actual requirements. Table 1 describes each component.
| Algorithm Type | Component | Description |
|---|---|---|
| Recall algorithm | KScaNN | Kunpeng Scalable Nearest Neighbors (KScaNN) is an inverted index-based vector retrieval algorithm. It optimizes the index layout, algorithm logic, and computation flow to make full use of the chip's capabilities. |
| Recall algorithm | KBest | Kunpeng Blazing-fast embedding similarity search thruster (KBest) is an efficient, Huawei-developed graph search algorithm for approximate nearest neighbor search over multi-dimensional vectors. It improves search performance and precision through methods such as quantization and NUMA scheduling. |
| Recall algorithm | KVecTurbo | Kunpeng Vector Turbo (KVecTurbo) is a vector retrieval acceleration component developed by Kunpeng that can be used together with the openGauss vector database. It quantizes and compresses high-dimensional vectors to quickly find the nearest neighbors of a query. In addition, KVecTurbo uses single instruction, multiple data (SIMD) instructions to accelerate distance calculation in multi-dimensional vector nearest neighbor search. |
| Recall algorithm | KRL | Kunpeng Retrieval Library (KRL) is an operator library optimized for the Kunpeng platform to accelerate vector retrieval. By replacing operators, KRL accelerates Faiss-supported algorithms such as HNSW, PQFS, IVFPQ, and IVFPQFS. |
| Recall algorithm | KNewPfordelta | Kunpeng New PForDelta (KNewPfordelta) is a vectorized decompression algorithm designed for the recall pipeline. It optimizes inverted index processing to improve search performance. |
| Recall algorithm | hnswlib | The open-source hnswlib has been deeply optimized for the Kunpeng platform. It delivers efficient FP16 retrieval through vectorization and applies optimization policies such as prefetching and instruction rescheduling. |
| Recall algorithm | Faiss | The open-source Faiss algorithm library has been deeply optimized using key technologies such as vectorization, dimension-interleaved lookup and accumulation, and vector filtering and compression. These enhancements improve similarity search and clustering efficiency for the IVFFlat, IVFPQ, HNSW, PQFS, and IVFPQFS indexing algorithms. In addition, FP16 support has been added for the HNSW index type. |
| Recall algorithm | RaBitQ | Based on the open-source RaBitQ code, this component adds Kunpeng-specific optimizations to improve retrieval performance: FP16 precision optimization, lookup table (LUT) acceleration, Spilling with Orthogonality-Amplified Residuals (SOAR) vector allocation, and ML-based adaptive nprobe. |
| Recall algorithm | EmbeddingLookup | The Embedding Lookup module of the open-source Monolith large-scale real-time recommendation system has been deeply adapted and optimized. Compiler option tuning, spinlock optimization, memory alignment optimization, and Arm SIMD vectorization reduce table lookup latency and boost online inference performance on the Kunpeng Arm platform. |
| Ranking-focused AI inference operator library | KDNN | Kunpeng Deep Neural Network Library (KDNN) optimizes AI operator performance based on the microarchitecture features of the Kunpeng processor and on software optimization methods. The library is integrated into open-source oneDNN as a plugin and can interface with operators such as TensorFlow MatMul and softmax. |
| Ranking-focused AI inference operator library | KDNN_EXT | KDNN_EXT is the extension library of KDNN. It optimizes operators such as softmax and random_choice and exposes them through a Python interface. |
| Ranking-focused AI inference operator library | KTFOP | Kunpeng TensorFlow Operator (KTFOP) is an efficient, Huawei-developed TensorFlow operator library. It uses SIMD instructions and multi-core scheduling to accelerate operator processing on CPUs and reduce CPU resource usage, which increases the end-to-end throughput of online inference. |
| Ranking-focused AI inference operator library | ANNC | TensorFlow uses the Accelerated Neural Network Compiler (ANNC) to perform graph-level optimizations that improve inference performance in recommendation systems. ANNC provides optimization technologies that include computational graph optimization and the generation and integration of high-performance fused operators. |
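
Several recall components in Table 1 (KScaNN, the Faiss IVF indexes, and the adaptive nprobe in RaBitQ) rely on inverted index-based vector retrieval. The following is a minimal pure-Python sketch of that general idea, not the implementation used by any BoostKit component. The function names (`build_ivf`, `search`), the fixed centroids, and the sample data are all hypothetical and chosen only for illustration.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe closest inverted lists and return the top-k IDs."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vec_id for c in probe for vec_id in lists[c]]
    return heapq.nsmallest(k, candidates,
                           key=lambda i: sq_dist(query, vectors[i]))


# Hypothetical toy data: two well-separated clusters with hand-picked centroids.
vectors = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids = [(0, 0), (10, 10)]
lists = build_ivf(vectors, centroids)
top1 = search((9, 9), vectors, centroids, lists, k=1, nprobe=1)  # -> [3]
```

A larger `nprobe` scans more lists, which raises recall at the cost of more distance computations. Real libraries train the centroids (for example with k-means) and vectorize the distance loops instead of using fixed centroids and Python loops.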

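Table 1 describes KNewPfordelta as a decompression algorithm for inverted index processing. As background, the classic PForDelta scheme stores the gaps between sorted document IDs in fixed-width slots and keeps the few oversized gaps as separately patched exceptions. The sketch below illustrates only that textbook idea in plain Python. It is not the KNewPfordelta algorithm, it is not vectorized, and the names `encode` and `decode` and the packing layout are assumptions made for this example.

```python
from itertools import accumulate


def encode(doc_ids, bits):
    """Delta-encode sorted, non-negative doc IDs into fixed-width slots.

    Gaps that do not fit in `bits` bits are stored as exceptions
    (position -> gap) and their slot is left as zero.
    """
    limit = 1 << bits
    packed = 0
    exceptions = {}
    prev = 0
    for i, doc_id in enumerate(doc_ids):
        delta = doc_id - prev
        prev = doc_id
        if delta < limit:
            packed |= delta << (i * bits)
        else:
            exceptions[i] = delta
    return packed, exceptions, len(doc_ids)


def decode(packed, exceptions, count, bits):
    """Unpack the slots, patch the exceptions, then prefix-sum the gaps."""
    mask = (1 << bits) - 1
    deltas = [(packed >> (i * bits)) & mask for i in range(count)]
    for pos, value in exceptions.items():
        deltas[pos] = value
    return list(accumulate(deltas))


# Hypothetical posting list: one large gap (8 -> 300) becomes an exception.
ids = [3, 7, 8, 300, 302, 310]
packed, exceptions, count = encode(ids, bits=4)
restored = decode(packed, exceptions, count, bits=4)
```

Because most gaps share one small bit width, the unpacking loop is regular and branch-free, which is what makes this family of codecs a natural fit for SIMD decoding.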