Feature List

Type	Name	Description	Constraints
Recall algorithm	KScaNN	An inverted index-based vector retrieval algorithm that deeply optimizes the index layout, algorithm logic, and computation flow to make full use of the processor's capabilities.	Processor: OSs: openEuler 22.03 LTS SP3 and openEuler 24.03 LTS SP1 Compiler: GCC 12.3 Performance: improved by 40% compared with 9654
	KBest	An approximate nearest neighbor (ANN) search algorithm for multi-dimensional vectors that improves search performance and precision using methods such as quantization and NUMA scheduling.	Processor: OS: openEuler 22.03 LTS SP3 Compilers: GCC 10.3 and GCC 12.3 Performance: improved by 40% compared with 9654
	KVecTurbo	Quantizes and compresses high-dimensional vectors to quickly find the nearest neighbors of a query, and uses SIMD instructions to accelerate distance calculation in multi-dimensional vector nearest neighbor search.	Processor: OSs: openEuler 22.03 LTS SP3 and openEuler 20.03 LTS SP4 Compiler: GCC 10.3 Performance: improved by 30% compared with the open-source component
	KRL	Kunpeng Retrieval Library (KRL) is an operator library optimized for the Kunpeng platform to accelerate vector retrieval. KRL accelerates Faiss-supported algorithms such as HNSW, PQFS, IVFPQ, and IVFPQFS by replacing their operators with optimized ones.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 12.3 Performance: the same as that of 9654
	KNewPfordelta	The Kunpeng New PForDelta algorithm is a vectorized decompression algorithm that accelerates inverted index processing to improve search performance.	Processor: OSs: openEuler 22.03 LTS SP3, openEuler 20.03 LTS SP4, and openEuler 24.03 LTS SP1 Compilers: GCC 10.3 and GCC 12.3 Performance: the same as that of 9654 per core
	Faiss	The open-source Faiss library has been deeply optimized using key technologies such as vectorization, dimension-interleaved lookup and accumulation, and vector filtering and compression. In addition, FP16 interface support has been added for the HNSW algorithm. These enhancements significantly improve similarity search and clustering efficiency for the IVFFlat, IVFPQ, HNSW, PQFS, and IVFPQFS index algorithms.	Processor: Kunpeng 950 OSs: openEuler 24.03 LTS SP3 and Debian 12 Compilers: GCC 12 and llvm1606 Performance: the same as that of 9654, and 1.15x that of 9654 for IVFPQ
	RaBitQ	Extends the open-source RaBitQ code with Arm64 (AArch64) support and adds FP16 precision optimization, NEON SIMD vectorization, assembly-level lookup table (LUT) acceleration, Spilling with Orthogonality-Amplified Residuals (SOAR) vector allocation, and machine learning (ML)-based adaptive nprobe.	Processor: Kunpeng 950 OSs: openEuler 24.03 LTS SP3 and Debian 12 Compilers: GCC 12 and llvm1606 Performance: improved by 15% compared with 9654
	EmbeddingLookup	The core Embedding Lookup module of Monolith, an open-source large-scale real-time recommendation system, has been deeply adapted and optimized.	Processor: Kunpeng 950 OSs: openEuler 24.03 LTS SP3 and Debian 12 Compilers: GCC 12 and llvm1606 Performance: improved by 15% compared with 9654 for a single instance
	hnswlib	The open-source hnswlib library has been deeply optimized for the Arm architecture. It provides efficient FP16 support through vectorization and applies optimizations such as prefetching and instruction rescheduling.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 12.3 Performance: improved by 20% compared with the open-source component
Ranking-focused AI library	KDNN	An acceleration operator library for AI frameworks.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 10.3 Supported operators: Convolution, Eltwise, Inner Product, Reduction, Layer Normalization, PReLU, Matmul, Softmax, Sum, Reorder, Resampling, Concat, and Shuffle Supported modes: single-thread and multi-thread Performance: average performance improved by 60% (single core, single NUMA node, single processor, and entire system)
	KDNN_EXT	Uses the Cython framework to provide Python interfaces, making the library better suited to user scenarios.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 10.3 Supported operators: random_choice and softmax Performance: improved by more than 10% compared with the open-source library
	KTFOP	A core operator library for TensorFlow.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 10.3 Supported operators: Select, Less, Greater, FloorMod, Matmul, and LookupTableFind Performance: improved by more than 20% compared with the open-source library
	TensorFlow Serving thread scheduling optimization	Improves the TensorFlow operator scheduling algorithm and adds other thread management optimizations, increasing model inference throughput in high-concurrency scenarios.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 12.3 Performance: throughput improved by 20% compared with the open-source library
	ANNC	The Accelerated Neural Network Compiler (ANNC) performs graph-level optimizations for TensorFlow to improve inference performance in recommendation systems. Its techniques include computational graph optimization and the generation and integration of high-performance fused operators.	Processor: OS: openEuler 22.03 LTS SP3 Compiler: GCC 12.3 Performance: throughput improved by 20% compared with the ModelZoo open-source library

The test results for the preceding algorithm features and performance metrics are based on the OS and compiler versions listed in the preceding table. Performance in other OS or compiler environments has not been verified.
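KScaNN is described above as inverted index-based, and the RaBitQ row mentions an adaptive nprobe. As a rough illustration of that general inverted-file (IVF) idea only, the pure-Python sketch below assigns each vector to its nearest coarse centroid and, at query time, scans only the inverted lists of the nprobe closest centroids. It is not the KScaNN, Faiss, or RaBitQ implementation: the function names are hypothetical, the centroids are given by hand instead of being trained with k-means, and exact squared L2 distance stands in for the quantized distance computations these libraries use.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(v: Vector, centroids: List[Vector], k: int) -> List[int]:
    """Indices of the k centroids closest to v."""
    return heapq.nsmallest(k, range(len(centroids)), key=lambda c: l2_sq(v, centroids[c]))


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign every vector ID to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        lists[nearest_centroids(v, centroids, 1)[0]].append(vid)
    return lists


def ivf_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], nprobe: int, topk: int) -> List[Tuple[float, int]]:
    """Scan only the nprobe closest inverted lists; return (distance, id) pairs."""
    probed = nearest_centroids(query, centroids, nprobe)
    candidates = (vid for c in probed for vid in lists[c])
    return heapq.nsmallest(topk, ((l2_sq(query, vectors[vid]), vid) for vid in candidates))


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]]
    cents = [[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]]
    ivf = build_ivf(data, cents)
    hits = ivf_search([4.9, 5.1], data, cents, ivf, nprobe=1, topk=2)
    print([vid for _, vid in hits])  # prints: [2, 3]
```

Raising nprobe scans more lists and trades speed for recall; with nprobe equal to the number of centroids, the search becomes exhaustive.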
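The KNewPfordelta row refers to PForDelta, a family of codecs for the sorted document-ID lists in an inverted index. The sketch below is a simplified, scalar, NewPFD-style codec in pure Python, written under stated assumptions: doc IDs are sorted and non-negative, they are stored as gaps (deltas), each gap keeps its low b bits in a packed slot, and the few gaps that do not fit (exceptions) store their high bits and positions separately. The roughly 10% exception threshold, the Block layout, and all function names are illustrative choices, not details of KNewPfordelta, whose vectorized decoder is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One compressed posting list (hypothetical layout for illustration)."""
    n: int               # number of doc IDs
    b: int               # bit width of each packed slot
    packed: int          # low b bits of every gap, packed into one integer
    exc_pos: List[int]   # positions of gaps that needed more than b bits
    exc_high: List[int]  # the bits above the low b bits for those gaps


def choose_bit_width(gaps: List[int]) -> int:
    """Smallest b that leaves at most ~10% of gaps as exceptions (assumed threshold)."""
    max_exceptions = len(gaps) // 10
    for b in range(1, 33):
        if sum(1 for g in gaps if g >> b) <= max_exceptions:
            return b
    return 32  # wider gaps are still handled correctly as exceptions


def encode(doc_ids: List[int]) -> Block:
    # Delta-encode sorted IDs; the first gap is the first ID itself.
    gaps = [cur - prev for prev, cur in zip([0] + doc_ids, doc_ids)]
    b = choose_bit_width(gaps)
    mask = (1 << b) - 1
    packed = 0
    for i, g in enumerate(gaps):
        packed |= (g & mask) << (i * b)  # store only the low b bits in place
    exc_pos = [i for i, g in enumerate(gaps) if g >> b]
    exc_high = [gaps[i] >> b for i in exc_pos]
    return Block(len(gaps), b, packed, exc_pos, exc_high)


def decode(block: Block) -> List[int]:
    mask = (1 << block.b) - 1
    # 1. Unpack the fixed-width slots.
    gaps = [(block.packed >> (i * block.b)) & mask for i in range(block.n)]
    # 2. Patch exceptions by restoring their high bits.
    for pos, high in zip(block.exc_pos, block.exc_high):
        gaps[pos] |= high << block.b
    # 3. A prefix sum turns gaps back into doc IDs.
    doc_ids: List[int] = []
    total = 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    ids = [3, 5, 9, 10, 14, 1000, 1003, 1004, 1006, 1007]
    blk = encode(ids)
    print(blk.b, blk.exc_pos, blk.exc_high)  # prints: 3 [5] [123]
    print(decode(blk) == ids)                # prints: True
```

Decoding is a bit-unpack followed by an exception patch and a prefix sum; regular, branch-light loops like these are the kind of work that SIMD decoders vectorize.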