
Feature List

Type

Name

Description

Constraints

Recall algorithm

KScaNN

An inverted index-based vector retrieval algorithm that deeply optimizes the index layout, algorithm logic, and computation flow to fully exploit the processor's potential.

  • Processor:
  • OSs: openEuler 22.03 LTS SP3 and openEuler 24.03 LTS SP1
  • Compiler: GCC 12.3
  • Performance: improved by 40% compared with 9654
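
KScaNN's internal index layout is not described here. As a generic illustration of inverted-index (IVF) vector retrieval, the minimal Python sketch below assigns each database vector to the inverted list of its nearest centroid, then at query time scans only the `nprobe` lists whose centroids are closest to the query. The centroids are assumed to be pre-trained (for example by k-means), and `build_ivf` and `search_ivf` are hypothetical names, not KScaNN APIs.

```python
import heapq
from collections import defaultdict


def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Append each vector ID to the inverted list of its nearest centroid."""
    lists = defaultdict(list)
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists


def search_ivf(query, vectors, centroids, lists, k=1, nprobe=1):
    """Return the IDs of the k closest vectors, scanning only the nprobe closest lists."""
    probed = heapq.nsmallest(nprobe, range(len(centroids)),
                             key=lambda c: sq_l2(query, centroids[c]))
    candidates = [vid for c in probed for vid in lists[c]]
    return heapq.nsmallest(k, candidates, key=lambda vid: sq_l2(query, vectors[vid]))


vectors = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # assumed pre-trained, e.g. by k-means
lists = build_ivf(vectors, centroids)
print(search_ivf([4.9, 5.1], vectors, centroids, lists, k=2, nprobe=1))  # [2, 3]
```

Raising `nprobe` scans more inverted lists, trading speed for recall.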

KBest

Improves the performance and precision of approximate nearest neighbor search over multi-dimensional vectors using methods such as quantization and NUMA-aware scheduling.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compilers: GCC 10.3 and GCC 12.3
  • Performance: improved by 40% compared with 9654
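
KBest's quantization scheme is not specified here. To illustrate how quantization can make nearest neighbor search cheaper in general, the sketch below uses per-dimension 8-bit scalar quantization: each component is mapped onto one of 256 levels between that dimension's training minimum and maximum, and distances are computed on the decoded approximations. This is an assumed, simplified scheme, and all function names are hypothetical.

```python
def train_sq8(vectors):
    """Record the per-dimension min and max of the training vectors."""
    dims = range(len(vectors[0]))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]
    return lo, hi


def encode_sq8(v, lo, hi):
    """Map each component onto one of 256 levels (one byte per dimension)."""
    codes = []
    for x, a, b in zip(v, lo, hi):
        span = (b - a) or 1.0  # avoid division by zero on constant dimensions
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp values outside the trained range
        codes.append(round(t * 255))
    return bytes(codes)


def decode_sq8(codes, lo, hi):
    """Reconstruct an approximate float vector from its byte codes."""
    return [a + c / 255 * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]


def search_sq8(query, encoded, lo, hi, k=1):
    """Brute-force k-NN over quantized vectors using squared L2 on decoded values."""
    scored = []
    for vid, codes in enumerate(encoded):
        approx = decode_sq8(codes, lo, hi)
        scored.append((sum((q - x) ** 2 for q, x in zip(query, approx)), vid))
    return [vid for _, vid in sorted(scored)[:k]]


vectors = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
lo, hi = train_sq8(vectors)
encoded = [encode_sq8(v, lo, hi) for v in vectors]
print(search_sq8([0.9, 0.1], encoded, lo, hi, k=1))  # [1]
```

Each dimension shrinks from 4 bytes (FP32) to 1 byte, and for values inside the trained range the reconstruction error is at most half a quantization step.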

KVecTurbo

Quantizes and compresses high-dimensional vectors to quickly find the nearest neighbors of a query, and uses SIMD instructions to accelerate distance calculation in multi-dimensional vector nearest neighbor search.

  • Processor:
  • OSs: openEuler 22.03 LTS SP3 and openEuler 20.03 LTS SP4
  • Compiler: GCC 10.3
  • Performance: improved by 30% compared with the open-source component
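
The exact compression method used by KVecTurbo is not stated. As one common way to compress high-dimensional vectors, the sketch below uses product quantization (PQ): a vector is split into subvectors, each replaced by the index of its nearest codeword, and a query's distance to a compressed vector is the sum of per-subspace lookup-table entries. The codebooks are hand-picked rather than trained, the names are hypothetical, and this scalar Python code does not use the SIMD acceleration described above.

```python
def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split v into m equal subvectors (assumes len(v) is divisible by m)."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def pq_encode(v, codebooks):
    """Replace each subvector with the index of its nearest codeword."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda j: sq_l2(s, cb[j]))
            for s, cb in zip(subs, codebooks)]


def pq_tables(query, codebooks):
    """Precompute distances from each query subvector to every codeword."""
    subs = split(query, len(codebooks))
    return [[sq_l2(s, c) for c in cb] for s, cb in zip(subs, codebooks)]


def pq_distance(code, tables):
    """Approximate squared L2 distance: one table lookup per subspace, then a sum."""
    return sum(t[j] for t, j in zip(tables, code))


# Two subspaces of dimension 2, two codewords each (hand-picked, not trained).
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
code = pq_encode([0.9, 1.1, 0.1, -0.1], codebooks)  # [1, 0]
tables = pq_tables([1.0, 1.0, 2.0, 2.0], codebooks)
print(pq_distance(code, tables))  # 8.0 (the exact distance is 8.04)
```

The tables are built once per query, so scoring each compressed vector costs only a few lookups and additions, which is the inner loop that SIMD instructions can accelerate.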

KRL

Kunpeng Retrieval Library (KRL) is an operator library optimized for the Kunpeng platform to accelerate vector retrieval. KRL can accelerate Faiss-supported algorithms such as HNSW, PQFS, IVFPQ, and IVFPQFS by replacing operators.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: the same as that of 9654

KNewPfordelta

The Kunpeng New PForDelta algorithm is a vectorized decompression algorithm that accelerates inverted index processing for better search performance.

  • Processor:
  • OSs: openEuler 22.03 LTS SP3, openEuler 20.03 LTS SP4, and openEuler 24.03 LTS SP1
  • Compilers: GCC 10.3 and GCC 12.3
  • Performance: the same as that of 9654 per core
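
The simplified Python sketch below illustrates the general PForDelta idea, not the Kunpeng implementation or its storage format: sorted document IDs in a posting list become gaps, small gaps are bit-packed into fixed-width b-bit slots, and the rare large gaps are stored as exceptions that are patched in during decoding. The slot-unpacking loop decodes many independent fixed-width values, which is the part a vectorized decoder can process several values at a time. The function names and the exception layout are hypothetical; real PForDelta variants store exceptions more compactly.

```python
def pfor_encode(doc_ids, b):
    """Delta-encode sorted doc IDs and pack the gaps into b-bit slots.

    Gaps that do not fit in b bits become (position, value) exceptions.
    """
    if not doc_ids:
        return 0, 0, []
    gaps = [doc_ids[0]] + [y - x for x, y in zip(doc_ids, doc_ids[1:])]
    packed, exceptions = 0, []
    for i, g in enumerate(gaps):
        if g >> b:  # the gap needs more than b bits
            exceptions.append((i, g))
            g = 0
        packed |= g << (i * b)
    return packed, len(gaps), exceptions


def pfor_decode(packed, n, b, exceptions):
    """Unpack fixed-width slots, patch exceptions, then prefix-sum the gaps."""
    mask = (1 << b) - 1
    gaps = [(packed >> (i * b)) & mask for i in range(n)]
    for i, g in exceptions:
        gaps[i] = g
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


ids = [3, 5, 6, 40, 42]
packed, n, exceptions = pfor_encode(ids, b=3)
print(exceptions)  # [(3, 34)]
print(pfor_decode(packed, n, 3, exceptions) == ids)  # True
```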

Faiss

The open-source Faiss library has been deeply optimized using key technologies such as vectorization, dimension-interleaved lookup and accumulation, and vector filtering and compression. FP16 interface support has also been added for the HNSW algorithm. These enhancements significantly improve similarity search and clustering efficiency for the IVFFlat, IVFPQ, HNSW, PQFS, and IVFPQFS indexing algorithms.

  • Processor: Kunpeng 950
  • OSs: openEuler 24.03 LTS SP3 and Debian 12
  • Compilers: GCC 12 and llvm1606
  • Performance: on par with 9654 overall, and 1.15x that of 9654 for IVFPQ

RaBitQ

This library extends the open-source RaBitQ code with Arm64 (AArch64) support and introduces FP16 precision optimization, NEON SIMD vectorization, assembly-level Lookup Table (LUT) acceleration, Spilling with Orthogonality-Amplified Residuals (SOAR) vector allocation, and ML-based adaptive nprobe.

  • Processor: Kunpeng 950
  • OSs: openEuler 24.03 LTS SP3 and Debian 12
  • Compilers: GCC 12 and llvm1606
  • Performance: improved by 15% compared with 9654

EmbeddingLookup

The core Embedding Lookup module of Monolith, an open-source large-scale real-time recommendation system, has been deeply adapted and optimized.

  • Processor: Kunpeng 950
  • OSs: openEuler 24.03 LTS SP3 and Debian 12
  • Compilers: GCC 12 and llvm1606
  • Performance: improved by 15% compared with 9654 for a single instance.
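
How the Embedding Lookup module is optimized is not described here. For orientation only, the Python sketch below shows what an embedding lookup does in a recommendation model: sparse feature IDs index a hash table of dense vectors (unseen IDs are initialized on first access), and the vectors for one sample's bag of IDs are pooled into a single input vector. The class, its methods, and the initialization range are hypothetical and are not Monolith's API.

```python
import random


class EmbeddingTable:
    """Hypothetical hash-based embedding table mapping sparse feature IDs to vectors."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.rows = {}  # feature ID -> embedding vector, created on first access

    def lookup(self, fid):
        """Return the embedding for one ID, lazily initializing unseen IDs."""
        if fid not in self.rows:
            self.rows[fid] = [self.rng.uniform(-0.01, 0.01) for _ in range(self.dim)]
        return self.rows[fid]

    def pooled_lookup(self, fids, combiner="sum"):
        """Look up a bag of IDs and reduce them to one vector (sum or mean)."""
        out = [0.0] * self.dim
        for fid in fids:
            for d, x in enumerate(self.lookup(fid)):
                out[d] += x
        if combiner == "mean" and fids:
            out = [x / len(fids) for x in out]
        return out


table = EmbeddingTable(dim=4)
table.rows[7] = [1.0, 2.0, 3.0, 4.0]
table.rows[9] = [1.0, 0.0, 1.0, 0.0]
print(table.pooled_lookup([7, 9]))  # [2.0, 2.0, 4.0, 4.0]
print(table.pooled_lookup([7, 9], combiner="mean"))  # [1.0, 1.0, 2.0, 2.0]
```

In production systems this lookup is typically memory-bound, since each ID touches a different row of a very large table.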

hnswlib

The open-source hnswlib has been deeply optimized for the Arm architecture. It delivers efficient FP16 support through vectorization, and leverages optimization policies such as prefetching and instruction rescheduling.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: improved by 20% compared with the open-source component
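
The optimized FP16 kernels themselves are not shown in this document. The Python sketch below only illustrates the storage idea behind FP16 support: vectors are kept in IEEE 754 half precision, halving memory compared with FP32, and are widened back to higher precision before distances are accumulated. An optimized implementation would typically vectorize this conversion and accumulation; the function names here are hypothetical.

```python
import struct


def to_fp16(v):
    """Pack a float vector as IEEE 754 half precision (2 bytes per dimension).

    struct raises an error for values too large to represent in FP16.
    """
    return struct.pack(f"<{len(v)}e", *v)


def from_fp16(buf):
    """Unpack FP16 bytes back into Python floats (double precision)."""
    return struct.unpack(f"<{len(buf) // 2}e", buf)


def l2_fp16(a16, b16):
    """Squared L2 distance between FP16-stored vectors, accumulated in double precision."""
    return sum((x - y) ** 2 for x, y in zip(from_fp16(a16), from_fp16(b16)))


a = to_fp16([1.0, 2.0, 0.5])
b = to_fp16([0.0, 2.0, 1.5])
print(len(a), l2_fp16(a, b))  # 6 2.0
```

Accumulating in wider precision avoids the rounding error that would build up if each partial sum were also rounded to FP16.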

Ranking-focused AI library

KDNN

An acceleration operator library for AI frameworks.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: Convolution, Eltwise, Inner Product, Reduction, Layer Normalization, PReLU, Matmul, Softmax, Sum, Reorder, Resampling, Concat, and Shuffle
  • Supported modes: single-thread and multi-thread modes
  • Performance: average performance improved by 60% (measured on a single core, a single NUMA node, a single processor, and the entire system)

KDNN_EXT

Provides Python interfaces built with the Cython framework, making the operators easier to use in user scenarios.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: random_choice and softmax
  • Performance: improved by more than 10% compared with the open-source library
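
The Python signatures of the KDNN_EXT operators are not given here. The pure-Python sketch below shows only the reference semantics of the two listed operations, which can be useful as a correctness check when validating an accelerated implementation. The `softmax` and `random_choice` signatures are hypothetical and are not the KDNN_EXT interface.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_choice(n, p, size, seed=None):
    """Draw `size` indices from range(n) with probabilities p, with replacement."""
    rng = random.Random(seed)
    return rng.choices(range(n), weights=p, k=size)


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive exp(1000.0) would overflow
probs = softmax([1.0, 2.0, 3.0])
print(random_choice(3, probs, size=5, seed=42))
```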

KTFOP

A core operator library for TensorFlow.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: Select, Less, Greater, FloorMod, Matmul, and LookupTableFind
  • Performance: improved by more than 20% compared with the open-source library
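
An optimized operator must reproduce TensorFlow's semantics exactly. The pure-Python reference below shows what three of the listed operators compute element-wise, and could serve as an oracle when comparing kernel outputs. The helper names are hypothetical, and only same-shape 1-D inputs are handled (no broadcasting). FloorMod is the subtle case: TensorFlow defines it consistently with flooring division, like Python's `%`, so the result takes the sign of the divisor, unlike C's truncated remainder.

```python
import math


def select(cond, a, b):
    """Select: take a[i] where cond[i] is true, else b[i] (same-shape inputs only)."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def less(a, b):
    """Less: element-wise a[i] < b[i]."""
    return [x < y for x, y in zip(a, b)]


def floor_mod(a, b):
    """FloorMod: remainder of floor division, so its sign follows the divisor."""
    return [x - (x // y) * y for x, y in zip(a, b)]


def trunc_mod(a, b):
    """C-style remainder, shown for contrast: its sign follows the dividend."""
    return [math.fmod(x, y) for x, y in zip(a, b)]


x, y = [7, -7, 7, -7], [3, 3, -3, -3]
print(floor_mod(x, y))  # [1, 2, -2, -1]
print(trunc_mod(x, y))  # [1.0, -1.0, 1.0, -1.0]
print(select(less(x, [0, 0, 0, 0]), [0] * 4, x))  # [7, 0, 7, 0]
```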

TensorFlow Serving thread scheduling optimization

The TensorFlow Serving thread scheduling optimization feature improves the TensorFlow operator scheduling algorithm and adds other thread management optimizations, effectively improving the model inference throughput in high-concurrency scenarios.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: throughput improved by 20% compared with the open-source library

ANNC

TensorFlow leverages the Accelerated Neural Network Compiler (ANNC) to perform graph-level optimizations, enhancing inference performance in recommendation systems. ANNC provides optimization technologies including computational graph optimization, and generation and integration of high-performance fused operators.

  • Processor:
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: throughput improved by 20% compared with the ModelZoo open-source library
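
As a generic illustration of what a fused operator is (not ANNC's actual code generation), the sketch below compares an unfused MatMul, BiasAdd, and Relu sequence, which materializes two intermediate matrices, with a single fused kernel that applies the bias and activation while each output element is being computed. The function names are hypothetical.

```python
def matmul(x, w):
    """Plain matrix multiply: returns a new len(x) x len(w[0]) matrix."""
    return [[sum(xi[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for xi in x]


def bias_add(y, b):
    """Add bias b[j] to column j, producing another new matrix."""
    return [[v + b[j] for j, v in enumerate(row)] for row in y]


def relu(y):
    """Element-wise max(v, 0), producing another new matrix."""
    return [[max(v, 0.0) for v in row] for row in y]


def fused_matmul_bias_relu(x, w, b):
    """Fused kernel: bias and ReLU are applied while each output element is computed,
    so no intermediate matrices are materialized between the three steps."""
    out = []
    for xi in x:
        row = []
        for j in range(len(w[0])):
            acc = b[j]
            for k, xk in enumerate(xi):
                acc += xk * w[k][j]
            row.append(max(acc, 0.0))
        out.append(row)
    return out


x = [[1.0, 2.0], [-1.0, 0.5]]
w = [[1.0, -1.0], [0.5, 2.0]]
b = [0.0, -1.0]
unfused = relu(bias_add(matmul(x, w), b))  # two intermediate matrices
fused = fused_matmul_bias_relu(x, w, b)
print(fused, fused == unfused)  # [[2.0, 2.0], [0.0, 1.0]] True
```

Because the fused loop adds the bias before the products, its floating-point results can differ from the unfused path in the last bits for general inputs; the example uses values for which both orders are exact.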

The preceding features and performance metrics were tested using the OS and compiler versions listed in the preceding table. Performance in other OS or compiler environments has not been verified.