
Feature List

Each entry below lists the feature type, name, description, and constraints.

Recall algorithm

KScaNN

A vector retrieval algorithm built on IVF. It is deeply optimized for the Kunpeng architecture across the index layout, algorithm flow, and computation flow, fully unleashing the potential of the chip.

  • Processor: Kunpeng 920 7282C
  • OSs: openEuler 22.03 LTS SP3 and openEuler 24.03 LTS SP1
  • Compiler: GCC 12.3
  • Performance: improved by 40% compared with 9654
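
The IVF idea that KScaNN builds on can be sketched in a few lines: vectors are assigned to coarse clusters at index time, and a query scans only the `nprobe` closest clusters instead of the whole collection. The following is a minimal pure-Python illustration of that idea only, not KScaNN's implementation; the data and centroids are made up.

```python
import random

def l2(a, b):
    # squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    # assign each vector to its nearest coarse centroid (inverted lists)
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[c].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=2, k=3):
    # probe only the nprobe closest clusters, then rank their members
    probed = sorted(range(len(centroids)),
                    key=lambda i: l2(query, centroids[i]))[:nprobe]
    cand = [vid for c in probed for vid in lists[c]]
    return sorted(cand, key=lambda vid: l2(query, vectors[vid]))[:k]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(100)]
centroids = vectors[:4]          # toy coarse quantizer: 4 arbitrary centroids
lists = build_ivf(vectors, centroids)
top = ivf_search(vectors[10], vectors, centroids, lists)
print(top[0])  # the query vector is its own nearest neighbor
```

Because the query's own cluster is always the closest probed cluster, the query's exact copy is found even though most of the collection is never touched.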

KBest

Optimizes the performance and precision of approximate nearest neighbor search for multi-dimensional vectors by using methods such as quantization and NUMA-aware scheduling.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compilers: GCC 10.3 and GCC 12.3
  • Performance: improved by 40% compared with 9654
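
The kind of quantization the description mentions can be pictured with a toy int8 scalar quantizer: each float coordinate is mapped to an 8-bit code, shrinking storage roughly 4x, while distances computed from the codes stay close to the exact ones. This is a hedged sketch of the general technique, not KBest's actual scheme (which is not public):

```python
def quantize(v, lo=-1.0, hi=1.0):
    # map each float in [lo, hi] to an integer code in [0, 255]
    scale = 255.0 / (hi - lo)
    return [round((x - lo) * scale) for x in v]

def dequantize(codes, lo=-1.0, hi=1.0):
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [0.1, -0.5, 0.9, 0.0]
b = [0.2, -0.4, 0.8, 0.1]
qa, qb = quantize(a), quantize(b)
exact = l2(a, b)
approx = l2(dequantize(qa), dequantize(qb))
print(exact, approx)  # the approximate distance tracks the exact one
```

The precision loss is bounded by the quantization step, which is why libraries can trade a small recall drop for large memory and bandwidth savings.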

KVecTurbo

Quantizes and compresses high-dimensional vectors to quickly obtain the near neighbors of a query. In addition, KVecTurbo uses SIMD instructions to accelerate distance calculation in multi-dimensional vector nearest neighbor search.

  • Processor: Kunpeng 920 7282C
  • OSs: openEuler 22.03 LTS SP3 and openEuler 20.03 LTS SP4
  • Compiler: GCC 10.3
  • Performance: improved by 30% compared with the open source component
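
One classic way to combine vector compression with SIMD-friendly distance calculation is product quantization with lookup tables: vectors are split into subvectors, each encoded as a small codebook index, and query-to-code distances reduce to table lookups plus additions (exactly the regular inner loop SIMD accelerates well). The sketch below illustrates that general technique; KVecTurbo's actual codec is not public.

```python
def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# 4-dim vectors split into 2 subspaces of 2 dims; 4 centroids per subspace
codebooks = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],  # subspace 0
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],  # subspace 1
]

def encode(v):
    # each subvector -> index of its nearest centroid in that codebook
    subs = [v[0:2], v[2:4]]
    return [min(range(4), key=lambda i: l2(s, cb[i]))
            for s, cb in zip(subs, codebooks)]

def distance_tables(query):
    # precompute query-to-centroid distances once per query
    subs = [query[0:2], query[2:4]]
    return [[l2(s, c) for c in cb] for s, cb in zip(subs, codebooks)]

def adc_distance(tables, code):
    # asymmetric distance: pure table lookups + adds, no float math per dim
    return sum(t[i] for t, i in zip(tables, code))

v = [0.9, 0.1, 0.1, 0.8]
code = encode(v)                          # 2 small codes replace 4 floats
tables = distance_tables([1.0, 0.0, 0.0, 1.0])
print(adc_distance(tables, code))
```

Because every encoded vector is scanned with the same lookup-and-add pattern, a SIMD implementation can process many codes per instruction.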

KRL

Kunpeng Retrieval Library (KRL) is an operator library optimized for the Kunpeng platform to accelerate vector retrieval. KRL can accelerate Faiss-supported algorithms such as HNSW, PQFS, IVFPQ, and IVFPQFS by replacing operators.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: the same as that of 9654

KNewPfordelta

Kunpeng New PForDelta is an efficient decompression algorithm for inverted file (IVF) indexes. It accelerates the retrieval stage by leveraging vector instructions and other optimizations.

  • Processor: Kunpeng 920 7282C
  • OSs: openEuler 22.03 LTS SP3, openEuler 20.03 LTS SP4, and openEuler 24.03 LTS SP1
  • Compilers: GCC 10.3 and GCC 12.3
  • Performance: single-core performance improved by 100%
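
The PForDelta family stores a block of deltas at a fixed small bit width, keeping the rare out-of-range values aside as exceptions that are patched back in afterward; this regular layout is what vector instructions exploit during decoding. A simplified pure-Python round trip to show the idea (not the library's actual on-disk format):

```python
def pfor_encode(nums, bits=4):
    # delta-encode a sorted posting list; deltas wider than `bits` are
    # stored as (position, value) exception pairs, their slots set to 0
    deltas = [nums[0]] + [b - a for a, b in zip(nums, nums[1:])]
    limit = 1 << bits
    packed, exceptions = [], []
    for pos, d in enumerate(deltas):
        if d < limit:
            packed.append(d)
        else:
            packed.append(0)
            exceptions.append((pos, d))
    return packed, exceptions

def pfor_decode(packed, exceptions):
    deltas = list(packed)
    for pos, d in exceptions:   # patch phase: restore the outliers
        deltas[pos] = d
    out, acc = [], 0
    for d in deltas:            # prefix-sum phase: deltas -> IDs
        acc += d
        out.append(acc)
    return out

docs = [3, 7, 8, 120, 121, 130]
packed, exc = pfor_encode(docs)
print(pfor_decode(packed, exc))  # round-trips to the original list
```

The patch loop and the prefix sum are both branch-free over fixed-width data, which is why vectorized decoders of this family can reach very high single-core throughput.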

Faiss

The open source Faiss algorithm library has been deeply optimized using key technologies such as vectorization, dimension-interleaved lookup and accumulation, and vector filtering and compression. These enhancements significantly improve the similarity search and clustering efficiency across IVFFlat, IVFPQ, HNSW, PQFS, and IVFPQFS indexing algorithms.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3

hnswlib

The open source hnswlib algorithm library has been deeply optimized for the Arm architecture. It delivers FP16 support through vectorization, and leverages optimization policies such as prefetching and instruction rescheduling.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: improved by 20% compared with the open source component
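
The FP16 support mentioned above halves vector storage relative to FP32 at a small precision cost. Python's struct module can pack IEEE 754 half-precision values (format code 'e'), which makes the trade-off easy to see; this only illustrates the storage format, not hnswlib's optimized kernels.

```python
import struct

def to_fp16_bytes(vec):
    # pack floats as IEEE 754 half precision: 2 bytes per value
    return struct.pack(f'<{len(vec)}e', *vec)

def from_fp16_bytes(buf):
    return list(struct.unpack(f'<{len(buf) // 2}e', buf))

vec = [0.1, 0.25, -3.5, 1.0]
buf = to_fp16_bytes(vec)
back = from_fp16_bytes(buf)
print(len(buf))   # 8 bytes instead of 16 for FP32
print(back[1])    # 0.25 is exactly representable in FP16
```

Values like 0.25 and 1.0 survive the round trip exactly; values like 0.1 pick up a small rounding error, which is the precision cost being traded for halved memory bandwidth.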

Ranking-focused AI library

KDNN

An operator acceleration library for AI frameworks.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: Convolution, Eltwise, Inner Product, Reduction, Layer Normalization, PReLU, Matmul, Softmax, Sum, Reorder, Resampling, Concat, and Shuffle
  • Supported modes: single-thread and multi-thread modes
  • Performance: average performance improved by 60% (measured at single-core, single-NUMA, single-processor, and full-system scales)
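
For clarity on what one of the listed operators computes, here is a reference pure-Python layer normalization: normalize across the feature dimension to zero mean and unit variance, then optionally apply a learned scale and shift. KDNN's kernel computes the same function, but vectorized and multithreaded.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    # normalize across the feature dimension, then apply affine params
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv for v in x]
    if gamma is not None:
        y = [g * v + b for g, v, b in zip(gamma, y, beta)]
    return y

out = layer_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # output has ~zero mean and ~unit variance
```

A reference implementation like this is also how an optimized kernel is typically validated: run both on the same input and compare within a floating-point tolerance.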

KDNN_EXT

Uses the Cython framework to provide Python interfaces, making the library easier to use in Python-based scenarios.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: random_choice and softmax
  • Performance: improved by more than 10% compared with the open source library

KTFOP

A core operator library for TensorFlow.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 10.3
  • Supported operators: Select, Less, Greater, FloorMod, Matmul, and LookupTableFind
  • Performance: improved by more than 20% compared with the open source library

TensorFlow Serving thread scheduling optimization

The TensorFlow Serving thread scheduling optimization feature improves the TensorFlow operator scheduling algorithm and adds other thread management optimizations, effectively improving the model inference throughput in high-concurrency scenarios.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: throughput improved by 20% compared with the open source library
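
One generic way serving throughput improves under high concurrency is micro-batching: concurrent requests are drained from a queue and grouped, so one model call amortizes per-call overhead across several requests. The sketch below illustrates only that general idea; it is not the feature's actual scheduler, and the batch size of 4 is an arbitrary choice.

```python
import queue

def drain_into_batches(requests, max_batch=4):
    # enqueue concurrent requests, then drain them in groups so one
    # "inference" call serves up to max_batch requests at once
    q = queue.Queue()
    for r in requests:
        q.put(r)
    batches = []
    while not q.empty():
        batch = []
        while len(batch) < max_batch and not q.empty():
            batch.append(q.get())
        batches.append(batch)
    return batches

calls = drain_into_batches(list(range(10)))
print(len(calls))  # 10 requests served in 3 model calls
```

The throughput gain comes from issuing fewer, larger model invocations; the latency of an individual request can rise slightly, which is the usual batching trade-off.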

ANNC

TensorFlow leverages the Accelerated Neural Network Compiler (ANNC) to perform graph-level optimizations, enhancing inference performance in recommendation systems. ANNC provides optimization technologies including computational graph optimization, and generation and integration of high-performance fused operators.

  • Processor: Kunpeng 920 7282C
  • OS: openEuler 22.03 LTS SP3
  • Compiler: GCC 12.3
  • Performance: throughput improved by 20% compared with the ModelZoo open source library
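
Fused-operator generation of the kind described above can be pictured as collapsing a chain like matmul → bias add → ReLU into a single pass per output element, so no intermediate matrices are materialized between the stages. A toy pure-Python illustration of the fusion idea, not ANNC's generated code:

```python
def matmul_bias_relu_fused(A, B, bias):
    # one pass per output element: dot product, bias add, and ReLU,
    # with no intermediate matrices written to memory
    rows, inner, cols = len(A), len(B), len(B[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(A[i][k] * B[k][j] for k in range(inner)) + bias[j]
            row.append(acc if acc > 0.0 else 0.0)
        out.append(row)
    return out

A = [[1.0, -2.0]]
B = [[3.0, 0.5], [1.0, 1.0]]
bias = [0.0, 10.0]
print(matmul_bias_relu_fused(A, B, bias))  # [[1.0, 8.5]]
```

In a compiled setting the same fusion removes two full passes over memory, which is where most of the inference-time benefit comes from.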

The test results for the preceding features and performance metrics were obtained with the OS and compiler versions listed above. Performance in other OS or compiler environments has not been verified.