Adding the KBest Algorithm

ANN-Benchmarks contains many algorithms. You can also use KBest, an algorithm developed by Huawei, to search datasets. To add the KBest algorithm, perform the following steps:

  1. Add the implementation of the KBest algorithm.
    1. Open the module.py file.
      vim /data/ann-benchmarks-main/ann_benchmarks/algorithms/milvus/module.py 
      
    2. Add the following content at the end of the file:
      class MilvusKBEST(Milvus):
          """Milvus wrapper that builds and queries a KBEST index."""

          def __init__(self, metric, dim, index_param):
              super().__init__(metric, dim, index_param)
              # Build-time parameters, taken from the run group's args in config.yaml.
              self._index_R = index_param.get("R", None)
              self._index_L = index_param.get("L", None)
              self._index_A = index_param.get("A", None)
              self._index_btype = index_param.get("init_builder_type", None)
              self._index_consecutive = index_param.get("consecutive", None)
              self._index_level = index_param.get("level", None)
              self._build_index_type = index_param.get("build_index_type", None)
              self._graph_opt_iter = index_param.get("graph_opt_iter", None)
              self._reorder = index_param.get("reorder", None)

          def get_index_param(self):
              # Index definition used when the KBEST index is created.
              return {
                  "index_type": "KBEST",
                  "params": {
                      "R": self._index_R,
                      "L": self._index_L,
                      "A": self._index_A,
                      "init_builder_type": self._index_btype,
                      "consecutive": self._index_consecutive,
                      "level": self._index_level,
                      "build_index_type": self._build_index_type,
                      "graph_opt_iter": self._graph_opt_iter,
                      "reorder": self._reorder
                  },
                  "metric_type": self._metric_type
              }

          def set_query_arguments(self, ef, num_threads, adding_pref, patience):
              # Search-time parameters, taken from query_args in config.yaml.
              self.search_params = {
                  "metric_type": self._metric_type,
                  "params": {
                      "efs": ef,
                      "num_search_thread": num_threads,
                      "adding_pref": adding_pref,
                      "patience": patience
                  }
              }
              # Descriptive name for this parameter combination.
              self.name = f"MilvusKBEST metric:{self._metric}, index_R:{self._index_R}, index_L:{self._index_L}, index_A:{self._index_A}, index_btype:{self._index_btype}, index_consecutive:{self._index_consecutive}, index_level:{self._index_level}, build_type={self._build_index_type}, graph_iter={self._graph_opt_iter}, reorder={self._reorder}, search_ef:{ef}, search_thread={num_threads}, adding_pref={adding_pref}, patience={patience}"
      
  2. Add the KBest algorithm configuration.
    1. Open the config.yaml file.
      vim ann_benchmarks/algorithms/milvus/config.yaml
      
    2. Add the following content at the end of the file:
      - base_args: ["@metric", "@dimension"]
        constructor: MilvusKBEST
        disabled: false
        docker_tag: ann-benchmarks-milvus
        module: ann_benchmarks.algorithms.milvus
        name: milvus-kbest
        run_groups:
          KBest:
            args:
              R: [50]
              L: [100]
              A: [60]
              init_builder_type: ["RNNDescent"]
              consecutive: [20]
              level: [2]
              build_index_type: ["SSG"]
              graph_opt_iter: [6]
              reorder: [true]
            query_args: [[400], [1], [52], [80]]
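As a rough illustration of how this configuration entry reaches the class added in step 1, the sketch below expands the run group and prints the resulting index and search parameters. It assumes that ANN-Benchmarks takes the Cartesian product of the list-valued entries in `args` and `query_args`. The helpers `expand_args`, `expand_query_args`, and `kbest_index_param`, and the `"L2"` metric type, are hypothetical stand-ins written for this example; the sketch does not import ANN-Benchmarks or Milvus.

```python
from itertools import product

# Values copied from the KBest run group in config.yaml.
RUN_GROUP_ARGS = {
    "R": [50], "L": [100], "A": [60],
    "init_builder_type": ["RNNDescent"],
    "consecutive": [20], "level": [2],
    "build_index_type": ["SSG"],
    "graph_opt_iter": [6], "reorder": [True],
}
QUERY_ARGS = [[400], [1], [52], [80]]


def expand_args(args):
    """Hypothetical helper: one dict per combination of the listed values."""
    keys = list(args)
    return [dict(zip(keys, combo)) for combo in product(*(args[k] for k in keys))]


def expand_query_args(query_args):
    """Hypothetical helper: one tuple per combination of query arguments."""
    return list(product(*query_args))


def kbest_index_param(index_param, metric_type):
    """Mirror of the dict built by MilvusKBEST.get_index_param (all keys present)."""
    return {
        "index_type": "KBEST",
        "params": dict(index_param),
        "metric_type": metric_type,
    }


index_params = expand_args(RUN_GROUP_ARGS)
query_combos = expand_query_args(QUERY_ARGS)

for index_param in index_params:
    # "L2" is an assumed metric type used only for this illustration.
    print(kbest_index_param(index_param, "L2"))
    for ef, num_threads, adding_pref, patience in query_combos:
        # Same positional order as set_query_arguments(ef, num_threads, adding_pref, patience).
        print({"efs": ef, "num_search_thread": num_threads,
               "adding_pref": adding_pref, "patience": patience})
```

With the reference values every list holds a single element, so under this assumption the run group yields one index build and one query configuration. Adding values to any list multiplies the number of runs.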
      

Table 1 describes the KBest parameters. The reference values are determined based on query result precision, memory consumption, and time consumption. You can set the parameters as required.

Table 1 KBest parameters

| Parameter | Description | Value Type and Range | Configuration Reference | Configuration Principle |
|---|---|---|---|---|
| R | Number of neighboring nodes. | Integer, [11,499] | [50] | This parameter affects the graph construction time and final index quality. The value 50 is recommended. If the value is too large, the construction time may be too long and the search performance may deteriorate. If the value is too small, the search precision may be affected. |
| L | Candidate node list size during graph construction. | Integer, [11,1999] | [100] | This parameter affects the graph construction time and final index quality. The value 100 is recommended. If the value is too large, the construction time may be too long. |
| A | Angle threshold for pruning during graph construction. | Integer, [10,360] | [60] | Use 120 for the IP dataset and 60 for the L2 dataset. |
| init_builder_type | Name of the algorithm used to build the index. | const std::string&: "RNNDescent" or "NNDescent" | "RNNDescent" | Unless otherwise specified, RNNDescent is preferred. |
| consecutive | Block size. | Integer, [1,31] | [20] | You may adjust the value as required. |
| efs | Size of the candidate node list during query. | Integer, [1, Number of graph construction nodes] | [400] | For small-scale datasets, the value ranges from 10 to 500. A larger efs value leads to higher search precision but lower search performance. Set efs to the smallest value that still meets the precision requirement. |
| num_search_thread | Number of threads during query. | Integer, [1, Number of CPU cores] | [1] | You may adjust the value as required. |
| build_index_type | Index type used to select neighboring nodes during graph construction. | const std::string&: "HNSW", "SSG", "NSG", or "TSDG" | "SSG" | Unless otherwise specified, SSG is preferred. |
| graph_opt_iter | Number of rounds of index self-iteration during graph construction. | Integer, [0, 30] | [6] | This parameter affects the graph construction time and final index quality. If the value is too large, the construction time may be too long. |
| reorder | Whether to perform reordering after graph construction. | Boolean, true or false | [true] | This parameter affects the graph construction time and final index quality. You are advised to enable it. |
| adding_pref | Threshold for inserting a hyperparameter candidate set before retrieval. | Integer, greater than 0 | [52] | This parameter limits the retrieval path length so that retrieval can stop early. You may adjust the value as required. |
| patience | Retrieval patience value. | Integer, greater than 0 | [80] | This parameter limits the retrieval path length so that retrieval can stop early. You may adjust the value as required. |
| level | Quantization level. | Integer, [0,3] | [2] | Level 1 indicates SQ8 quantization, and level 2 indicates SQ4 quantization. Use 1 for the IP dataset and 2 for the L2 dataset. |
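Before starting a long benchmark run, it can help to check a configuration against the ranges in Table 1. The following is a minimal sketch of such a check; `validate_kbest_params` and its helper are hypothetical and not part of ANN-Benchmarks, Milvus, or KBest. The upper bounds for `efs` (number of graph construction nodes) and `num_search_thread` (number of CPU cores) depend on the dataset and machine, so only the lower bound is checked here.

```python
# Ranges and allowed values transcribed from Table 1.
INT_RANGES = {
    "R": (11, 499),
    "L": (11, 1999),
    "A": (10, 360),
    "consecutive": (1, 31),
    "graph_opt_iter": (0, 30),
    "level": (0, 3),
}
CHOICES = {
    "init_builder_type": ("RNNDescent", "NNDescent"),
    "build_index_type": ("HNSW", "SSG", "NSG", "TSDG"),
}
POSITIVE_INTS = ("efs", "num_search_thread", "adding_pref", "patience")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_kbest_params(params):
    """Return a list of problems found in `params` (empty if none)."""
    problems = []
    for name, value in params.items():
        if name in INT_RANGES:
            low, high = INT_RANGES[name]
            if not _is_int(value) or not low <= value <= high:
                problems.append(f"{name}={value!r} is outside [{low}, {high}]")
        elif name in CHOICES:
            if value not in CHOICES[name]:
                problems.append(f"{name}={value!r} is not one of {list(CHOICES[name])}")
        elif name in POSITIVE_INTS:
            if not _is_int(value) or value < 1:
                problems.append(f"{name}={value!r} must be an integer >= 1")
        elif name == "reorder":
            if not isinstance(value, bool):
                problems.append(f"reorder={value!r} must be true or false")
        else:
            problems.append(f"{name} is not a parameter listed in Table 1")
    return problems


# Reference values from the Configuration Reference column.
reference = {
    "R": 50, "L": 100, "A": 60, "init_builder_type": "RNNDescent",
    "consecutive": 20, "level": 2, "build_index_type": "SSG",
    "graph_opt_iter": 6, "reorder": True,
    "efs": 400, "num_search_thread": 1, "adding_pref": 52, "patience": 80,
}
print(validate_kbest_params(reference))
print(validate_kbest_params({"R": 500, "reorder": "yes"}))
```

The first call checks the reference configuration; the second shows how an out-of-range `R` and a non-Boolean `reorder` are reported.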