Adding the KBest Algorithm

ANN-Benchmarks contains many algorithms. You can also use KBest, an algorithm developed by Huawei, to run searches on datasets. To add the KBest algorithm, perform the following steps:

Add the implementation of the KBest algorithm.

Open the module.py file.

vim /data/ann-benchmarks-main/ann_benchmarks/algorithms/milvus/module.py

Add the following content at the end of the file:

class MilvusKBEST(Milvus):
    def __init__(self, metric, dim, index_param):
        super().__init__(metric, dim, index_param)
        self._index_R = index_param.get("R", None)
        self._index_L = index_param.get("L", None)
        self._index_A = index_param.get("A", None)
        self._index_btype = index_param.get("init_builder_type", None)
        self._index_consecutive = index_param.get("consecutive", None)
        self._index_level = index_param.get("level", None)
        self._build_index_type = index_param.get("build_index_type", None)
        self._graph_opt_iter = index_param.get("graph_opt_iter", None)
        self._reorder = index_param.get("reorder", None)

    def get_index_param(self):
        return {
            "index_type": "KBEST",
            "params": {
                "R": self._index_R,
                "L": self._index_L,
                "A": self._index_A,
                "init_builder_type": self._index_btype,
                "consecutive": self._index_consecutive,
                "level": self._index_level,
                "build_index_type": self._build_index_type,
                "graph_opt_iter": self._graph_opt_iter,
                "reorder": self._reorder
            },
            "metric_type": self._metric_type
        }

    def set_query_arguments(self, ef, num_threads, adding_pref, patience):
        # Positional order matches query_args in config.yaml.
        self.search_params = {
            "metric_type": self._metric_type,
            "params": {
                "efs": ef,
                "num_search_thread": num_threads,
                "adding_pref": adding_pref,
                "patience": patience
            }
        }
        self.name = (
            f"MilvusKBEST metric:{self._metric}, index_R:{self._index_R}, "
            f"index_L:{self._index_L}, index_A:{self._index_A}, "
            f"index_btype:{self._index_btype}, "
            f"index_consecutive:{self._index_consecutive}, "
            f"index_level:{self._index_level}, build_type={self._build_index_type}, "
            f"graph_iter={self._graph_opt_iter}, reorder={self._reorder}, "
            f"search_ef:{ef}, search_thread={num_threads}, "
            f"adding_pref={adding_pref}, patience={patience}"
        )
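
The mapping from the index_param dictionary to the index definition can be tried out without a Milvus deployment. The following is a minimal sketch only: build_kbest_index_param and KBEST_BUILD_KEYS are hypothetical names introduced for illustration, and the function merely mirrors what get_index_param in the class above returns. It does not call Milvus or ANN-Benchmarks.

```python
# Hypothetical stand-alone helper (not part of ANN-Benchmarks or Milvus).
# It reproduces the dictionary layout returned by MilvusKBEST.get_index_param.
KBEST_BUILD_KEYS = (
    "R", "L", "A", "init_builder_type", "consecutive",
    "level", "build_index_type", "graph_opt_iter", "reorder",
)


def build_kbest_index_param(index_param, metric_type):
    """Return the index definition; keys missing from index_param become None."""
    return {
        "index_type": "KBEST",
        "params": {key: index_param.get(key, None) for key in KBEST_BUILD_KEYS},
        "metric_type": metric_type,
    }


# Only three build parameters are supplied here, so the rest fall back to None.
example = build_kbest_index_param({"R": 50, "L": 100, "A": 60}, "L2")
print(example["params"]["R"], example["params"]["reorder"])
```

Because missing keys fall back to None, a parameter omitted from config.yaml is passed on as None rather than raising an error, so it is worth listing every parameter explicitly.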

Add the KBest algorithm configuration.

Open the config.yaml file.

vim ann_benchmarks/algorithms/milvus/config.yaml

Add the following content at the end of the file:

- base_args: ["@metric", "@dimension"]
  constructor: MilvusKBEST
  disabled: false
  docker_tag: ann-benchmarks-milvus
  module: ann_benchmarks.algorithms.milvus
  name: milvus-kbest
  run_groups:
    KBest:
      args:
        R: [50]
        L: [100]
        A: [60]
        init_builder_type: ["RNNDescent"]
        consecutive: [20]
        level: [2]
        build_index_type: ["SSG"]
        graph_opt_iter: [6]
        reorder: [true]
      query_args: [[400], [1], [52], [80]]
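
Each value in args and query_args is a list, so adding more values to a list produces more runs. The following sketch illustrates this under the assumption that ANN-Benchmarks takes the Cartesian product of the listed values. The helper names expand_dict_args and expand_query_args are hypothetical, and the framework's own loader is not shown.

```python
from itertools import product


def expand_dict_args(args):
    """Return one index_param dict per combination of the listed values."""
    keys = list(args)
    value_lists = [v if isinstance(v, list) else [v] for v in args.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def expand_query_args(query_args):
    """Return one positional-argument list per combination of the listed values."""
    value_lists = [v if isinstance(v, list) else [v] for v in query_args]
    return [list(combo) for combo in product(*value_lists)]


# Values copied from the KBest run group in config.yaml.
build_args = {
    "R": [50], "L": [100], "A": [60],
    "init_builder_type": ["RNNDescent"], "consecutive": [20], "level": [2],
    "build_index_type": ["SSG"], "graph_opt_iter": [6], "reorder": [True],
}
query_args = [[400], [1], [52], [80]]

index_params = expand_dict_args(build_args)
query_calls = expand_query_args(query_args)
print(len(index_params), index_params[0])
print(query_calls)
```

Under this assumption, each entry of query_calls corresponds to one call of set_query_arguments(ef, num_threads, adding_pref, patience). Changing the first list to [200, 400], for example, would yield two query configurations for the same index.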

Table 1 describes the KBest parameters. The reference values are chosen based on query precision, memory consumption, and time consumption. You can adjust the parameters as required.

**Table 1** KBest parameters
| Parameter | Description | Value Type and Range | Reference Value | Configuration Principle |
| --- | --- | --- | --- | --- |
| R | Number of neighboring nodes. | Integer, [11, 499] | [50] | Affects the graph construction time and the final index quality. The value 50 is recommended. If the value is too large, construction may take too long and search performance may deteriorate. If the value is too small, search precision may be affected. |
| L | Size of the candidate node list during graph construction. | Integer, [11, 1999] | [100] | Affects the graph construction time and the final index quality. The value 100 is recommended. If the value is too large, construction may take too long. |
| A | Angle threshold for pruning during graph construction. | Integer, [10, 360] | [60] | Use 120 for IP datasets and 60 for L2 datasets. |
| init_builder_type | Name of the algorithm used to build the initial index. | String, "RNNDescent" or "NNDescent" | ["RNNDescent"] | Unless otherwise specified, RNNDescent is preferred. |
| consecutive | Block size. | Integer, [1, 31] | [20] | Adjust the value as required. |
| efs | Size of the candidate node list during query. | Integer, [1, number of graph construction nodes] | [400] | For small-scale datasets, the value ranges from 10 to 500. A larger efs value gives higher search precision but lower search performance. Set efs to the smallest value that meets the precision requirement. |
| num_search_thread | Number of threads used during query. | Integer, [1, number of CPU cores] | [1] | Adjust the value as required. |
| build_index_type | Index type used to select neighboring nodes during graph construction. | String, "HNSW", "SSG", "NSG", or "TSDG" | ["SSG"] | Unless otherwise specified, SSG is preferred. |
| graph_opt_iter | Number of rounds of index self-iteration during graph construction. | Integer, [0, 30] | [6] | Affects the graph construction time and the final index quality. If the value is too large, construction may take too long. |
| reorder | Whether to perform reordering after graph construction. | Boolean, true or false | [true] | Affects the graph construction time and the final index quality. You are advised to enable it. |
| adding_pref | Threshold for inserting a hyperparameter candidate set before retrieval. | Integer, greater than 0 | [52] | Limits the retrieval path length so that retrieval can stop early. Adjust the value as required. |
| patience | Retrieval patience value. | Integer, greater than 0 | [80] | Limits the retrieval path length so that retrieval can stop early. Adjust the value as required. |
| level | Quantization level. | Integer, [0, 3] | [2] | Level 1 indicates SQ8 quantization, and level 2 indicates SQ4 quantization. Use 1 for IP datasets and 2 for L2 datasets. |
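
Before starting a long index build, it can help to check the build parameters against the ranges in Table 1. The following is a minimal sketch: validate_kbest_build_params is a hypothetical helper that is not part of KBest, Milvus, or ANN-Benchmarks, and the ranges are copied from the table above. It covers only the build parameters, because the upper bounds of efs and num_search_thread depend on the dataset and the host.

```python
# Ranges and choices copied from Table 1; the helper itself is hypothetical.
INT_RANGES = {
    "R": (11, 499),
    "L": (11, 1999),
    "A": (10, 360),
    "consecutive": (1, 31),
    "graph_opt_iter": (0, 30),
    "level": (0, 3),
}
CHOICES = {
    "init_builder_type": {"RNNDescent", "NNDescent"},
    "build_index_type": {"HNSW", "SSG", "NSG", "TSDG"},
}


def validate_kbest_build_params(params):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name, (low, high) in INT_RANGES.items():
        value = params.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} is outside [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if params.get(name) not in allowed:
            problems.append(f"{name}: expected one of {sorted(allowed)}")
    if not isinstance(params.get("reorder"), bool):
        problems.append("reorder: expected true or false")
    return problems


# Reference values from the KBest run group in config.yaml.
reference = {
    "R": 50, "L": 100, "A": 60, "init_builder_type": "RNNDescent",
    "consecutive": 20, "level": 2, "build_index_type": "SSG",
    "graph_opt_iter": 6, "reorder": True,
}
print(validate_kbest_build_params(reference))
print(validate_kbest_build_params(dict(reference, R=5)))
```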
