Adding the KBest Algorithm
ANN-Benchmarks includes many algorithms. You can also use KBest, an algorithm developed by Huawei, to search datasets. To add the KBest algorithm, perform the following steps:
- Add the implementation of the KBest algorithm.
  - Open the module.py file.

        vim /data/ann-benchmarks-main/ann_benchmarks/algorithms/milvus/module.py

  - Add the following content at the end of the file:

        class MilvusKBEST(Milvus):
            def __init__(self, metric, dim, index_param):
                super().__init__(metric, dim, index_param)
                # Index construction parameters; a missing key is stored as None.
                self._index_R = index_param.get("R", None)
                self._index_L = index_param.get("L", None)
                self._index_A = index_param.get("A", None)
                self._index_btype = index_param.get("init_builder_type", None)
                self._index_consecutive = index_param.get("consecutive", None)
                self._index_level = index_param.get("level", None)
                self._build_index_type = index_param.get("build_index_type", None)
                self._graph_opt_iter = index_param.get("graph_opt_iter", None)
                self._reorder = index_param.get("reorder", None)

            def get_index_param(self):
                # Parameters used when the KBEST index is created.
                return {
                    "index_type": "KBEST",
                    "params": {
                        "R": self._index_R,
                        "L": self._index_L,
                        "A": self._index_A,
                        "init_builder_type": self._index_btype,
                        "consecutive": self._index_consecutive,
                        "level": self._index_level,
                        "build_index_type": self._build_index_type,
                        "graph_opt_iter": self._graph_opt_iter,
                        "reorder": self._reorder
                    },
                    "metric_type": self._metric_type
                }

            def set_query_arguments(self, ef, num_threads, adding_pref, patience):
                # Search parameters; the argument order matches query_args in config.yaml.
                self.search_params = {
                    "metric_type": self._metric_type,
                    "params": {
                        "efs": ef,
                        "num_search_thread": num_threads,
                        "adding_pref": adding_pref,
                        "patience": patience
                    }
                }
                self.name = (
                    f"MilvusKBEST metric:{self._metric}, index_R:{self._index_R}, "
                    f"index_L:{self._index_L}, index_A:{self._index_A}, "
                    f"index_btype:{self._index_btype}, "
                    f"index_consecutive:{self._index_consecutive}, "
                    f"index_level:{self._index_level}, "
                    f"build_type={self._build_index_type}, "
                    f"graph_iter={self._graph_opt_iter}, reorder={self._reorder}, "
                    f"search_ef:{ef}, search_thread={num_threads}, "
                    f"adding_pref={adding_pref}, patience={patience}"
                )

- Add the KBest algorithm configuration.
  - Open the config.yaml file.

        vim ann_benchmarks/algorithms/milvus/config.yaml

  - Add the following content at the end of the file:

        - base_args: ["@metric", "@dimension"]
          constructor: MilvusKBEST
          disabled: false
          docker_tag: ann-benchmarks-milvus
          module: ann_benchmarks.algorithms.milvus
          name: milvus-kbest
          run_groups:
            KBest:
              args:
                R: [50]
                L: [100]
                A: [60]
                init_builder_type: ["RNNDescent"]
                consecutive: [20]
                level: [2]
                build_index_type: ["SSG"]
                graph_opt_iter: [6]
                reorder: [true]
              query_args: [[400], [1], [52], [80]]

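The following sketch illustrates how the values in the config.yaml entry could reach the `MilvusKBEST` class. It assumes that ANN-Benchmarks expands every list under `args` and `query_args` into a Cartesian product and passes each `query_args` combination positionally to `set_query_arguments(ef, num_threads, adding_pref, patience)`. The helper names `expand_index_params` and `expand_query_args` are hypothetical and exist only for this illustration; no Milvus server is contacted.

```python
from itertools import product

# Reference values copied from the config.yaml entry (run group "KBest").
ARGS = {
    "R": [50],
    "L": [100],
    "A": [60],
    "init_builder_type": ["RNNDescent"],
    "consecutive": [20],
    "level": [2],
    "build_index_type": ["SSG"],
    "graph_opt_iter": [6],
    "reorder": [True],
}
# Order assumed to match set_query_arguments: ef, num_threads, adding_pref, patience.
QUERY_ARGS = [[400], [1], [52], [80]]


def expand_index_params(args):
    """Return one index_param dict per combination of the listed values (assumed behavior)."""
    keys = list(args)
    return [dict(zip(keys, values)) for values in product(*(args[k] for k in keys))]


def expand_query_args(query_args):
    """Return one positional tuple per combination of the query argument lists."""
    return list(product(*query_args))


index_params = expand_index_params(ARGS)
query_tuples = expand_query_args(QUERY_ARGS)

for index_param in index_params:
    for ef, num_threads, adding_pref, patience in query_tuples:
        # Same mapping that set_query_arguments applies to its arguments.
        search_params = {
            "efs": ef,
            "num_search_thread": num_threads,
            "adding_pref": adding_pref,
            "patience": patience,
        }
        print(index_param, search_params)
```

With single-element lists, as in the reference configuration, exactly one index build and one query setting result. Adding a second value to any list, for example `R: [50, 64]`, would double the number of combinations under this assumption.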
Table 1 describes the KBest parameters. The reference values are chosen based on query result precision, memory consumption, and time consumption. You can set the parameters as required.
| Parameter | Description | Value Type and Range | Configuration Reference | Configuration Principle |
|---|---|---|---|---|
| R | Number of neighboring nodes. | Integer, [11,499] | [50] | This parameter affects the graph construction time and final index quality. The value 50 is recommended. If the value is too large, the construction time may be too long and the search performance may deteriorate. If the value is too small, the search precision may be affected. |
| L | Size of the candidate node list during graph construction. | Integer, [11,1999] | [100] | This parameter affects the graph construction time and final index quality. The value 100 is recommended. If the value is too large, the construction time may be too long. |
| A | Angle threshold for pruning during graph construction. | Integer, [10,360] | [60] | Use 120 for IP datasets and 60 for L2 datasets. |
| init_builder_type | Name of the index construction algorithm. | const std::string& | "RNNDescent" | Unless otherwise specified, RNNDescent is preferred. |
| consecutive | Block size. | Integer, [1,31] | [20] | You may adjust the value as required. |
| efs | Size of the candidate node list during query. | Integer, [1, Number of graph construction nodes] | [400] | For small-scale datasets, the value ranges from 10 to 500. A larger efs value leads to higher search precision but lower search performance. Set efs to the smallest value that meets the precision requirement. |
| num_search_thread | Number of threads during query. | Integer, [1, Number of CPU cores] | [1] | You may adjust the value as required. |
| build_index_type | Index type used to select neighboring nodes during graph construction. | const std::string& | "SSG" | Unless otherwise specified, SSG is preferred. |
| graph_opt_iter | Number of rounds for index self-iteration during graph construction. | Integer, [0, 30] | [6] | This parameter affects the graph construction time and final index quality. If the value is too large, the construction time may be too long. |
| reorder | Whether to perform reordering after graph construction. | Boolean, true or false | [true] | This parameter affects the graph construction time and final index quality. You are advised to enable it. |
| adding_pref | Threshold for inserting a hyperparameter candidate set before retrieval. | Integer, greater than 0 | [52] | This parameter is used to limit the retrieval path length and stop the retrieval in advance. You may adjust the value as required. |
| patience | Retrieval patience value. | Integer, greater than 0 | [80] | This parameter is used to limit the retrieval path length and stop the retrieval in advance. You may adjust the value as required. |
| level | Quantization level. | Integer, [0,3] | [2] | Level 1 indicates SQ8 quantization, and level 2 indicates SQ4 quantization. Use 1 for IP datasets and 2 for L2 datasets. |
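
Before starting a long benchmark run, it can help to check a parameter set against the ranges in Table 1. The sketch below is one possible way to do that in plain Python. The function `check_kbest_params` and its `num_nodes` and `num_cores` arguments are hypothetical names introduced for this illustration and are not part of KBest, Milvus, or ANN-Benchmarks. The string parameters (`init_builder_type`, `build_index_type`) are not checked because the table does not list their allowed values.

```python
# Closed integer ranges copied from Table 1.
INT_RANGES = {
    "R": (11, 499),
    "L": (11, 1999),
    "A": (10, 360),
    "consecutive": (1, 31),
    "graph_opt_iter": (0, 30),
    "level": (0, 3),
}
# Parameters that must be integers of at least 1.
POSITIVE_INTS = ("efs", "num_search_thread", "adding_pref", "patience")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_kbest_params(params, num_nodes=None, num_cores=None):
    """Return a list of problems; an empty list means every check passed.

    num_nodes and num_cores are optional upper bounds for efs and
    num_search_thread; the corresponding check is skipped when they are None.
    """
    problems = []
    for name, (low, high) in INT_RANGES.items():
        if name in params:
            value = params[name]
            if not _is_int(value) or not low <= value <= high:
                problems.append(f"{name}={value!r} is outside [{low}, {high}]")

    upper = {"efs": num_nodes, "num_search_thread": num_cores}
    for name in POSITIVE_INTS:
        if name in params:
            value = params[name]
            limit = upper.get(name)
            if not _is_int(value) or value < 1:
                problems.append(f"{name}={value!r} must be an integer >= 1")
            elif limit is not None and value > limit:
                problems.append(f"{name}={value!r} exceeds the limit {limit}")

    if "reorder" in params and not isinstance(params["reorder"], bool):
        problems.append(f"reorder={params['reorder']!r} must be true or false")
    return problems


# Reference values from the "Configuration Reference" column.
reference = {
    "R": 50, "L": 100, "A": 60, "consecutive": 20, "level": 2,
    "graph_opt_iter": 6, "reorder": True,
    "efs": 400, "num_search_thread": 1, "adding_pref": 52, "patience": 80,
}
print(check_kbest_params(reference))
print(check_kbest_params({"R": 500, "level": 4}))
```

The first call checks the reference configuration; the second shows how out-of-range values for `R` and `level` are reported. Returning a list of messages rather than raising on the first error lets all problems in a configuration be reviewed at once.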