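Adding the Self-Developed KBest Algorithm

ann-benchmarks ships with many algorithms. You can also use KBest, an algorithm developed by Huawei, to search datasets. The following steps describe how to add the KBest algorithm.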

Add the KBest algorithm implementation.

Open the module.py file.

vim /data/ann-benchmarks-main/ann_benchmarks/algorithms/milvus/module.py

Press i to enter insert mode and append the following code to the end of the file.

class MilvusKBEST(Milvus):
    def __init__(self, metric, dim, index_param):
        super().__init__(metric, dim, index_param)
        self._index_R = index_param.get("R", None)
        self._index_L = index_param.get("L", None)
        self._index_A = index_param.get("A", None)
        self._index_btype = index_param.get("init_builder_type", None)
        self._index_consecutive = index_param.get("consecutive", None)
        self._index_level = index_param.get("level", None)
        self._build_index_type = index_param.get("build_index_type", None)
        self._graph_opt_iter = index_param.get("graph_opt_iter", None)
        self._reorder = index_param.get("reorder", None)

    def get_index_param(self):
        return {
            "index_type": "KBEST",
            "params": {
                "R": self._index_R,
                "L": self._index_L,
                "A": self._index_A,
                "init_builder_type": self._index_btype,
                "consecutive": self._index_consecutive,
                "level": self._index_level,
                "build_index_type": self._build_index_type,
                "graph_opt_iter": self._graph_opt_iter,
                "reorder": self._reorder
            },
            "metric_type": self._metric_type
        }

    def set_query_arguments(self, ef, num_threads, adding_pref, patience):
        self.search_params = {
            "metric_type": self._metric_type,
            "params": {
                "efs": ef,
                "num_search_thread": num_threads,
                "adding_pref": adding_pref,
                "patience": patience
            }
        }
        self.name = f"MilvusKBEST metric:{self._metric}, index_R:{self._index_R}, index_L:{self._index_L}, index_A:{self._index_A}, index_btype:{self._index_btype}, index_consecutive:{self._index_consecutive}, index_level:{self._index_level}, build_type={self._build_index_type}, graph_iter={self._graph_opt_iter}, reorder={self._reorder}, search_ef:{ef}, search_thread={num_threads}, adding_pref={adding_pref}, patience={patience}"

Press Esc, type :wq!, and press Enter to save the file and exit.
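The parameter mapping in the class above can be checked without a running Milvus instance. The following is a minimal sketch under stated assumptions, not part of ann-benchmarks: `FakeMilvus` is a hypothetical stand-in for the real `Milvus` base class, its metric-name mapping is assumed, and `MilvusKBESTSketch` is a condensed copy of the class that stores the same nine index parameters in one dictionary.

```python
class FakeMilvus:
    """Hypothetical stub; the real base class also connects to a Milvus server."""

    def __init__(self, metric, dim, index_param):
        self._metric = metric
        self._dim = dim
        # Assumption: metric names are mapped to Milvus metric types like this.
        self._metric_type = {"angular": "COSINE", "euclidean": "L2"}.get(metric)


class MilvusKBESTSketch(FakeMilvus):
    # The nine index parameters read by MilvusKBEST.__init__.
    INDEX_KEYS = (
        "R", "L", "A", "init_builder_type", "consecutive",
        "level", "build_index_type", "graph_opt_iter", "reorder",
    )

    def __init__(self, metric, dim, index_param):
        super().__init__(metric, dim, index_param)
        # Keys missing from index_param become None, as with .get(key, None).
        self._index = {key: index_param.get(key) for key in self.INDEX_KEYS}

    def get_index_param(self):
        return {
            "index_type": "KBEST",
            "params": dict(self._index),
            "metric_type": self._metric_type,
        }

    def set_query_arguments(self, ef, num_threads, adding_pref, patience):
        self.search_params = {
            "metric_type": self._metric_type,
            "params": {
                "efs": ef,
                "num_search_thread": num_threads,
                "adding_pref": adding_pref,
                "patience": patience,
            },
        }


algo = MilvusKBESTSketch("euclidean", 128, {"R": 50, "L": 100, "A": 60})
algo.set_query_arguments(400, 1, 52, 80)
print(algo.get_index_param())
print(algo.search_params)
```

Any index parameter left out of the configuration is passed on as None, so a missing key in config.yml is not caught by the class itself.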

Add the KBest algorithm configuration.

Open the config.yml file.

vim ann_benchmarks/algorithms/milvus/config.yml

Press i to enter insert mode and append the following code to the end of the file.

- base_args: ["@metric", "@dimension"]
  constructor: MilvusKBEST
  disabled: false
  docker_tag: ann-benchmarks-milvus
  module: ann_benchmarks.algorithms.milvus
  name: milvus-kbest
  run_groups:
    KBest:
      args:
        R: [50]
        L: [100]
        A: [60]
        init_builder_type: ["RNNDescent"]
        consecutive: [20]
        level: [2]
        build_index_type: ["SSG"]
        graph_opt_iter: [6]
        reorder: [true]
      query_args: [[400], [1], [52], [80]]

Press Esc, type :wq!, and press Enter to save the file and exit.
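Each value in the configuration is written as a list. The sketch below illustrates how such lists can be read, assuming that the benchmark expands `args` and `query_args` as Cartesian products and passes each `query_args` combination positionally to `set_query_arguments(ef, num_threads, adding_pref, patience)`. The helper names `expand_args` and `expand_query_args` are hypothetical and are not ann-benchmarks functions.

```python
from itertools import product

# Values copied from the milvus-kbest entry in config.yml.
args = {
    "R": [50], "L": [100], "A": [60],
    "init_builder_type": ["RNNDescent"],
    "consecutive": [20], "level": [2],
    "build_index_type": ["SSG"],
    "graph_opt_iter": [6], "reorder": [True],
}
query_args = [[400], [1], [52], [80]]


def expand_args(arg_lists):
    """Yield one index_param dict per combination of the listed values."""
    keys = list(arg_lists)
    for values in product(*(arg_lists[key] for key in keys)):
        yield dict(zip(keys, values))


def expand_query_args(lists):
    """Return one (ef, num_threads, adding_pref, patience) tuple per combination."""
    return list(product(*lists))


index_params = list(expand_args(args))
query_combos = expand_query_args(query_args)
print(len(index_params), "index build(s),", len(query_combos), "query setting(s) each")
print(query_combos[0])
```

Under the same assumption, changing `query_args` to `[[200, 400], [1], [52], [80]]` would produce two query settings for the one index build, which is a convenient way to sweep efs.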

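Table 1 describes the KBest parameters. The reference values balance query accuracy, memory consumption, and time consumption. Choose values that suit your test scenario.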

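Table 1 KBest parameters
Parameter	Description	Type	Value range	Reference value	Configuration guidance
R	Number of neighbor nodes	int	[11,499]	[50]	Affects graph construction time and final index quality. 50 is generally recommended. A value that is too large may lengthen construction and degrade search performance. A value that is too small reduces retrieval accuracy.
L	Candidate node list used during graph construction	int	[11,1999]	[100]	Affects graph construction time and final index quality. 100 is generally recommended. A value that is too large may lengthen construction.
A	Angle threshold for pruning during graph construction	int	[10,360]	[60]	Generally 120 for IP datasets and 60 for L2 datasets.
init_builder_type	Index construction algorithm	const std::string&	"RNNDescent" "NNDescent"	"RNNDescent"	RNNDescent and NNDescent are two algorithms for approximate nearest neighbor search. RNNDescent suits scenarios where data points are distributed relatively evenly or where computing resources are limited. NNDescent suits scenarios where data points are distributed unevenly or where higher search accuracy is required. Unless there is a special requirement, prefer RNNDescent.
consecutive	Block size	int	[1,31]	[20]	Adjust as needed.
efs	Size of the candidate node list during queries	int	[1, number of nodes in the graph]	[400]	For small datasets, typically about 10 to 500. A larger efs gives higher retrieval accuracy but lower retrieval performance. Use the smallest efs that meets the accuracy target.
num_search_thread	Number of threads used for queries	int	[1, number of CPU cores]	[1]	Adjust as needed.
build_index_type	Index type used for graph construction, that is, the neighbor selection strategy	const std::string&	"HNSW" "SSG" "NSG" "TSDG"	"SSG"	HNSW, SSG, NSG, and TSDG are graph-based algorithms for efficient approximate nearest neighbor search, widely used in high-dimensional vector retrieval tasks such as recommender systems and image retrieval. HNSW suits high-dimensional, large-scale datasets. SSG suits medium-scale datasets. NSG suits medium-dimensional data. TSDG suits dynamic datasets that are updated frequently. Unless there is a special requirement, prefer SSG.
graph_opt_iter	Number of self-refinement iterations of the graph index during construction	int	[0,30]	[6]	Affects graph construction time and final index quality. A value that is too large may lengthen construction.
reorder	Whether to reorder after graph construction	bool	true: reordering enabled; false: reordering disabled	[true]	Affects graph construction time and final index quality. Enabling it is recommended.
adding_pref	Pre-search hyperparameter: candidate set insertion threshold	int	Greater than 0	[52]	Limits the search path length so that the search stops early. Adjust the value as needed.
patience	Search patience value	int	Greater than 0	[80]	Limits the search path length so that the search stops early. Adjust the value as needed.
level	Quantization level	int	[0,3]	[2]	Level 1 means SQ8 quantization and level 2 means SQ4 quantization. Generally use 1 for IP datasets and 2 for L2 datasets.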
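Before starting a long index build, it can help to check a parameter set against the ranges in Table 1. The following is a minimal sketch: the ranges and allowed strings are transcribed from the table and treated as inclusive, while `validate_kbest_index_params` is a hypothetical helper that is not part of ann-benchmarks or KBest. Query-time parameters (efs, num_search_thread, adding_pref, patience) are left out because their upper bounds depend on the dataset and machine.

```python
# Inclusive integer ranges transcribed from Table 1.
INT_RANGES = {
    "R": (11, 499),
    "L": (11, 1999),
    "A": (10, 360),
    "consecutive": (1, 31),
    "graph_opt_iter": (0, 30),
    "level": (0, 3),
}
# Allowed string values transcribed from Table 1.
CHOICES = {
    "init_builder_type": {"RNNDescent", "NNDescent"},
    "build_index_type": {"HNSW", "SSG", "NSG", "TSDG"},
}


def validate_kbest_index_params(params):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in INT_RANGES.items():
        value = params.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{name}: expected int, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} is outside [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if params.get(name) not in allowed:
            problems.append(f"{name}: {params.get(name)!r} is not one of {sorted(allowed)}")
    if not isinstance(params.get("reorder"), bool):
        problems.append("reorder: expected true or false")
    return problems


reference = {
    "R": 50, "L": 100, "A": 60, "init_builder_type": "RNNDescent",
    "consecutive": 20, "level": 2, "build_index_type": "SSG",
    "graph_opt_iter": 6, "reorder": True,
}
for problem in validate_kbest_index_params(dict(reference, R=5)):
    print(problem)
```

The helper returns all problems at once instead of raising on the first one, so a mistyped configuration can be corrected in a single pass.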
