Installation and Usage Description

The KScaNN optimization feature for the Milvus database is provided as a patch file. Before using this feature, install SRA_Recall to ensure that the patch file can be compiled.

The open source Milvus source code does not include the Knowhere component. The Knowhere source code is pulled during Milvus compilation and integrated into the database. Because this feature optimizes index queries, a patch file is applied to the Knowhere source code. Milvus therefore needs to be compiled twice: the first compilation obtains the Knowhere source code, and the second compilation, run after the patch files are applied, enables the optimization feature.

  1. Download SRA_Recall to the home directory ~, decompress the package, and install it.

    See Table 2 for details about how to obtain SRA_Recall, which provides KScaNN.

    cd ~
    unzip BoostKit-SRA_Recall-1.2.0.zip
    rpm -ivh boostkit-sra_recall-1.2.0-1.aarch64.rpm
    
  2. Download KSL to the home directory ~, decompress the package, and install it.

    See Table 2 for details about how to obtain KSL.

    cd ~
    unzip BoostKit-ksl_2.4.0.zip
    rpm -ivh boostkit-ksl-2.4.0-1.aarch64.rpm
    
  3. Use Git to clone Milvus, select version 2.4.5, and place it in the home directory ~.

    See Table 2 for details about how to obtain Milvus, and follow instructions in Milvus Installation Guide to compile and install Milvus.

  4. Obtain the patch files of the optimization feature and upload them to the home directory ~.

    For details, see Table 2.

  5. Integrate the optimization feature by applying the patches. If the git apply commands produce no output, the integration is successful. If the ~/milvus/internal/core/conanfile.py file was modified during the Milvus compilation and the patch therefore does not apply to it, manually add the patch content to that file after applying the patches.
    cd ~/milvus
    git status
    git restore .
    git apply --whitespace=nowarn < ~/0001-milvus-add-kbest-kscann.patch
    cd ~/milvus/cmake_build/thirdparty/knowhere/knowhere-src/
    git apply --whitespace=nowarn < ~/0001-knowhere-add-kbest-kscann.patch
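
Because a successful git apply prints nothing, it can help to dry-run each patch before applying it. The following is a minimal sketch that wraps git apply --check, which tests whether a patch applies without modifying the working tree. The helper name patch_applies is hypothetical and is not part of the patch package; pass the patch file as an absolute path.

```shell
# Hypothetical helper (not part of the KScaNN patch package):
# return 0 if the patch file $2 would apply cleanly to the source tree $1.
patch_applies() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && git apply --check --whitespace=nowarn "$2" )
}

# Example usage before applying, assuming the paths used in this step:
#   patch_applies ~/milvus ~/0001-milvus-add-kbest-kscann.patch \
#       && echo "Milvus patch applies cleanly"
#   patch_applies ~/milvus/cmake_build/thirdparty/knowhere/knowhere-src \
#       ~/0001-knowhere-add-kbest-kscann.patch \
#       && echo "Knowhere patch applies cleanly"
```

Run the check before git apply only. Once a patch has been applied, the same check fails because the changes are already present.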
    
  6. SRA_Recall provides only the dynamic library file of KScaNN, so you need to build the OpenScann dynamic library file libscann_cc.so yourself. The procedure is as follows. For details, see Using SRA_Recall in the Kunpeng Recall Algorithm Library Developer Guide.
    1. Install the dependency packages.
      yum install python python3-pip python3-devel java-11-openjdk java-11-openjdk-devel rsync libomp hdf5 hdf5-devel gtest-devel libuuid-devel
      yum install gcc-toolset-12*
      export PATH=/opt/openEuler/gcc-toolset-12/root/usr/bin/:$PATH
      export LD_LIBRARY_PATH=/opt/openEuler/gcc-toolset-12/root/usr/lib64/:$LD_LIBRARY_PATH
      

      On openEuler 22.03 LTS SP4, compiling libscann_cc.so with the GCC 12 installed through Yum reports an lto-wrapper error. To avoid this issue, change the Yum repository address configured in the /etc/yum.repos.d/openEuler.repo file to that of openEuler 22.03 LTS SP3.

    2. Install the dependency software Bazel 5.3.0.
      cd ~
      wget https://github.com/bazelbuild/bazel/releases/download/5.3.0/bazel-5.3.0-dist.zip --no-check-certificate
      unzip bazel-5.3.0-dist.zip -d bazel-5.3.0
      cd bazel-5.3.0
      env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh
      export PATH=~/bazel-5.3.0/output:$PATH
      
    3. Download and compile OpenScann.
      cd ~
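      # Download the v1.1.0 source archive from
      # https://gitee.com/openeuler/sra_scann_adapter/tree/v1.1.0/
      # (this page URL cannot be passed to git clone) and save it as
      # ~/sra_scann_adapter-v1.1.0.zip.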
      unzip sra_scann_adapter-v1.1.0.zip -d OpenScann
      cd OpenScann
      

      Activate the Python virtual environment and compile libscann_cc.so.

      • Kunpeng 920 processor
        conda activate milvus
        sh project.sh -ah
        
      • New Kunpeng 920 processor model
        conda activate milvus
        sh project.sh -ag
        
    4. Specify the header file path and dynamic library file path of OpenScann.
      export OPEN_SCANN_LIB=~/OpenScann/kscann/scann/libscann_cc.so
      export OPEN_SCANN_INCLUDE=~/OpenScann/kscann/scann/
      

      The patch package reads the path specified by OPEN_SCANN_INCLUDE and runs a Python file in that directory. Therefore, the trailing slash (/) in the path must be kept.
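
The two variables can be sanity-checked before Milvus is recompiled. The following is a minimal sketch, assuming only what this step states: OPEN_SCANN_LIB points to the library file and OPEN_SCANN_INCLUDE is a directory path that ends with a slash. The function name check_openscann_env is hypothetical.

```shell
# Hypothetical helper: verify the OpenScann variables set in this step.
check_openscann_env() {
    # The include path must keep its trailing slash.
    case "${OPEN_SCANN_INCLUDE:-}" in
        */) ;;
        *) echo "OPEN_SCANN_INCLUDE must end with a slash (/)" >&2; return 1 ;;
    esac
    # The include path must be an existing directory.
    if [ ! -d "$OPEN_SCANN_INCLUDE" ]; then
        echo "OPEN_SCANN_INCLUDE is not an existing directory" >&2
        return 1
    fi
    # The library path must point to an existing file.
    if [ ! -f "${OPEN_SCANN_LIB:-}" ]; then
        echo "OPEN_SCANN_LIB does not point to an existing file" >&2
        return 1
    fi
    echo "OpenScann environment variables look consistent"
}

# Example usage after the two export commands:
#   check_openscann_env || echo "Fix the variables before compiling Milvus"
```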

    5. Switch GCC 12 to GCC 10.3.1 and continue the compilation.

      GCC 12 is used only to compile libscann_cc.so in step 6. In subsequent Milvus compilation, use GCC 10.3.1.
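
One possible way to switch back is to remove the gcc-toolset-12 entries that were added to PATH and LD_LIBRARY_PATH when the dependency packages were installed. The following is a minimal sketch, assuming that GCC 10.3.1 is the system default compiler and that GCC 12 was enabled only through those two export commands. The function name strip_prefix_entries is hypothetical.

```shell
# Hypothetical helper: print the colon-separated list $1 with every entry
# that starts with the prefix $2 removed.
strip_prefix_entries() (
    # Subshell body: the IFS and option changes do not leak to the caller.
    set -f      # disable filename expansion while splitting the list
    IFS=:
    result=
    for entry in $1; do
        case "$entry" in
            "$2"*) ;;   # drop entries located under the prefix
            *) result="${result:+$result:}$entry" ;;
        esac
    done
    printf '%s\n' "$result"
)

# Example usage, assuming the toolset path exported earlier in step 6:
#   TOOLSET=/opt/openEuler/gcc-toolset-12
#   PATH=$(strip_prefix_entries "$PATH" "$TOOLSET")
#   LD_LIBRARY_PATH=$(strip_prefix_entries "${LD_LIBRARY_PATH:-}" "$TOOLSET")
#   export PATH LD_LIBRARY_PATH
#   gcc --version
```

Opening a new shell session in which the GCC 12 export commands have not been run should have the same effect.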

  7. Install environment dependencies of Milvus-KScaNN.
    1. Install the dependency software Eigen 3.3.7.
      git clone https://gitlab.com/libeigen/eigen.git
      cd eigen
      git checkout 33d0937c6bdf5ec999939fb17f2a553183d14a74
      mkdir build && cd build
      cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/eigen-3.3.7
      make -sj && make install
      
    2. Activate the Python virtual environment and install Python dependencies.
      conda activate milvus
      pip install treelite==4.2.1
      pip install tl2cgen
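
To confirm which Eigen version ended up under the installation prefix, the version macros in the installed headers can be read back. The following is a minimal sketch, assuming that make install places the headers under include/eigen3 in the prefix and that the version macros are defined in Eigen/src/Core/util/Macros.h, as in the Eigen 3.3 series. The function name eigen_version is hypothetical.

```shell
# Hypothetical helper: print the Eigen version found under install prefix $1.
eigen_version() {
    macros="$1/include/eigen3/Eigen/src/Core/util/Macros.h"
    if [ ! -f "$macros" ]; then
        echo "Macros.h not found under $1" >&2
        return 1
    fi
    # Collect the three version macros and join them with dots.
    awk '
        $1 == "#define" && $2 == "EIGEN_WORLD_VERSION" { w = $3 }
        $1 == "#define" && $2 == "EIGEN_MAJOR_VERSION" { M = $3 }
        $1 == "#define" && $2 == "EIGEN_MINOR_VERSION" { m = $3 }
        END { print w "." M "." m }
    ' "$macros"
}

# Example usage with the prefix from the cmake command above:
#   eigen_version /usr/local/eigen-3.3.7
```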
      
  8. Go back to the installation directory and perform full compilation on Milvus again to enable the optimization feature.
    cd ~/milvus
    make milvus
    
  9. Run tests with the ANN-Benchmarks GIST dataset to measure the performance improvement after the optimization feature is enabled. For details about the test procedure, see Milvus ANN-Benchmarks Test Guide.