Examples
This section shows how to call the KBest algorithm API from Python. The example uses the sift-128-euclidean.hdf5 dataset and runs the program with 80 threads.
Obtaining the Dataset and Test Program
- Obtain a dataset.
wget http://ann-benchmarks.com/sift-128-euclidean.hdf5 --no-check-certificate
- Obtain a test program. Obtain it from this link. The branch is v1.2.0. Assume that the program runs in the directory /path/to/kbest_test/demo. The full directory structure is as follows:
├── ann_dataset          // Dataset to process
├── indices              // Built graph index, which is automatically created during run time (In the corresponding dataset configuration file, save_types is set to save_graph.)
│   └── sift-128-euclidean_TSDG_R_32_L_300.ksn    // Built graph index, which is automatically generated during run time (In the corresponding dataset configuration file, save_types is set to save_graph.)
├── searcher_indices     // Built searcher, which is automatically created during run time (In the corresponding dataset configuration file, save_types is set to save_searcher.)
│   └── sift-128-euclidean_TSDG_R_32_L_300.ksn    // Built searcher, which is automatically generated during run time (In the corresponding dataset configuration file, save_types is set to save_searcher.)
├── datasets             // Stores the dataset.
│   └── sift-128-euclidean.hdf5
├── main.py              // File that contains the running functions
├── sift_99.json         // Dataset configuration file
└── run.sh               // Example script
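Before running the test, it can help to confirm that the files you supply yourself are in place. The following is a minimal sketch and is not part of the KBest test program: the helper name missing_inputs is hypothetical, and the file list is an assumption taken from the directory structure above. The indices and searcher_indices directories are left out because they are created at run time.

```python
from pathlib import Path

# Files the user is assumed to provide, relative to the demo directory.
# Taken from the directory structure above; adjust if your layout differs.
REQUIRED = (
    "datasets/sift-128-euclidean.hdf5",
    "main.py",
    "sift_99.json",
)


def missing_inputs(demo_dir, required=REQUIRED):
    """Return the required relative paths that are not files under demo_dir."""
    root = Path(demo_dir)
    return [rel for rel in required if not (root / rel).is_file()]


# Example call against the directory assumed in this section.
missing = missing_inputs("/path/to/kbest_test/demo")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required inputs found.")
```

An empty list means the dataset, main.py, and the configuration file were all found. It does not check the file contents.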
Procedure
- Assume that the program runs in the directory /path/to/kbest_test/demo. Store the dataset in the datasets folder of that directory.
- Install the dependencies.
pip install scikit-learn h5py psutil numpy==1.24.2
yum install numactl numactl-devel
- Run main.py.
python main.py 80 -1 sift_99.json
The test command parameters are described as follows:
python main.py <threads> <batch_size> <json_name>
- threads indicates the number of running threads.
- batch_size indicates the number of queries to be executed at a time in batch query mode. If batch_size is set to -1, all queries in the dataset are executed at a time.
- json_name indicates the name of the configuration file corresponding to the test dataset.
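The parameter semantics described above can be sketched as follows. This illustrates one way such a command line might be interpreted and is not the actual code in main.py. The names parse_cli and iter_batches are hypothetical, and the validation rules (positive threads, batch_size of -1 or a positive integer) are assumptions.

```python
def parse_cli(argv):
    """Parse [threads, batch_size, json_name] from a list of argument strings."""
    if len(argv) != 3:
        raise SystemExit("usage: python main.py <threads> <batch_size> <json_name>")
    threads, batch_size, json_name = int(argv[0]), int(argv[1]), argv[2]
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    if batch_size != -1 and batch_size < 1:
        raise ValueError("batch_size must be -1 or a positive integer")
    return threads, batch_size, json_name


def iter_batches(num_queries, batch_size):
    """Yield (start, stop) index ranges; -1 yields one range covering all queries."""
    if num_queries <= 0:
        return
    step = num_queries if batch_size == -1 else batch_size
    for start in range(0, num_queries, step):
        yield start, min(start + step, num_queries)


# The example command from this section: 80 threads, all queries in one batch.
threads, batch_size, json_name = parse_cli(["80", "-1", "sift_99.json"])
print(threads, batch_size, json_name)
print(list(iter_batches(10, batch_size)))  # one batch: [(0, 10)]
print(list(iter_batches(10, 4)))           # [(0, 4), (4, 8), (8, 10)]
```

In this sketch, a batch_size of -1 becomes a single range over every query, while a positive value splits the queries into consecutive ranges, with a shorter range at the end if needed.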
The command output is as follows:
