Examples

This section uses the sift-128-euclidean.hdf5 dataset with 80 threads as an example. Run the following command to obtain the dataset:

wget http://ann-benchmarks.com/sift-128-euclidean.hdf5 --no-check-certificate

Obtain the test program. Assume that the directory where the program runs is /path/to/kbest_test/demo. The full path structure is as follows:

├── ann_dataset                                           // Dataset to process
├── indices                                               // Directory for the built graph index, created automatically at run time (when save_types is set to save_graph in the dataset configuration file)
      └── sift-128-euclidean_TSDG_R_32_L_300.ksn          // Built graph index, generated automatically at run time
├── searcher_indices                                      // Directory for the built searcher, created automatically at run time (when save_types is set to save_searcher in the dataset configuration file)
      └── sift-128-euclidean_TSDG_R_32_L_300.ksn          // Built searcher, generated automatically at run time
├── datasets                                              // Dataset directory
      └── sift-128-euclidean.hdf5
├── main.py                                               // File that contains the running functions
├── sift_99.json                                          // Dataset configuration file
└── run.sh                                                // Example script
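
Before the first run, it can help to confirm that the files the test command depends on are in place. The following is a minimal sketch, not part of the test program: the helper name missing_entries and the REQUIRED list are hypothetical, and the list assumes that only main.py, the configuration file, and the downloaded dataset must exist up front (indices and searcher_indices are created at run time).

```python
from pathlib import Path

# Paths taken from the directory tree above. The REQUIRED list itself is an
# assumption: indices/ and searcher_indices/ are left out because they are
# created automatically at run time.
REQUIRED = (
    "main.py",
    "sift_99.json",
    "datasets/sift-128-euclidean.hdf5",
)


def missing_entries(demo_dir):
    """Return the required relative paths that do not exist under demo_dir."""
    root = Path(demo_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


# Example call against the directory assumed in this section.
print(missing_entries("/path/to/kbest_test/demo"))
```

An empty list means that all three entries were found.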

Procedure:

Store the dataset in the datasets folder of the program running directory (/path/to/kbest_test/demo in this example).

Install the dependencies.

pip install scikit-learn h5py psutil numpy==1.24.2
yum install numactl numactl-devel

Run main.py.

python main.py 80 -1 sift_99.json

The test command parameters are described as follows:
```
python main.py <threads> <batch_size> <json_name>
```
- threads indicates the number of running threads.
- batch_size indicates the number of queries to be executed at a time in batch query mode. If batch_size is set to -1, all queries in the dataset are executed at a time.
- json_name indicates the name of the configuration file corresponding to the test dataset.
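
The parameter semantics above can be sketched in a few lines. This section does not show how main.py parses its arguments or splits queries, so the code below is an illustration under assumptions: parse_args and batch_ranges are hypothetical names, and the slicing logic is one plausible reading of "batch_size = -1 runs all queries at once".

```python
import argparse


def parse_args(argv):
    """Parse <threads> <batch_size> <json_name> in the documented order."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("threads", type=int, help="number of running threads")
    parser.add_argument("batch_size", type=int,
                        help="queries per batch; -1 runs all queries at once")
    parser.add_argument("json_name", help="dataset configuration file name")
    return parser.parse_args(argv)


def batch_ranges(num_queries, batch_size):
    """Yield (start, stop) index pairs that cover all queries in order."""
    if batch_size == -1:
        # -1 means a single batch containing every query.
        batch_size = max(num_queries, 1)
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or -1")
    for start in range(0, num_queries, batch_size):
        yield start, min(start + batch_size, num_queries)


args = parse_args(["80", "-1", "sift_99.json"])
print(args.threads, args.batch_size, args.json_name)
print(list(batch_ranges(10, args.batch_size)))  # one batch: [(0, 10)]
print(list(batch_ranges(10, 4)))                # [(0, 4), (4, 8), (8, 10)]
```

argparse accepts -1 as a positional value here because the parser defines no options that look like negative numbers.
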
The command output is as follows:
