Initialize
API Definition
- Status ScannInterface::Initialize(ConstSpan<float> dataset, DatapointIndex n_points, const std::string& config, int training_threads);
- Status ScannInterface::Initialize(ConstSpan<float> dataset, DatapointIndex n_points, const std::string& config, int training_threads, GmmUtils::KMeansParams kmOpt);
- Status ScannInterface::Initialize(ConstSpan<float> dataset, DatapointIndex n_points, const std::string& config, int training_threads, GmmUtils::KMeansParams kmOpt, float filter_thr, int filter_type);
Function
- Index construction (consistent with the open source algorithm).
- Index construction. An overload of the Initialize function. It sets extra parameters for tuning K-means clustering in both IVF and PQ (KScaNN-specific API).
- Index construction. Another overload of the Initialize function. It sets extra parameters for K-means clustering and component filtering in both IVF and PQ (KScaNN-specific API).
Parameters
| Parameter | Data Type | Description | Value Range |
|---|---|---|---|
| dataset | ConstSpan<float> | Base library vectors. | The value cannot be null. |
| n_points | DatapointIndex | Number of vectors in the base library. | Must be the same as the number of vectors in dataset. |
| config | const std::string& | Configuration file required for creating the index, containing all configuration parameters. | - |
| training_threads | int | Number of threads used during index construction. | ≥ 1. |
| kmOpt | GmmUtils::KMeansParams | K-means tuning parameters. | - |
| filter_thr | float | Filter threshold. | [0, 1]. The default value is 0. |
| filter_type | int | Filter type. | |
| config_pbtxt | const std::string& | Configuration file required for loading the index. | - |
| scann_assets_pbtxt | const std::string& | Index file list. | - |
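The value ranges in the table above can be checked before Initialize is called. The following is a minimal sketch and not part of the KScaNN API: `CheckInitializeArgs` is a hypothetical helper, `std::vector<float>` stands in for `ConstSpan<float>`, and the divisibility check assumes that dataset is a flat array holding the same number of floats for every vector, which this page does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-check, not a KScaNN function. Returns an empty string when
// the arguments satisfy the documented ranges, otherwise a short description
// of the first violation found.
std::string CheckInitializeArgs(const std::vector<float>& dataset,
                                std::size_t n_points,
                                int training_threads,
                                float filter_thr = 0.0f) {
    if (dataset.empty()) return "dataset must not be empty";
    // Not a documented range; guards the modulo below against division by zero.
    if (n_points == 0) return "n_points must be at least 1";
    // Assumption: flat layout, n_points vectors of equal dimension.
    if (dataset.size() % n_points != 0)
        return "dataset size is not a multiple of n_points";
    if (training_threads < 1) return "training_threads must be >= 1";
    // Written as a negated conjunction so that NaN is rejected as well.
    if (!(filter_thr >= 0.0f && filter_thr <= 1.0f))
        return "filter_thr must be in [0, 1]";
    return "";
}
```

An empty return value means that no documented range was violated; it does not guarantee that Initialize itself will succeed.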
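struct KMeansTunableExtraParams {
    int32_t iter;    // Number of K-means algorithm iterations
    int32_t sample;  // Sample size
    int32_t init;    // Initialization type of the K-means cluster center
};
struct KMeansParams {
    KMeansTunableExtraParams ivf;  // K-means settings for IVF
    KMeansTunableExtraParams pq;   // K-means settings for PQ
};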
| Parameter | Data Type | Description | Value Range |
|---|---|---|---|
| iter | int32_t | Number of K-means algorithm iterations. | [0, MAXINT]. The default value is 0. |
| sample | int32_t | Sample size. | - |
| init | int32_t | Initialization type of the K-means cluster center. | {0, 1, 2, 3}. The default value is 0. |
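As an illustration of how a kmOpt value might be filled in, the sketch below repeats the two struct definitions so that it compiles on its own (the `GmmUtils` namespace is omitted). `MakeKMeansParams` and `InDocumentedRange` are hypothetical helpers, not library functions. The sample field is left at zero only because this page documents neither its range nor its default.

```cpp
#include <cassert>
#include <cstdint>

// Struct definitions repeated from this page so the sketch is self-contained.
struct KMeansTunableExtraParams {
    int32_t iter;
    int32_t sample;
    int32_t init;
};
struct KMeansParams {
    KMeansTunableExtraParams ivf;
    KMeansTunableExtraParams pq;
};

// Hypothetical helper: true if iter and init are within the documented ranges
// (iter in [0, MAXINT], init in {0, 1, 2, 3}). sample has no documented range.
bool InDocumentedRange(const KMeansTunableExtraParams& p) {
    return p.iter >= 0 && p.init >= 0 && p.init <= 3;
}

// Hypothetical helper: sets only the iteration counts and leaves every other
// field at zero, which matches the documented defaults for iter and init.
KMeansParams MakeKMeansParams(int32_t ivf_iter, int32_t pq_iter) {
    KMeansParams km{};  // value-initialization zeroes every field
    km.ivf.iter = ivf_iter;
    km.pq.iter = pq_iter;
    return km;
}
```

The resulting object would be passed as the kmOpt argument of the second or third Initialize overload.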
config is generated by create_config.py based on the parameters described in Table 2.
| Parameter | Data Type | Description | Value Range |
|---|---|---|---|
| n_leaves | int | Total number of subspaces in the IVF partition. | ≥ 1. |
| nb | int32_t | Number of vectors in the base library. | Must be the same as the number of vectors in dataset. |
| metricType | std::string | Distance type of the vector. | dot_product or squared_l2. |
| dims_per_block | int | Number of dimensions combined by PQ. | [1, dim], where dim indicates the dimension of the base library vector. |
| avq_threshold | float | Asymmetric bucket parameter. This parameter takes effect only for the L2 (squared_l2) dataset. | [0, 1]. |
| dim | int32_t | Dimension of the vectors in the base library. | Must be the same as the dimension of the vectors in dataset. |
| topK | int | Number of returned results. | ≥ 1. |
| soar_lambda | float | Controls orthogonality. This parameter takes effect only for the IP (dot_product) dataset. | > 0. Set the value to -1 to disable the function. |
| overretrieve_factor | float | Used together with soar_lambda to specify the over-retrieval factor. This parameter takes effect only for the IP (dot_product) dataset. | [1, 2]. Set the value to -1 to disable the function. |
"python create_config.py " + std::to_string(n_leaves) + " "
    + std::to_string(nb) + " "
    + metricType + " "
    + std::to_string(dims_per_block) + " "
    + std::to_string(avq_threshold) + " "
    + std::to_string(dim) + " "
    + std::to_string(topK) + " "
    + std::to_string(soar_lambda) + " "
    + std::to_string(overretrieve_factor)
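Before the command is assembled, the create_config.py arguments can be checked against the ranges in Table 2. The sketch below is one way to do this. `ConfigParams` and `CheckConfigParams` are hypothetical names introduced for illustration. Only the ranges come from the table. nb is not checked because it depends on the dataset itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical container; field names and types follow Table 2.
struct ConfigParams {
    int n_leaves;
    int32_t nb;
    std::string metricType;
    int dims_per_block;
    float avq_threshold;
    int32_t dim;
    int topK;
    float soar_lambda;
    float overretrieve_factor;
};

// Hypothetical check. Returns an empty string if every documented range
// holds, otherwise a description of the first violation found.
std::string CheckConfigParams(const ConfigParams& p) {
    if (p.n_leaves < 1) return "n_leaves must be >= 1";
    if (p.metricType != "dot_product" && p.metricType != "squared_l2")
        return "metricType must be dot_product or squared_l2";
    if (p.dims_per_block < 1 || p.dims_per_block > p.dim)
        return "dims_per_block must be in [1, dim]";
    if (!(p.avq_threshold >= 0.0f && p.avq_threshold <= 1.0f))
        return "avq_threshold must be in [0, 1]";
    if (p.topK < 1) return "topK must be >= 1";
    // -1 disables the feature; otherwise the value must be positive.
    if (p.soar_lambda != -1.0f && !(p.soar_lambda > 0.0f))
        return "soar_lambda must be > 0, or -1 to disable";
    // -1 disables the feature; otherwise the value must lie in [1, 2].
    if (p.overretrieve_factor != -1.0f &&
        !(p.overretrieve_factor >= 1.0f && p.overretrieve_factor <= 2.0f))
        return "overretrieve_factor must be in [1, 2], or -1 to disable";
    return "";
}
```

The check only mirrors the documented ranges. Whether create_config.py enforces the same limits is not stated on this page.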
Return Value
| Data Type | Description |
|---|---|
| Status | Execution status of the method. You can determine whether the method is successfully executed by calling status.ok(). |
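One possible pattern for handling the returned status is sketched below. It relies only on the ok() member mentioned above. `ThrowIfNotOk` is a hypothetical helper, and `FakeStatus` is a stand-in used to exercise it, not the library's Status class.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: accepts any type that exposes ok() and converts a
// failed status into an exception naming the step that failed.
template <typename StatusT>
void ThrowIfNotOk(const StatusT& status, const std::string& step) {
    if (!status.ok()) {
        throw std::runtime_error(step + " failed");
    }
}

// Stand-in type for demonstration only; the real Status comes from the library.
struct FakeStatus {
    bool success;
    bool ok() const { return success; }
};
```

With the real library, the value returned by Initialize would be passed as the first argument in place of `FakeStatus`.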