Adding the Index Creation Check

During Milvus index creation, the program may proceed to the next phase while the index is still being built in the background. If index creation takes a long time, queries may start before the index is complete, while the build continues to consume significant CPU resources. This results in an incomplete index and resource contention, and ultimately in low recall and QPS.

To avoid this situation, add checks to the source code so that subsequent phases begin only after index creation is complete.

  1. Open the module.py file.

    Assume that the ann-benchmarks-main folder is stored in the /data/milvus directory.

    vim /data/milvus/ann-benchmarks-main/ann_benchmarks/algorithms/milvus/module.py
    
  2. Import datetime so that log messages can record when each phase completes.
    from datetime import datetime
    
  3. Modify the create_index function.
    1. Delete the following source code.
      index = self.collection.index(index_name = "vector_index")
      index_progress =  utility.index_building_progress(
          collection_name = self.collection_name,
          index_name = "vector_index"
      )
      print(f"[Milvus] Create index {index.to_dict()} {index_progress} for collection {self.collection_name} successfully!!!")
      
    2. Add a sleep call at the end of the function to allow sufficient time for data preparation.
      sleep(300)
      
  4. Modify the fit function.

    Insert a while loop after the self.create_index() statement so that execution continues only once index building is complete. If the index is not yet fully built, the loop sleeps for 300 seconds before checking the progress again.

    self.create_index()
    
    index = self.collection.index(index_name="vector_index")
    # Poll until every row is indexed and no rows are pending.
    while True:
        index_progress = utility.index_building_progress(
            collection_name=self.collection_name,
            index_name="vector_index"
        )
        if index_progress["total_rows"] == index_progress["indexed_rows"] and index_progress["pending_index_rows"] == 0:
            break
        # The index is still being built; check again in 300 seconds.
        sleep(300)
    print(f"[Milvus] Create index {index.to_dict()} {index_progress} for collection {self.collection_name} successfully!!!")
    print(f"{datetime.now()} [Milvus-sync] utility.index_building_progress successfully!!!")
    
    self.load_collection()
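
The polling loop in step 4 can also be factored into a small helper, which makes it easier to add an upper time limit so that a stalled index build does not block the benchmark indefinitely. The following is a minimal sketch under stated assumptions, not part of module.py: the wait_for_index function and its get_progress, poll_interval, timeout, and sleep parameters are hypothetical names introduced here for illustration. It only assumes that the progress report is a dictionary with the total_rows, indexed_rows, and pending_index_rows keys used in step 4.

```python
import time
from datetime import datetime


def wait_for_index(get_progress, poll_interval=300, timeout=None, sleep=time.sleep):
    """Block until get_progress() reports a fully built index.

    get_progress is a hypothetical zero-argument callable that returns a dict
    with the "total_rows", "indexed_rows", and "pending_index_rows" keys.
    timeout is an optional limit in seconds; None means wait without limit.
    """
    start = time.monotonic()
    while True:
        progress = get_progress()
        # Same completion condition as the loop in step 4.
        done = (
            progress["total_rows"] == progress["indexed_rows"]
            and progress["pending_index_rows"] == 0
        )
        if done:
            print(f"{datetime.now()} index build finished: {progress}")
            return progress
        # Give up if the optional time limit has been reached.
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError(f"index still building after {timeout} s: {progress}")
        sleep(poll_interval)
```

In the fit function, such a helper might be called with a lambda that wraps utility.index_building_progress(collection_name=self.collection_name, index_name="vector_index"). Passing the sleep function as a parameter is a design choice that lets the loop be exercised in a test without real waiting.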