Cache Drive Fault

The BoostIO distributed cache layer uses NVMe SSDs as tier-2 cache media to persist write-cache and read-cache data, so it must handle cache drive faults effectively.

Table 1 Cache drive fault scenarios

| Scenario | Impact | Handling Method | Remarks |
| --- | --- | --- | --- |
| Adding a new drive | While the drive is being added, front-end I/O performance decreases temporarily, and the service interruption lasts no more than 60 seconds. | The system adds and identifies the new drive, updates the configuration file, reports the drive addition event, triggers view rebalancing, evicts cache data, and initializes the cache. | Only one drive can be added at a time. A single node supports a maximum of four drives. If the number of drives exceeds this limit, an error is reported. The capacity of the new drive must be the same as that of the existing drives in the cluster. |
| Faulty drive removal | During fault detection and drive removal, front-end I/O performance decreases, and the service interruption lasts no more than 60 seconds. | The system reports the drive fault to the cluster management module, evicts data from the affected partitions, reports completion, and triggers partition view recalculation and release. During this process, I/Os for the failed partitions are automatically retried. | Only one faulty drive can be tolerated. If two drives are faulty at the same time, data is lost. |
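
The constraints listed in the remarks of Table 1 can be expressed as a few simple checks. The following Python sketch is illustrative only and is not BoostIO's implementation: the function names `check_drive_add` and `fault_outcome`, the use of GB as the capacity unit, and the single cluster-wide capacity value are assumptions made for this example. The source does not state whether the single-fault tolerance applies per node or per cluster, so the sketch only takes a count of simultaneously faulty drives.

```python
MAX_DRIVES_PER_NODE = 4   # documented limit: a single node supports at most four drives
MAX_TOLERATED_FAULTS = 1  # documented limit: only one faulty drive can be tolerated


def check_drive_add(node_drive_count: int, new_capacity_gb: int,
                    cluster_capacity_gb: int) -> None:
    """Validate the addition of exactly one drive (hypothetical helper).

    Raises ValueError if the addition would break a documented constraint.
    """
    # Drives are added one at a time, so the node would hold count + 1 drives.
    if node_drive_count + 1 > MAX_DRIVES_PER_NODE:
        raise ValueError(
            f"node would exceed the limit of {MAX_DRIVES_PER_NODE} drives"
        )
    # The new drive must match the capacity of the drives already in the cluster.
    if new_capacity_gb != cluster_capacity_gb:
        raise ValueError(
            f"capacity {new_capacity_gb} GB does not match "
            f"cluster drive capacity {cluster_capacity_gb} GB"
        )


def fault_outcome(simultaneous_faulty_drives: int) -> str:
    """Classify the effect of a number of drives being faulty at the same time."""
    if simultaneous_faulty_drives < 0:
        raise ValueError("drive count cannot be negative")
    if simultaneous_faulty_drives == 0:
        return "healthy"
    if simultaneous_faulty_drives <= MAX_TOLERATED_FAULTS:
        return "tolerated"   # data is evicted and I/Os are retried
    return "data_loss"       # two or more simultaneous faults lose data
```

For example, `check_drive_add(3, 960, 960)` passes because the node would hold four matching drives, whereas `check_drive_add(4, 960, 960)` raises an error because a fifth drive exceeds the per-node limit.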