Introduction
In storage I/O-intensive scenarios, such as Spark and HBase deployments in distributed storage and big data, the performance of accessing I/O storage devices has a significant impact on overall service performance. Users are also concerned about the cost per gigabyte of storage devices. The tension between storage capacity and I/O performance will persist for the foreseeable future. A good practice is to use small-capacity, high-speed storage media as cache drives, which improve overall storage I/O performance by holding data that is predicted to be accessed again, so that it can be served directly from the high-speed cache.
Figure 1 and Figure 2 illustrate the smart prefetch software architecture for distributed storage and big data respectively.
- I/O storage devices include hard disk drives (HDDs) and solid-state drives (SSDs).
- The performance here refers to the bandwidth, latency, and I/O operations per second (IOPS) when accessing I/O storage devices.
- Small-capacity, high-speed storage media may be random access memory (RAM) drives or Non-Volatile Memory Express (NVMe) SSDs.
The smart prefetch function uses high-speed cache drives and efficient prefetch algorithms to improve storage I/O performance, thereby improving overall system performance in I/O-intensive scenarios.
The smart prefetch function consists of the following modules:
- Huawei smart prefetch driver in kernel mode: bcache
- Huawei smart prefetch engine framework in user mode: acache_client
- Huawei smart prefetch engine algorithm in user mode: hcache
- bcache configuration tool: bcache-tools
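As an illustrative sketch, the kernel-mode bcache driver is typically set up with bcache-tools by formatting a backing device and a cache device and attaching them. The commands below follow the standard upstream bcache workflow; the device paths and the cache-set UUID placeholder are assumptions for demonstration, and the Huawei smart prefetch stack (acache_client, hcache) may add its own configuration steps on top:

```shell
# Format an HDD as the backing device and an NVMe SSD as the cache set
# (the paths /dev/sdb and /dev/nvme0n1 are hypothetical examples).
make-bcache -B /dev/sdb
make-bcache -C /dev/nvme0n1

# Attach the cache set to the backing device via the cache-set UUID
# reported by make-bcache; the cached volume then appears as /dev/bcache0.
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

# The cached device can now be formatted and mounted like any block device.
mkfs.ext4 /dev/bcache0
mount /dev/bcache0 /data
```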

