Open-Source RocksDB-like Database Reference Architecture
Scenarios
RocksDB is a key-value (KV) storage engine built on the log-structured merge-tree (LSM-Tree). It offers high write throughput, strong read performance, and a high compression ratio, and it suits write-intensive, high-concurrency workloads that need local persistence. Because the LSM-Tree model turns most drive writes into large sequential writes, it works well on hard disk drives (HDDs).
Architectures and Principles
| Module Name | Description |
|---|---|
| Write-ahead log (WAL) | Records each write before it is applied to a memtable, so that successfully written data is not lost if the process or system crashes. |
| Manifest Log | Stores database metadata: the Sorted String Table (SSTable) files, the level (L0 to Ln) and key range of each file, newly generated files, and files to be deleted. It also supports multi-version and multi-column-family management. |
| Memtable | A SkipList in memory. All data is written to a memtable first, which is fast because it is an in-memory operation. When a memtable is full, it becomes an immutable memtable and is flushed to drives in the background. |
| Immutable Memtable | A memtable whose data has reached a certain threshold. It no longer accepts writes and waits to be flushed to the L0 persistent layer. |
| SSTable | Sorted, immutable KV files on drives. Entries are ordered by key, so lookups can use binary search. The files are organized into multiple levels: L0, L1, L2, and so on. |
| Compaction | Merges SSTables across levels, discards duplicate, expired, and deleted entries along with old versions, and so keeps data sorted and reduces read amplification. |
| Block Cache | An in-memory read cache that holds data blocks from SSTable files on drives, which reduces drive I/O and read latency and improves read performance. |
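
The way these modules cooperate on the write and read paths can be illustrated with a small Python sketch. It is a teaching model only, not RocksDB code: the class `ToyLSM`, its methods, and the `memtable_limit` parameter are hypothetical names, a plain dict stands in for the SkipList, Python lists stand in for files on drives, and the manifest, levels beyond a single set of sorted runs, WAL truncation, and the block cache are left out.

```python
import bisect
import json


class ToyLSM:
    """Toy LSM-tree: WAL -> memtable -> immutable memtable -> SSTables -> compaction."""

    def __init__(self, memtable_limit=2):
        self.wal = []          # stand-in for the on-drive write-ahead log
        self.memtable = {}     # stand-in for the in-memory SkipList
        self.immutable = []    # frozen memtables waiting for a flush
        self.sstables = []     # (sorted keys, values) pairs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append(json.dumps([key, value]))  # log the write first
        self.memtable[key] = value                 # then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.immutable.append(self.memtable)   # freeze the full memtable
            self.memtable = {}

    def delete(self, key):
        self.put(key, None)  # None acts as a tombstone in this sketch

    def flush(self):
        """Write each immutable memtable out as one sorted run (an L0 file)."""
        for table in self.immutable:
            keys = sorted(table)
            self.sstables.append((keys, [table[k] for k in keys]))
        self.immutable = []

    def get(self, key):
        """Search newest to oldest; the first hit wins."""
        for table in [self.memtable] + self.immutable[::-1]:
            if key in table:
                return table[key]
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)      # binary search in a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest version of each key."""
        merged = {}
        for keys, values in self.sstables:         # oldest first, newer overwrite
            merged.update(zip(keys, values))
        live = sorted(k for k, v in merged.items() if v is not None)
        self.sstables = [(live, [merged[k] for k in live])] if live else []

    def recover(self):
        """Simulate a crash: lose in-memory tables, then replay the WAL."""
        self.memtable, self.immutable = {}, []
        for record in self.wal:
            key, value = json.loads(record)
            self.memtable[key] = value


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable is full and becomes immutable
db.put("a", 3)
db.delete("b")    # second memtable is full and becomes immutable
db.flush()        # two sorted runs now exist
db.compact()      # one run remains; the old "a" and the deleted "b" are gone
```

In this sketch a read may have to check several runs before compaction and only one afterwards, which is the read-amplification effect that compaction is meant to reduce. A real engine would also retire WAL segments once the memtables they cover have been flushed; that step is omitted here.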