Feature Description
State storage is a core Flink capability, implemented mainly by the state backend. As the volume of state data grows, state storage comes under increasing performance pressure. OmniStateStore is a Flink state backend plugin that accelerates state storage and improves overall Flink performance.
Architecture
The OmniStateStore architecture comprises BSS-Cache and BSS-Store.
- BSS-Cache provides hot data access with hash-table-like performance and an efficient data demotion mechanism for cooling entries.
- BSS-Store provides large-capacity access to warm data, based on a drive-oriented log-structured merge-tree (LSM-tree).
Figure 1 shows the overall OmniStateStore architecture.
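The two-level design can be sketched as a cache-over-store lookup: reads hit the hot tier first and fall back to the warm tier, while entries evicted from the hot tier are demoted rather than dropped. The sketch below is illustrative only; the class names, eviction policy, and in-memory warm store stand in for OmniStateStore's actual BSS-Cache and BSS-Store implementations.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative two-level state store: a bounded, access-ordered hot cache
// (stand-in for BSS-Cache) in front of a larger warm store (stand-in for
// BSS-Store, which in reality is a drive-oriented LSM-tree).
// All names here are assumptions, not OmniStateStore's API.
class TwoLevelStore {
    private final int cacheCapacity;
    private final LinkedHashMap<String, String> cache; // hot tier (LRU)
    private final Map<String, String> store = new HashMap<>(); // warm tier

    TwoLevelStore(int cacheCapacity) {
        this.cacheCapacity = cacheCapacity;
        // Access-ordered LinkedHashMap: the least recently used entry is eldest.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > TwoLevelStore.this.cacheCapacity) {
                    // Demote the coldest entry to the warm store instead of losing it.
                    store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(String key, String value) {
        cache.put(key, value); // writes land in the hot tier first
    }

    String get(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = store.get(key); // cache miss: fall back to the warm tier
            if (v != null) {
                cache.put(key, v); // promote the entry back into the hot tier
            }
        }
        return v;
    }
}
```

With a capacity-2 cache, writing a third key demotes the coldest key to the warm store, and a later read of that key transparently promotes it back.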
Constraints
As an acceleration plugin for Flink, OmniStateStore is compatible with the Huawei Kunpeng platform and also runs on general-purpose x86 servers.
Application Scenarios
OmniStateStore is designed for scenarios where I/O performance becomes a bottleneck as state data grows in Apache Flink stream processing tasks.
Typical scenarios include:
- Real-time big data processing tasks: In tasks such as real-time extract-transform-load (ETL), streaming aggregation, and windowed computation, the state grows continuously with incoming data.
- Complex event processing and stateful stream computing: Large-scale states (such as user session tracing and real-time risk control modeling) need to be maintained for a long time.
- Flink jobs requiring high-throughput access: Experimental results show that OmniStateStore delivers 1.31x to 2.21x the performance of native Flink.
Leveraging a two-level storage architecture (BSS-Cache for memory-level hot data access and BSS-Store for drive-level warm data capacity), OmniStateStore improves state read/write efficiency. It runs on OSs such as openEuler 22.03 LTS SP3 and supports Flink 1.16.3 or later.
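Flink selects its state backend through the standard `state.backend` configuration option, which accepts a built-in name or the fully qualified name of a state backend factory, so switching to a plugin backend is typically a configuration change rather than a code change. The class name below is a placeholder, not OmniStateStore's actual entry point; consult the OmniStateStore release documentation for the real identifier.

```yaml
# flink-conf.yaml
# state.backend is Flink's standard backend selector.
# The factory class name below is a placeholder for illustration only.
state.backend: com.example.omnistatestore.OmniStateBackendFactory
# Standard Flink key for the checkpoint storage location.
state.checkpoints.dir: hdfs:///flink/checkpoints
```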
Typical Deployment
As a Flink plugin, OmniStateStore is deployed in the same way as Flink. Flink supports multiple deployment modes, such as YARN, standalone, and containerized.
In a typical deployment scenario, OmniStateStore is deployed across three Docker containers, each allocated 8 cores and 32 GB of memory. One container runs the Job Manager, while each of the remaining two containers hosts four Task Managers. The Job Manager is allocated 8 GB of memory, while each Task Manager is allocated two task slots and 8 GB of memory.
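The sizing above maps directly onto standard Flink configuration keys. A minimal flink-conf.yaml for this layout might look as follows; OmniStateStore-specific options are omitted, and only the memory and slot settings from the deployment description are shown.

```yaml
# flink-conf.yaml — sizing from the typical deployment described above.
jobmanager.memory.process.size: 8g    # Job Manager container: 8 GB
taskmanager.memory.process.size: 8g   # each Task Manager: 8 GB
taskmanager.numberOfTaskSlots: 2      # two task slots per Task Manager
```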
