OmniStateStore

Apache Flink is an open-source framework for real-time stream processing and analysis. It handles both unbounded and bounded data streams and offers a rich set of APIs to accommodate a wide range of stream processing scenarios.

State storage is an important feature of Flink and is mainly implemented by the state backend. As the volume of state data grows, the performance of state storage comes under pressure. OmniStateStore is a Flink state backend plugin that accelerates state storage and improves overall Flink performance.

Architecture Design

OmniStateStore serves as a middleware layer between Flink and RocksDB. It incorporates the dynamic filter technique, Flink-aware state caching, and Merge read/write optimization. Figure 1 illustrates the overall architecture.

  • Dynamic filter: A state-prefix filter eliminates redundant disk lookups during MapState range queries. For workloads that require only point reads and writes, the MemTable data structure is replaced with a HashLinkedList to make point operations more efficient.
  • Flink-aware state caching: A ValueState cache aggregates state for the same key in memory first, which reduces how often RocksDB is accessed. In addition, a Join operator cache minimizes repeated range queries over state during dual-stream join operations.
  • Merge read/write optimization: The RocksDB Merge API replaces the state read-modify-write (RMW) operations issued by Flink SQL and the DataStream API.
Figure 1 Overall architecture of OmniStateStore
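
To make the Merge read/write optimization concrete, the sketch below contrasts a read-modify-write update with a merge-style update on a toy in-memory store. It is not OmniStateStore or RocksDB code: ToyStore, MergeVsRmwSketch, and the addition-based merge operator are hypothetical names chosen for illustration, and the only point is the access pattern (a read before every write versus writes only).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a backing store; an assumption for illustration, not RocksDB. */
class ToyStore {
    private final Map<String, Long> base = new HashMap<>();
    private final Map<String, List<Long>> operands = new HashMap<>();
    int reads = 0;
    int writes = 0;

    /** Point read: folds any pending merge operands into the base value. */
    long get(String key) {
        reads++;
        long value = base.getOrDefault(key, 0L);
        for (long delta : operands.getOrDefault(key, Collections.<Long>emptyList())) {
            value += delta;
        }
        return value;
    }

    /** Plain write: overwrites the base value and drops pending operands. */
    void put(String key, long value) {
        writes++;
        base.put(key, value);
        operands.remove(key);
    }

    /** Merge-style write: records an operand without reading the current value. */
    void merge(String key, long delta) {
        writes++;
        operands.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
    }
}

class MergeVsRmwSketch {
    /** Read-modify-write: one read plus one write per update. */
    static void addWithRmw(ToyStore store, String key, long delta) {
        long current = store.get(key);
        store.put(key, current + delta);
    }

    /** Merge: one write per update; the read is deferred until the value is needed. */
    static void addWithMerge(ToyStore store, String key, long delta) {
        store.merge(key, delta);
    }

    public static void main(String[] args) {
        ToyStore rmw = new ToyStore();
        ToyStore merged = new ToyStore();
        for (int i = 1; i <= 100; i++) {
            addWithRmw(rmw, "user-1", i);
            addWithMerge(merged, "user-1", i);
        }
        // Capture the counters before the final reads below change them.
        int rmwReads = rmw.reads;
        int mergeReads = merged.reads;
        System.out.println("RMW reads during updates:   " + rmwReads);
        System.out.println("Merge reads during updates: " + mergeReads);
        System.out.println("Same result: " + (rmw.get("user-1") == merged.get("user-1")));
    }
}
```

In this sketch, 100 updates cost 100 reads on the read-modify-write path and none on the merge path, while both stores return the same value when finally read. How much a real workload gains depends on how the store folds operands, which the toy model does not capture.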

Typical Deployments

As a Flink plugin, OmniStateStore is deployed in the same way as Flink. Flink supports multiple deployment modes, including YARN, standalone, and containerized deployments.

In a typical deployment scenario, OmniStateStore is deployed across three Docker containers, each allocated 8 cores and 32 GB of memory. One container runs the Job Manager, and each of the other two containers hosts four Task Managers. The Job Manager is allocated 8 GB of memory; each Task Manager is allocated two task slots and 8 GB of memory.
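
The resource figures above imply a few totals that are worth checking when sizing a job. The following is a minimal sketch of that arithmetic only; DeploymentSizing and its constant names are hypothetical and do not correspond to any Flink or OmniStateStore API.

```java
/** Arithmetic over the deployment figures quoted above; names are illustrative only. */
class DeploymentSizing {
    static final int TASK_MANAGER_CONTAINERS = 2;      // containers that host Task Managers
    static final int TASK_MANAGERS_PER_CONTAINER = 4;
    static final int SLOTS_PER_TASK_MANAGER = 2;
    static final int TASK_MANAGER_MEMORY_GB = 8;
    static final int CONTAINER_MEMORY_GB = 32;
    static final int CONTAINER_CORES = 8;

    /** Task Managers across both worker containers. */
    static int totalTaskManagers() {
        return TASK_MANAGER_CONTAINERS * TASK_MANAGERS_PER_CONTAINER;
    }

    /** Task slots available to the whole cluster. */
    static int totalSlots() {
        return totalTaskManagers() * SLOTS_PER_TASK_MANAGER;
    }

    /** Task slots hosted by a single worker container. */
    static int slotsPerContainer() {
        return TASK_MANAGERS_PER_CONTAINER * SLOTS_PER_TASK_MANAGER;
    }

    /** Memory claimed by the Task Managers inside one worker container. */
    static int taskManagerMemoryPerContainerGb() {
        return TASK_MANAGERS_PER_CONTAINER * TASK_MANAGER_MEMORY_GB;
    }

    public static void main(String[] args) {
        System.out.println("Task Managers in total: " + totalTaskManagers());
        System.out.println("Task slots in total:    " + totalSlots());
        System.out.println("Slots per container:    " + slotsPerContainer()
                + " (container cores: " + CONTAINER_CORES + ")");
        System.out.println("Task Manager memory per container: "
                + taskManagerMemoryPerContainerGb() + " GB of " + CONTAINER_MEMORY_GB + " GB");
    }
}
```

Under these figures the cluster has 8 Task Managers and 16 task slots. Each worker container hosts 8 slots on 8 cores, and its four Task Managers together claim the full 32 GB of container memory.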