OmniStream

Apache Flink is an open source engine for real-time stream processing. As services evolve rapidly and data volumes surge, Flink's performance bottlenecks have started to surface in certain high-load scenarios, most visibly in Internet-related use cases. The OmniStream Flink Native feature implements Flink SQL operators in native code (C/C++) to improve query performance. It consists of two layers: the Java Adaptor layer, written in Java, and the CPP Core layer, written in C++. The Java Adaptor layer generates native execution plans and falls back to the Java runtime in unsupported scenarios. The CPP Core layer implements operator logic, data transmission, and a checkpoint-based fault tolerance mechanism. In effect, the Flink engine is rebuilt natively for higher performance.

Apache Flink offers a SQL interface and the DataStream API to support different development scenarios and requirements. The two are not completely separate; they reflect different levels of abstraction. OmniStream implements two end-to-end acceleration frameworks, each tailored to the characteristics of one of the two interfaces.

Architecture Design

SQL

Figure 1 shows the architecture of OmniStream Flink SQL Native.

Figure 1 Architecture of OmniStream Flink SQL Native

After a SQL statement or a Table API program is submitted and parsed, an execution plan is generated. The Java Adaptor layer obtains the execution plan, initializes the related tasks at the CPP Core layer, and generates an operator chain. Once initialization is complete, Flink runs the task: it reads data from the source, passes the data through the chained operators, and writes the result out through the sink.
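The source, operator chain, and sink flow described above can be pictured with a minimal Python sketch. This is only an illustration of the idea of chaining operators inside one task; it is not OmniStream or Flink code. The names `build_chain`, `run_task`, `keep_expensive`, and `project`, as well as the sample records, are hypothetical.

```python
def build_chain(operators):
    """Compose operators into one callable. Each operator takes a record
    and returns a list of zero or more output records."""
    def run(record):
        records = [record]
        for op in operators:
            next_records = []
            for r in records:
                next_records.extend(op(r))
            records = next_records
        return records
    return run


def keep_expensive(bid):
    # Filter operator (hypothetical): drop bids priced at 100 or less.
    return [bid] if bid["price"] > 100 else []


def project(bid):
    # Projection operator (hypothetical): keep only two fields.
    return [(bid["auction"], bid["price"])]


def run_task(source, chain, sink):
    # Read each record from the source, push it through the chain,
    # and hand every result to the sink.
    for record in source:
        for out in chain(record):
            sink.append(out)


source = [
    {"auction": 1, "price": 50},
    {"auction": 2, "price": 150},
    {"auction": 3, "price": 300},
]
sink = []
run_task(source, build_chain([keep_expensive, project]), sink)
```

Because the operators are chained inside one task, each record moves from one operator to the next through direct function calls rather than through a serialized exchange between tasks.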

DataStream

Figure 2 shows the architecture of OmniStream Flink DataStream Native.

Figure 2 Architecture of OmniStream Flink DataStream Native

After a DataStream API program is submitted and parsed, an execution plan is generated. The Java Adaptor layer obtains the execution plan, initializes the related tasks at the CPP Core layer, and generates an operator chain. Once initialization is complete, Flink runs the task: it reads data from the source, passes the data through the chained operators, and writes the result out through the sink.

OmniStream Performance Data

SQL

On the Nexmark 22-query test cases, OmniStream improves the computing performance of Flink by more than 100% on average. (By default, q0, q1, q2, q13, q14, q21, and q22 are supported; all other SQL statements fall back to the Java runtime.)

Figure 3 Performance test result of Nexmark 22-query test cases
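The default support list quoted above can be expressed as a simple lookup. The sketch below only mirrors that list; the real decision is made by the Java Adaptor layer when it inspects the execution plan, and that mechanism is not shown in this document. The names `NATIVE_QUERIES`, `runtime_for`, and `split_by_runtime` are hypothetical.

```python
# Nexmark queries listed in this document as supported by default.
NATIVE_QUERIES = frozenset({"q0", "q1", "q2", "q13", "q14", "q21", "q22"})


def runtime_for(query):
    """Return "native" for a supported query, otherwise "java" (fallback)."""
    return "native" if query in NATIVE_QUERIES else "java"


def split_by_runtime(queries):
    """Partition query names into (native, fallback), preserving order."""
    native = [q for q in queries if q in NATIVE_QUERIES]
    fallback = [q for q in queries if q not in NATIVE_QUERIES]
    return native, fallback
```

For example, `split_by_runtime(["q0", "q5", "q13", "q20"])` places q0 and q13 in the native group and q5 and q20 in the fallback group.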

DataStream

The OmniStream Flink DataStream Native API has been verified in two cases. In the WordCount scenario, OmniStream delivers 1.31 times the performance of open source Flink DataStream. In the stateless recomputing scenario, OmniStream achieves 1.6 times the performance of open source Flink DataStream.

Figure 4 Performance test result of OmniStream Flink DataStream Native
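The SQL result is stated as a percentage improvement, while the DataStream results are stated as a multiple of the baseline. The short sketch below converts between the two so that the figures can be compared. It assumes that both kinds of figure are relative to open source Flink as the baseline and that "performance" means throughput; the document does not define the metric, and the function names are hypothetical.

```python
def improvement_percent(speedup):
    """Convert a speedup ratio (accelerated / baseline) to percent improvement."""
    return (speedup - 1.0) * 100.0


def speedup_from_improvement(percent):
    """Convert a percent improvement over the baseline to a speedup ratio."""
    return 1.0 + percent / 100.0


# DataStream figures quoted in this document.
wordcount_gain = round(improvement_percent(1.31), 6)   # WordCount scenario
stateless_gain = round(improvement_percent(1.6), 6)    # stateless recomputing

# SQL figure quoted in this document: "more than 100%" improvement.
sql_speedup_floor = speedup_from_improvement(100.0)
```

Under these assumptions, 1.31 times corresponds to a 31% improvement, 1.6 times corresponds to a 60% improvement, and an improvement of more than 100% corresponds to more than 2 times the baseline performance.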