Application Scenarios

Learn about the OmniRuntime application scenarios before using each of the features.

OmniData

OmniData is best suited for big data scenarios with decoupled compute and storage, or large-scale convergence scenarios, where a large number of compute nodes read data from remote nodes. Such scenarios consume substantial network bandwidth because a large amount of raw data is transmitted from storage nodes to compute nodes over the network, while the proportion of valid data is generally low.
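The bandwidth cost described above can be illustrated with a toy model. The sketch below does not use any OmniData API. The `Row` type, the sample data, and both scan functions are hypothetical, and the only thing it shows is how many rows cross the network when filtering happens on the compute side versus near the storage.

```python
from dataclasses import dataclass


@dataclass
class Row:
    user_id: int
    region: str
    amount: float


def scan_without_pushdown(storage_rows, predicate):
    """Ship every raw row to the compute node, then filter there."""
    transferred = list(storage_rows)  # all raw rows cross the network
    result = [r for r in transferred if predicate(r)]
    return result, len(transferred)


def scan_with_pushdown(storage_rows, predicate):
    """Filter on the storage side and ship only the matching rows."""
    transferred = [r for r in storage_rows if predicate(r)]
    return transferred, len(transferred)


def is_eu(row):
    return row.region == "eu"


# Hypothetical data set: 1 row in 10 is "valid" for the query.
rows = [Row(i, "eu" if i % 10 == 0 else "us", float(i)) for i in range(1000)]

plain_result, plain_sent = scan_without_pushdown(rows, is_eu)
pushed_result, pushed_sent = scan_with_pushdown(rows, is_eu)

print(f"rows sent without pushdown: {plain_sent}")
print(f"rows sent with pushdown:    {pushed_sent}")
```

Both scans return the same rows. Only the number of rows transferred differs, and the gap grows as the proportion of valid data shrinks.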

OmniOperator

OmniOperator applies to the data analysis engine, which converts user-input SQL statements into operators. OmniOperator provides native operators to replace the analysis engine operators, accelerating analysis engine execution and improving analysis performance. OmniOperator is well suited for large-scale convergence scenarios.

OmniMV

OmniMV is designed for scenarios where SQL analysis tasks in a data warehouse contain many identical subqueries. Recomputing these subqueries wastes a large amount of computing resources and lowers query efficiency. This feature uses AI algorithms to recommend the optimal materialized view from historical SQL queries, automatically matches SQL statements with a materialized view, and replaces the SQL statements with the matched materialized view in the execution plan. This greatly reduces repeated calculations and increases query efficiency.
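To make the idea of repeated subqueries concrete, the following minimal sketch counts identical subqueries across a query history. It is not how OmniMV works internally: the source describes AI-based recommendation, whereas this sketch swaps in plain frequency counting with naive parenthesis matching. All function names and the sample queries are hypothetical.

```python
import re
from collections import Counter


def normalize(sql):
    """Lowercase and collapse whitespace so equal subqueries compare equal."""
    return re.sub(r"\s+", " ", sql.strip().lower())


def extract_subqueries(sql):
    """Return every parenthesized SELECT in a query (naive paren matching)."""
    subqueries = []
    stack = []
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inner = normalize(sql[start + 1:i])
            if inner.startswith("select"):
                subqueries.append(inner)
    return subqueries


def recommend_candidates(history, min_count=2):
    """List subqueries seen at least min_count times, most frequent first."""
    counts = Counter()
    for query in history:
        counts.update(extract_subqueries(query))
    return [(sq, n) for sq, n in counts.most_common() if n >= min_count]


# Hypothetical query history.
history = [
    "SELECT region, total FROM (SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region) t WHERE total > 100",
    "SELECT COUNT(*) FROM (select region, sum(amount) as total "
    "from sales group by region) t",
    "SELECT * FROM (SELECT id FROM users WHERE active = 1) u",
]

candidates = recommend_candidates(history)
for subquery, count in candidates:
    print(count, subquery)
```

A subquery that recurs across the history is a natural candidate for a materialized view, because its result can be computed once and reused by every query that contains it.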

OmniShuffle

Even with OmniOperator in use, shuffle data is still written to drives. When shuffle-intensive jobs are performed, a large amount of data still needs to be exchanged across nodes after the map process is complete. Combining OmniShuffle and OmniOperator brings more performance benefits, especially for shuffle-intensive jobs.

In big data scenarios, the big data engine Spark is used to perform shuffle-intensive jobs. After the map process is complete, a large amount of data needs to be exchanged across nodes. Statistics show that in many analysis scenarios the Spark shuffle process accounts for the largest share of time and resource overhead, reaching 50% to 80% of the end-to-end time of Spark services in some scenarios.
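The impact of the shuffle share on end-to-end time can be estimated with Amdahl's law. The sketch below is illustrative arithmetic only: the 50% and 80% shares come from the text above, while the 2x shuffle speedup is an assumed value and not a measured OmniShuffle result.

```python
def end_to_end_speedup(shuffle_fraction, shuffle_speedup):
    """Amdahl's law: overall speedup when only the shuffle part gets faster."""
    if not 0.0 <= shuffle_fraction <= 1.0:
        raise ValueError("shuffle_fraction must be between 0 and 1")
    if shuffle_speedup <= 0:
        raise ValueError("shuffle_speedup must be positive")
    remaining = (1.0 - shuffle_fraction) + shuffle_fraction / shuffle_speedup
    return 1.0 / remaining


ASSUMED_SHUFFLE_SPEEDUP = 2.0  # hypothetical value, for illustration only

for fraction in (0.5, 0.8):
    overall = end_to_end_speedup(fraction, ASSUMED_SHUFFLE_SPEEDUP)
    print(f"shuffle share {fraction:.0%}: overall speedup {overall:.2f}x")
```

The larger the share of time spent in shuffle, the more an accelerated shuffle shortens the whole job.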

As a performance acceleration component of Spark, OmniShuffle uses the plugin mechanism provided by Spark to implement the Shuffle Manager and Broadcast Manager plugin interfaces, replacing the native Shuffle and Broadcast of Spark in a non-intrusive manner. By implementing the Shuffle Manager plugin interface, OmniShuffle enables in-memory shuffle: the shuffle process is completed in the memory pool based on memory semantics, which reduces the amount of shuffle data flushed to disks. This lessens the time and computing power overhead caused by data flushing and reading, serialization and deserialization, and compression and decompression. By implementing the Broadcast Manager interface, OmniShuffle enables variable broadcast based on memory pool sharing, which improves the transmission efficiency of broadcast variables among executors. OmniShuffle also supports two network modes: Remote Direct Memory Access (RDMA) and TCP. Compared with TCP, RDMA improves transmission efficiency, requires less computing power, and implements efficient data exchange between nodes.

OCK BoostTuning for Spark SQL automatically adjusts the parallelism degree of Spark SQL jobs in real time based on historical data. This eliminates the need to manually optimize the parallelism degree and reduces spills in the shuffle-reduce process by 90%. As a result, OCK BoostTuning speeds up big data cluster jobs and increases job throughput.

Spark has a plugin mechanism. You can replace the original functions of Spark by implementing the Spark plugin interface.
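As a rough sketch of how such a plugin is typically selected, the helper below builds `spark-submit` arguments that set `spark.shuffle.manager`, the standard Spark property that accepts a fully qualified shuffle manager class name. The class name used here is a hypothetical placeholder, not the actual OmniShuffle class. Refer to the OmniShuffle installation guide for the real value and any additional settings.

```python
def shuffle_plugin_conf(manager_class):
    """Return Spark properties that select a custom shuffle manager."""
    if not manager_class:
        raise ValueError("manager_class must not be empty")
    return {"spark.shuffle.manager": manager_class}


def to_submit_args(conf):
    """Convert a property dict into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical placeholder class name, for illustration only.
HYPOTHETICAL_MANAGER = "com.example.shuffle.InMemoryShuffleManager"

submit_args = to_submit_args(shuffle_plugin_conf(HYPOTHETICAL_MANAGER))
print(" ".join(["spark-submit"] + submit_args + ["app.jar"]))
```

Because the replacement is driven by configuration, the application code itself does not change, which is what makes the approach non-intrusive.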

OmniAdvisor

In an offline SQL query task, OmniAdvisor parses parameters of historical Spark and Hive SQL tasks, uses AI algorithms to intelligently tune parameter sampling, and implements end-to-end online parameter tuning for tasks.

OmniHBaseGSI

In non-primary key matching scenarios, OmniHBaseGSI simplifies application development, maintains data consistency, and increases the query speed. In those scenarios, HBase natively performs a full table scan, which is inefficient, especially when the data table is large.
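The difference between a full table scan and an index lookup on a non-primary-key column can be sketched with plain dictionaries. This toy model does not use HBase or OmniHBaseGSI APIs. The table layout, column names, and functions are hypothetical and only illustrate why a secondary index avoids touching every row.

```python
from collections import defaultdict


def full_scan(table, column, value):
    """Check every row; return matching row keys and rows examined."""
    hits = []
    examined = 0
    for row_key, columns in table.items():
        examined += 1
        if columns.get(column) == value:
            hits.append(row_key)
    return hits, examined


def build_index(table, column):
    """Map each value of a non-primary-key column to its row keys."""
    index = defaultdict(list)
    for row_key, columns in table.items():
        if column in columns:
            index[columns[column]].append(row_key)
    return index


def index_lookup(index, value):
    """Return matching row keys without touching the data table."""
    return list(index.get(value, []))


# Hypothetical table: row key -> columns; 1 row in 100 is in "paris".
table = {
    f"row{i:04d}": {"city": "paris" if i % 100 == 0 else "other"}
    for i in range(1000)
}

scan_hits, rows_examined = full_scan(table, "city", "paris")
city_index = build_index(table, "city")
index_hits = index_lookup(city_index, "paris")

print(f"full scan examined {rows_examined} rows for {len(scan_hits)} hits")
print(f"index lookup returned {len(index_hits)} hits directly")
```

The index must be kept in step with the data table on every write, which is why maintaining data consistency is called out as part of the feature.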

OmniData Combined with OmniOperator

Combine OmniData and OmniOperator to bring more performance benefits.

OmniShuffle Combined with OmniOperator
