OmniRuntime Overview

OmniRuntime provides its big data features as plugins that improve end-to-end performance across data loading, computing, and exchange.

Data volumes generated by Internet services have been growing much faster than CPU computing power, and the open-source big data ecosystem is evolving rapidly. However, the diversity of computing engines and open-source components makes it difficult to improve data processing performance across the entire data lifecycle. Each big data engine applies its own tuning policies and technologies to improve performance and efficiency. When some tuning items are applied across multiple engines, they can cause resource contention and conflicts that reduce overall computing performance.

OmniRuntime is the set of application acceleration features provided by Kunpeng BoostKit for Big Data. Delivered as plugins, these features accelerate end-to-end data loading, computing, and exchange, thereby improving the performance of big data analytics.

OmniRuntime consists of the following features:
- OmniData (operator pushdown): In the data loading phase, OmniData implements near-data computing, executing operators where the data is stored to reduce network data traffic.
- OmniOperator (operator acceleration): In the data computing phase, OmniOperator replaces the engines' built-in Java operators with high-performance native operators to improve operator execution efficiency.
- OmniShuffle (shuffle acceleration): In the data exchange phase, OmniShuffle accelerates data exchange between nodes.
- OmniMV (materialized views): For workloads with repeated queries or subqueries, OmniMV uses AI algorithms to identify the optimal materialized views, reducing the overhead of repeated subqueries and thus improving query efficiency.
- OmniAdvisor (parameter tuning): OmniAdvisor uses AI algorithms to intelligently tune the parameters of Spark and Hive tasks running in online systems.
- OmniHBaseGSI (global secondary indexes): For conditional queries on HBase, OmniHBaseGSI stores index data in an independent index table and queries that table to improve HBase query efficiency.
- OmniShield (confidential big data): In confidential computing scenarios, OmniShield provides data source encryption and decryption for DataFrame and Spark SQL applications, as well as end-to-end security protection for Spark applications based on the Arm confidential computing trusted execution environment (TEE) kit.
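To make the near-data computing idea behind operator pushdown concrete, the following Python sketch compares how many rows a storage node sends to a compute engine when a filter is applied after the transfer versus pushed down to the storage side. It is a toy model under stated assumptions: `StorageNode`, `scan`, `scan_with_filter`, and `run_query` are hypothetical names invented for illustration, not OmniData interfaces, and network traffic is approximated by a simple row count.

```python
"""Toy model of operator pushdown (near-data computing).

Hypothetical illustration only: StorageNode and run_query are not
OmniData APIs. They model how evaluating a filter where the data lives
reduces the number of rows sent over the network.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class StorageNode:
    rows: List[Row]
    rows_sent: int = 0  # rows "transferred" to the compute engine so far

    def scan(self) -> List[Row]:
        """Without pushdown: ship every row to the compute engine."""
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def scan_with_filter(self, predicate: Predicate) -> List[Row]:
        """With pushdown: evaluate the filter locally and ship only matches."""
        matches = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(matches)
        return matches


def run_query(node: StorageNode, predicate: Predicate, pushdown: bool) -> List[Row]:
    if pushdown:
        return node.scan_with_filter(predicate)
    # Without pushdown, the compute engine filters after receiving all rows.
    return [r for r in node.scan() if predicate(r)]


if __name__ == "__main__":
    data = [{"id": i, "amount": i % 100} for i in range(10_000)]
    selective: Predicate = lambda r: r["amount"] > 95  # keeps 4% of rows

    plain, pushed = StorageNode(data), StorageNode(data)
    same = run_query(plain, selective, False) == run_query(pushed, selective, True)
    print("same result:", same)
    print("rows sent without pushdown:", plain.rows_sent)
    print("rows sent with pushdown:", pushed.rows_sent)
```

Both strategies return the same result, but with this selective filter the pushed-down scan transfers 400 rows instead of 10,000. The savings shrink as a filter becomes less selective, because more rows must cross the network either way.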

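The index-table approach described for OmniHBaseGSI can be sketched in a few lines of Python. This is a hypothetical model, not the OmniHBaseGSI API: `IndexedTable`, `put`, and `get_by_value` are invented names, and storing each index entry as a sorted (indexed value, data row key) pair is an assumption chosen to mimic HBase's lexicographically sorted row keys. A query on the indexed column becomes a short range scan of the index table followed by point lookups, rather than a full scan of the data table.

```python
"""Toy global secondary index kept in an independent index table.

Hypothetical sketch: IndexedTable is not the OmniHBaseGSI API, and the
(indexed value, data row key) entry layout is an assumption that mimics
HBase's sorted row keys.
"""
import bisect
from typing import Dict, List, Tuple


class IndexedTable:
    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.data: Dict[str, Dict[str, str]] = {}  # data table: row key -> columns
        self.index: List[Tuple[str, str]] = []     # index table, kept sorted

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        old = self.data.get(row_key)
        if old is not None and self.indexed_column in old:
            # Remove the stale index entry before writing the new one.
            entry = (old[self.indexed_column], row_key)
            pos = bisect.bisect_left(self.index, entry)
            if pos < len(self.index) and self.index[pos] == entry:
                self.index.pop(pos)
        self.data[row_key] = dict(columns)
        if self.indexed_column in columns:
            bisect.insort(self.index, (columns[self.indexed_column], row_key))

    def get_by_value(self, value: str) -> List[Dict[str, str]]:
        """Range-scan the index table for `value`, then point-get each row."""
        pos = bisect.bisect_left(self.index, (value, ""))  # "" sorts first
        rows = []
        while pos < len(self.index) and self.index[pos][0] == value:
            rows.append(self.data[self.index[pos][1]])
            pos += 1
        return rows


if __name__ == "__main__":
    t = IndexedTable("city")
    t.put("u1", {"name": "Ann", "city": "Beijing"})
    t.put("u2", {"name": "Bo", "city": "Shenzhen"})
    t.put("u3", {"name": "Cy", "city": "Beijing"})
    t.put("u3", {"name": "Cy", "city": "Shenzhen"})  # index entry is moved
    print([r["name"] for r in t.get_by_value("Shenzhen")])
```

Keeping the index in a separate table means every write must also maintain the index, including removing a stale entry when the indexed value changes. That is the usual trade-off of secondary indexes: faster conditional reads in exchange for extra write work.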
Table 1 lists the open source components and versions to which each subfeature of OmniRuntime has been adapted.

Table 1 Open source components and versions

| Subfeature | Compatible Open Source Components and Versions |
|---|---|
| OmniData | Spark 3.0.0, Spark 3.1.1, Hive 3.1.0, openLooKeng 1.4.1, openLooKeng 1.6.1 |
| OmniOperator | Spark 3.1.1, Spark 3.3.1, Hive 3.1.0, openLooKeng 1.6.1 |
| OmniShuffle | Spark 3.1.1, Spark 3.3.1, Hive 3.1.0 |
| OmniMV | Spark 3.1.1, Hive 3.1.0, ClickHouse 22.3.6.5 |
| OmniAdvisor | Spark 3.1.1, Spark 3.3.1, Hive 3.1.0, Tez 0.10.0 |
| OmniHBaseGSI | HBase 2.4.14 |
| OmniShield | Spark 3.3.1, Hive 3.1.0 |