OmniRuntime Overview
OmniRuntime delivers its big data features as plugins that improve the performance of data loading, computing, and exchange from end to end.
Data volumes generated from Internet services have been growing much faster than CPUs' computing power. The big data open-source ecosystem is also developing on a fast track. However, diversified computing engines and open-source components make it difficult to improve data processing performance throughout the lifecycle. Different big data engines use their own unique tuning policies and technologies to improve performance and efficiency. Some tuning items may be applied across multiple engines, which may cause resource contention and conflicts, reducing overall computing performance.
OmniRuntime consists of the following features:
- In the data loading phase, OmniData implements near-data computing to reduce network data traffic.
- In the data computing phase, OmniOperator replaces open-source Java operators with high-performance native operators to improve operator execution efficiency.
- In the data exchange phase, OmniShuffle accelerates data interaction between nodes.
- For scenarios where repeated queries or subqueries exist, OmniMV identifies the optimal materialized view through AI algorithms, reducing the overhead of repeated subqueries and thus improving query efficiency.
- For offline SQL query tasks, OmniAdvisor uses AI algorithms to intelligently tune the parameters of Spark and Hive tasks running in online systems.
- For conditional queries on HBase, OmniHBaseGSI stores index data in a separate index table and queries that table first, which improves HBase query efficiency.
- In confidential computing scenarios, the OmniShield feature provides data source encryption and decryption capabilities for DataFrame and SparkSQL applications, and also end-to-end security protection for Spark applications based on the Arm confidential computing trusted execution environment (TEE) kit.
- In a Hadoop cluster with unbalanced load between nodes, OmniScheduler optimizes the open-source Capacity Scheduler so that it schedules resources based on weights calculated from each cluster node's physical resources and on the resulting node ranking. This optimized YARN load scheduling algorithm enables balanced resource allocation and efficient resource utilization.
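The weight-and-sort idea behind OmniScheduler can be illustrated with a minimal sketch. This is not OmniScheduler's actual algorithm: the `Node` fields, the `node_weight` formula, and the 0.5/0.5 factors are all assumptions chosen only to show how nodes might be ranked by free physical resources before containers are placed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one cluster node's physical resources."""
    name: str
    free_vcores: int
    total_vcores: int
    free_mem_mb: int
    total_mem_mb: int


def node_weight(node: Node, cpu_factor: float = 0.5, mem_factor: float = 0.5) -> float:
    """Assumed weight: weighted sum of the free CPU and free memory fractions."""
    cpu_free = node.free_vcores / node.total_vcores
    mem_free = node.free_mem_mb / node.total_mem_mb
    return cpu_factor * cpu_free + mem_factor * mem_free


def rank_nodes(nodes: list[Node]) -> list[Node]:
    """Sort nodes so the least-loaded node (highest weight) comes first."""
    return sorted(nodes, key=node_weight, reverse=True)


nodes = [
    Node("node-a", free_vcores=2, total_vcores=32, free_mem_mb=8192, total_mem_mb=131072),
    Node("node-b", free_vcores=24, total_vcores=32, free_mem_mb=98304, total_mem_mb=131072),
    Node("node-c", free_vcores=12, total_vcores=32, free_mem_mb=32768, total_mem_mb=131072),
]

ranked = [n.name for n in rank_nodes(nodes)]
print(ranked)  # ['node-b', 'node-c', 'node-a']
```

Placing new work on the highest-ranked node first steers load away from busy nodes, which is the balancing effect described above. A real scheduler would also account for queue capacity, locality, and pending requests.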
Table 1 lists the open-source components and versions that each OmniRuntime subfeature has been adapted to.
| Subfeature | Compatible Open Source Component and Version |
|---|---|
| OmniData | Spark 3.0.0, Spark 3.1.1, Hive 3.1.0, openLooKeng 1.4.1, openLooKeng 1.6.1 |
| OmniOperator | Spark 3.1.1, Spark 3.3.1, Spark 3.4.3, Spark 3.5.2, Hive 3.1.0, openLooKeng 1.6.1 |
| OmniShuffle | Spark 3.1.1, Spark 3.3.1, Hive 3.1.0 |
| OmniMV | Spark 3.1.1, Spark 3.4.3, Hive 3.1.0, ClickHouse 22.3.6.5 |
| OmniAdvisor | Spark 3.1.1, Spark 3.3.1, Hive 3.1.0, Tez 0.10.0 |
| OmniHBaseGSI | HBase 2.4.14 |
| OmniShield | Spark 3.3.1, Hive 3.1.0 |
| OmniScheduler | Spark 3.3.1, Spark 3.1.1, Hive 3.1.0, Hadoop 3.3.4 |
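
When planning a deployment, it can help to check which subfeatures match an existing component version. The sketch below simply transcribes Table 1 into a Python dictionary. The `COMPATIBILITY` structure and the `supported_subfeatures` helper are illustrative names, not part of any OmniRuntime tooling.

```python
# Table 1 transcribed as: subfeature -> component -> adapted versions.
COMPATIBILITY = {
    "OmniData": {"Spark": ["3.0.0", "3.1.1"], "Hive": ["3.1.0"],
                 "openLooKeng": ["1.4.1", "1.6.1"]},
    "OmniOperator": {"Spark": ["3.1.1", "3.3.1", "3.4.3", "3.5.2"],
                     "Hive": ["3.1.0"], "openLooKeng": ["1.6.1"]},
    "OmniShuffle": {"Spark": ["3.1.1", "3.3.1"], "Hive": ["3.1.0"]},
    "OmniMV": {"Spark": ["3.1.1", "3.4.3"], "Hive": ["3.1.0"],
               "ClickHouse": ["22.3.6.5"]},
    "OmniAdvisor": {"Spark": ["3.1.1", "3.3.1"], "Hive": ["3.1.0"],
                    "Tez": ["0.10.0"]},
    "OmniHBaseGSI": {"HBase": ["2.4.14"]},
    "OmniShield": {"Spark": ["3.3.1"], "Hive": ["3.1.0"]},
    "OmniScheduler": {"Spark": ["3.3.1", "3.1.1"], "Hive": ["3.1.0"],
                      "Hadoop": ["3.3.4"]},
}


def supported_subfeatures(component: str, version: str) -> list[str]:
    """Return the subfeatures that Table 1 lists for this component version."""
    return [
        name
        for name, components in COMPATIBILITY.items()
        if version in components.get(component, [])
    ]


print(supported_subfeatures("Spark", "3.3.1"))
# ['OmniOperator', 'OmniShuffle', 'OmniAdvisor', 'OmniShield', 'OmniScheduler']
print(supported_subfeatures("HBase", "2.4.14"))
# ['OmniHBaseGSI']
```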