Feature Description

OmniRuntime provides its big data features as plugins that improve the end-to-end performance of data loading, computing, and exchange.

Data volumes generated by Internet services have been growing much faster than CPU computing power, and the open-source big data ecosystem is also evolving rapidly. However, the diversity of computing engines and open-source components makes it difficult to improve data processing performance across the entire data lifecycle. Each big data engine uses its own tuning policies and technologies to improve performance and efficiency. Some tuning items may be applied across multiple engines at the same time, which can cause resource contention and conflicts that reduce overall computing performance.

OmniRuntime is a set of application acceleration features provided by Kunpeng BoostKit for Big Data. It uses plugins to improve the performance of end-to-end data loading, computing, and exchange, and this in turn accelerates big data analytics.

OmniData is part of the OmniRuntime feature set. It pushes operators of the big data engine down to storage nodes to implement near-data computing. This reduces network bandwidth consumption and improves the query performance of the engine. OmniData supports popular data formats such as ORC and Parquet. It allows Spark to push the Filter, Aggregation, and Limit operators down to the CPUs of storage nodes, so that data the query does not need is no longer transmitted over the network and big data computing performance improves.
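The following is a minimal Python simulation that illustrates why pushing Filter, Aggregation, and Limit operators down to storage nodes reduces network traffic. It is not the OmniData API or its implementation. `StorageNode`, `scan`, `partial_sum`, and the `query_*` functions are hypothetical names. The number of values returned by a node stands in for the data sent over the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class StorageNode:
    """Hypothetical storage node holding one partition of a table."""
    rows: List[Row]

    def scan(self, predicate: Optional[Predicate] = None,
             limit: Optional[int] = None) -> List[Row]:
        """Return rows, applying a pushed-down filter and limit locally if given."""
        out: List[Row] = []
        for row in self.rows:
            if predicate is None or predicate(row):
                out.append(row)
                if limit is not None and len(out) >= limit:
                    break  # pushed-down Limit: stop reading early
        return out

    def partial_sum(self, column: str, predicate: Predicate) -> int:
        """Pushed-down Filter + Aggregation: return one partial sum."""
        return sum(r[column] for r in self.rows if predicate(r))


def query_sum(nodes: List[StorageNode], predicate: Predicate, column: str,
              pushdown: bool) -> Tuple[int, int]:
    """Return (SUM(column) over matching rows, number of values shipped)."""
    total, shipped = 0, 0
    for node in nodes:
        if pushdown:
            total += node.partial_sum(column, predicate)
            shipped += 1  # only the partial aggregate crosses the network
        else:
            rows = node.scan()  # every row crosses the network
            shipped += len(rows)
            total += sum(r[column] for r in rows if predicate(r))
    return total, shipped


def query_limit(nodes: List[StorageNode], predicate: Predicate, n: int,
                pushdown: bool) -> Tuple[List[Row], int]:
    """Return (first n matching rows, number of rows shipped)."""
    collected: List[Row] = []
    shipped = 0
    for node in nodes:
        rows = node.scan(predicate, limit=n) if pushdown else node.scan()
        shipped += len(rows)
        collected.extend(r for r in rows if predicate(r))
    return collected[:n], shipped


if __name__ == "__main__":
    # Three hypothetical nodes, each holding 1,000 rows.
    nodes = [StorageNode([{"id": i, "amount": i} for i in range(s, s + 1000)])
             for s in (0, 1000, 2000)]
    is_multiple_of_10 = lambda r: r["amount"] % 10 == 0
    print(query_sum(nodes, is_multiple_of_10, "amount", pushdown=False))
    print(query_sum(nodes, is_multiple_of_10, "amount", pushdown=True))
    print(query_limit(nodes, is_multiple_of_10, 5, pushdown=True)[1])
```

In this toy model, both strategies return the same result. Without pushdown, all 3,000 rows are shipped. With pushdown, only three partial sums are shipped for the aggregation, and at most five rows per node are shipped for the limit query. The real savings in OmniData depend on the data, the operators, and the deployment.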

OmniData has been adapted to the following open-source components and versions:

Spark 3.0.0
Spark 3.1.1
Hive 3.1.0
openLooKeng 1.4.1
openLooKeng 1.6.1
