Key Features

OmniData

OmniData pushes big data engine operators down to storage nodes for near-data computing, which reduces network bandwidth consumption and improves query performance. It supports popular data formats such as ORC and Parquet, and allows Spark to push the Filter, Aggregation, and Limit operators down to the CPUs of storage nodes, so that less irrelevant data is transmitted over the network.
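
The effect of pushdown can be illustrated with a toy model. The sketch below is not OmniData's API; the table, the function names, and the row counts are assumptions made up for illustration. It compares how many rows leave a simulated storage node with and without pushing Filter and Limit down.

```python
# Toy model of operator pushdown (illustrative only, not OmniData code).
# Hypothetical table held by a storage node.
ROWS = [
    {"id": i, "region": "eu" if i % 4 == 0 else "us", "amount": i * 10}
    for i in range(1000)
]


def scan_without_pushdown(rows, predicate, limit):
    """Storage node ships every row; the compute node filters and limits."""
    shipped = list(rows)  # everything crosses the network
    result = [r for r in shipped if predicate(r)][:limit]
    return result, len(shipped)


def scan_with_pushdown(rows, predicate, limit):
    """Storage node applies Filter and Limit itself, then ships the result."""
    shipped = []
    for r in rows:
        if predicate(r):
            shipped.append(r)
            if len(shipped) == limit:
                break  # Limit lets the storage node stop scanning early
    return shipped, len(shipped)


def is_eu(row):
    return row["region"] == "eu"


baseline, rows_sent_baseline = scan_without_pushdown(ROWS, is_eu, 5)
pushed, rows_sent_pushed = scan_with_pushdown(ROWS, is_eu, 5)
print(rows_sent_baseline, rows_sent_pushed)
```

Both scans return the same rows; only the amount of data that crosses the simulated network differs.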

OmniOperator

OmniOperator implements big data SQL operators in native code (C/C++) to improve query performance. It combines columnar storage, vectorized execution, and Kunpeng vectorization instructions to improve operator execution efficiency.
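
The difference between row-at-a-time and columnar batch processing can be sketched as follows. This is a layout illustration only: the data and function names are hypothetical, and pure Python does not use SIMD instructions, so it shows the shape of the technique rather than its speed.

```python
from array import array

# Row layout: one tuple (id, price) per record. Hypothetical sample data.
rows = [(i, i * 1.5) for i in range(8)]

# Columnar layout: the price column stored as one contiguous typed array.
prices = array("d", (price for _id, price in rows))


def filter_sum_rowwise(rows, threshold):
    """Process one full record at a time."""
    total = 0.0
    for _id, price in rows:
        if price > threshold:
            total += price
    return total


def filter_sum_columnar(prices, threshold, batch_size=4):
    """Process one column in fixed-size batches, as a vectorized operator would."""
    total = 0.0
    for start in range(0, len(prices), batch_size):
        batch = prices[start:start + batch_size]
        total += sum(p for p in batch if p > threshold)
    return total


rowwise_total = filter_sum_rowwise(rows, 4.0)
columnar_total = filter_sum_columnar(prices, 4.0)
```

In a native implementation, each batch would be a contiguous buffer that a single vector instruction can process several values at a time.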

OmniShuffle

OmniShuffle runs in the big data clusters of the customer's data center as a performance acceleration component of the Spark big data engine. It uses unified memory pool addressing, data exchange with memory semantics, and converged shuffle to reduce disk I/O overhead, speed up data analysis, and improve cluster resource utilization.

OmniShuffle uses the plugin mechanism provided by Spark to implement the Shuffle Manager and Broadcast Manager plugin interfaces, replacing Spark's open source shuffle and broadcast implementations in a non-intrusive manner.

OmniMV

OmniMV uses AI algorithms to recommend the optimal materialized views based on historical SQL queries, automatically matches SQL statements against materialized views in Spark, and replaces the matching part of the execution plan with the materialized view. This reduces repeated computation and improves query efficiency. When you submit an SQL task to a Spark cluster, the cluster management node distributes it to multiple compute nodes as subtasks for execution.
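
The idea of answering a query from a materialized view can be sketched with a toy example. It is a simplification under stated assumptions: the table, the query text, and all function names are hypothetical, and matching is done on the whole normalized query string, whereas the description above refers to matching parts of an execution plan.

```python
from collections import defaultdict

# Hypothetical base table: (region, product, amount).
SALES = [("eu", "a", 10), ("eu", "b", 5), ("us", "a", 7), ("us", "a", 3)]

# Precomputed results, keyed by normalized query text.
MATERIALIZED_VIEWS = {}


def sum_amount_by_region(rows):
    """The expensive aggregation that the view precomputes."""
    totals = defaultdict(int)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


def normalize(sql):
    """Lowercase and collapse whitespace so equivalent text matches."""
    return " ".join(sql.lower().split())


def create_view(sql, compute):
    """Materialize the result of a query once."""
    MATERIALIZED_VIEWS[normalize(sql)] = compute(SALES)


def run_query(sql, compute):
    """Answer from a matching view if one exists, else scan the base table."""
    key = normalize(sql)
    if key in MATERIALIZED_VIEWS:
        return MATERIALIZED_VIEWS[key], "materialized view"
    return compute(SALES), "base table"


QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"
```

Calling `run_query(QUERY, sum_amount_by_region)` before `create_view` scans the base table; after the view is created, the same query is served from the stored result.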

OmniAdvisor

OmniAdvisor parses the parameters of historical Spark and Hive SQL tasks, uses AI algorithms to guide parameter sampling, and implements end-to-end online parameter tuning for tasks.

OmniHBaseGSI

OmniHBaseGSI stores index data in an independent index table to accelerate conditional queries that use SingleColumnValueFilter. When a query condition hits an index, the full-table scan of the data table is converted to an exact-range query on the index table, which increases query speed.
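
How an index table turns a full-table scan into a range query can be sketched as follows. This is a minimal model, not the HBase or OmniHBaseGSI API: the table contents and function names are assumptions, and a sorted Python list stands in for an index table whose row keys are kept in sorted order.

```python
from bisect import bisect_left

# Hypothetical data table: row key -> column values.
DATA_TABLE = {
    "row1": {"city": "Paris", "age": 30},
    "row2": {"city": "Oslo", "age": 41},
    "row3": {"city": "Paris", "age": 25},
    "row4": {"city": "Rome", "age": 30},
}


def full_scan(column, value):
    """Without an index: inspect every row of the data table."""
    return sorted(k for k, row in DATA_TABLE.items() if row.get(column) == value)


def build_index(column):
    """Index table: entries (column value, data row key), kept sorted."""
    return sorted((row[column], key) for key, row in DATA_TABLE.items())


def index_lookup(index, value):
    """With an index: seek to the first matching entry, read the exact range."""
    i = bisect_left(index, (value,))  # (value,) sorts before (value, any_key)
    keys = []
    while i < len(index) and index[i][0] == value:
        keys.append(index[i][1])
        i += 1
    return keys


CITY_INDEX = build_index("city")
paris_rows = index_lookup(CITY_INDEX, "Paris")
```

The lookup touches only the contiguous index entries for the requested value, while `full_scan` reads every row.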

OmniShield

OmniShield is a confidential computing component for the Spark big data engine. It runs in the trusted execution environment (TEE) of the customer's data center, so that data is encrypted and decrypted and the computing process is executed inside the hardware-based TEE. OmniShield also safeguards data security in the rich execution environment (REE).

OmniScheduler

OmniScheduler enhances the capacity scheduling algorithm of Hadoop YARN. It obtains cluster load information, calculates a physical resource weight for each node, sorts the nodes by the result, and preferentially schedules tasks to low-load nodes. This improves load balancing across the cluster, with balanced resource allocation and efficient resource utilization.
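
Load-aware node ranking can be sketched as a weighted score followed by a sort. The node names, usage figures, weights, and scoring formula below are all assumptions for illustration; the source does not specify how OmniScheduler computes its weights.

```python
# Hypothetical per-node utilization (fraction of capacity in use).
NODES = {
    "node-a": {"cpu": 0.80, "memory": 0.60},
    "node-b": {"cpu": 0.20, "memory": 0.30},
    "node-c": {"cpu": 0.50, "memory": 0.70},
}

# Hypothetical weights expressing how much each resource counts toward load.
WEIGHTS = {"cpu": 0.5, "memory": 0.5}


def load_score(usage, weights):
    """Weighted sum of resource utilization; lower means less loaded."""
    return sum(weights[res] * usage[res] for res in weights)


def rank_nodes(nodes, weights):
    """Return node names ordered from least to most loaded."""
    return sorted(nodes, key=lambda name: load_score(nodes[name], weights))


ranking = rank_nodes(NODES, WEIGHTS)
preferred_node = ranking[0]
```

A scheduler following this scheme would try `preferred_node` first and fall back along `ranking` if the node cannot fit the request.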
