OmniData

OmniData is the operator pushdown feature.

OmniData pushes operators of the big data engine down to storage nodes so that computation runs near the data, which reduces network bandwidth consumption and improves query performance. It supports access to popular data formats such as ORC and Parquet. With OmniData, Spark can push the Filter, Aggregation, and Limit operators down to the CPUs of a storage node, so that data the query does not need is never transmitted over the network.
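The effect of pushing Filter, Aggregation, and Limit to the storage node can be illustrated with a toy model. The sketch below is plain Python and does not use OmniData or Spark APIs; the sample rows, function names, and the "rows transferred" counter are all assumptions made for illustration. It also assumes a single storage node, whereas a real deployment would likely merge partial aggregates from several nodes.

```python
# Toy model of operator pushdown. Illustrative only: this is NOT OmniData's API.
from collections import defaultdict

# Hypothetical rows as they might sit on one storage node: (region, amount).
ROWS = [("east", 10), ("west", 5), ("east", 7),
        ("north", 3), ("west", 20), ("east", 1)]


def query_without_pushdown(rows, min_amount, limit):
    """Ship every row to the compute node, then filter, aggregate, and limit there."""
    transferred = list(rows)  # every row crosses the network
    totals = defaultdict(int)
    for region, amount in transferred:
        if amount >= min_amount:          # Filter
            totals[region] += amount      # Aggregation (sum per region)
    result = sorted(totals.items())[:limit]  # Limit
    return result, len(transferred)


def query_with_pushdown(rows, min_amount, limit):
    """Run filter, aggregate, and limit next to the data; ship only the result."""
    totals = defaultdict(int)
    for region, amount in rows:           # executed on the storage node
        if amount >= min_amount:          # Filter
            totals[region] += amount      # Aggregation (sum per region)
    transferred = sorted(totals.items())[:limit]  # Limit; only these rows move
    return transferred, len(transferred)


if __name__ == "__main__":
    print(query_without_pushdown(ROWS, min_amount=5, limit=2))
    print(query_with_pushdown(ROWS, min_amount=5, limit=2))
```

Both functions return the same query result, but in this model the pushdown variant moves only the aggregated rows across the network instead of the full input.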

This feature works together with HAF and a distributed storage system, either Ceph or HDFS. See Figure 1.

Figure 1 Software architecture of OmniData

The OmniData Client is an open source component that provides plugins for different engines. Functions to be pushed down are annotated using the annotation and compilation plugins provided by HAF. HAF then automatically pushes the corresponding tasks to the OmniData Server on the offload node for execution.
The HAF Host Runtime is a library installed on the compute node. It provides the task offload capability and pushes tasks to the Target Runtime.
The HAF Target Runtime is a library installed on the storage node (offload node). It provides the task execution capability and executes OmniData Server jobs.
The OmniData Server handles tasks pushed down by the Host Runtime.
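The path a pushed-down task takes through the components described above can be sketched as a chain of calls. This is a minimal model in plain Python, not HAF or OmniData code: every class name, method name, and the sample data are hypothetical, and the real components communicate across nodes rather than through in-process calls.

```python
# Toy model of the offload path. All names are hypothetical, not HAF/OmniData APIs.

class OmniDataServer:
    """Runs on the storage (offload) node and handles pushed-down tasks."""

    def __init__(self, rows):
        self.rows = rows  # data stored locally on this node

    def handle(self, predicate):
        # Apply the pushed-down filter next to the data.
        return [row for row in self.rows if predicate(row)]


class TargetRuntime:
    """Stands in for the library on the storage node that executes server jobs."""

    def __init__(self, server):
        self.server = server

    def execute(self, predicate):
        return self.server.handle(predicate)


class HostRuntime:
    """Stands in for the library on the compute node that offloads tasks."""

    def __init__(self, target):
        self.target = target

    def offload(self, predicate):
        # In a real deployment this step would cross the network.
        return self.target.execute(predicate)


if __name__ == "__main__":
    server = OmniDataServer(rows=[3, 12, 7, 25])
    host = HostRuntime(TargetRuntime(server))
    # The engine-side client hands a filter to the host runtime.
    print(host.offload(lambda value: value > 10))
```

The point of the layering in this sketch is that the engine-side caller only talks to the host-side object, while the data access happens entirely in the server-side object.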
