Software Architecture

OmniData supports popular data formats such as ORC, Parquet, and TXT. It allows Spark to push down the Filter, Aggregation, and Limit operators to the CPUs on storage nodes to implement near-data computing. Because data that the query does not need is discarded before it leaves the storage node, less data is transmitted over the network and big data computing performance improves. This feature interworks with HAF and with a distributed storage system, either Ceph or HDFS. OmniData operator pushdown consists of the following components, as shown in Figure 1.

Figure 1 Software architecture of OmniData operator pushdown

OmniData Client is an open source component that provides plugins for different engines. The functions to be pushed down are annotated using the annotation and compilation plugins provided by HAF. HAF then automatically pushes the corresponding tasks to the OmniData Server running in HAF Executor for execution.
HAF Runtime is a library deployed on the compute node (host node) to provide task offload capabilities and push tasks to HAF Executor.
HAF Daemon is deployed on the storage node (offload node) and manages HAF Executor through control commands.
HAF Executor is an independent process started by HAF Daemon to execute OmniData Server jobs.
OmniData Server handles tasks pushed down by HAF Runtime.
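
The effect of operator pushdown can be illustrated with a small, self-contained simulation. This is only a sketch and does not use the OmniData or HAF APIs: the names PushdownTask and StorageNode, and the sample table, are hypothetical and exist only for illustration. StorageNode stands in for OmniData Server running on the storage node, and the caller stands in for the Spark task on the compute node. Filter, Aggregation (a sum only, for brevity), and Limit are applied where the data lives, so only the result rows would have to cross the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


@dataclass
class PushdownTask:
    """Hypothetical description of the operators sent to the storage node."""
    predicate: Callable[[Row], bool]      # Filter
    sum_column: Optional[str] = None      # Aggregation (sum only, for brevity)
    limit: Optional[int] = None           # Limit


class StorageNode:
    """Illustrative stand-in for OmniData Server on a storage node."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def scan_all(self) -> List[Row]:
        # Without pushdown: every stored row is returned to the compute node.
        return list(self.rows)

    def execute(self, task: PushdownTask) -> List[Row]:
        # With pushdown: filter first, then aggregate, then apply the limit,
        # so that only the final result rows leave the storage node.
        result = [row for row in self.rows if task.predicate(row)]
        if task.sum_column is not None:
            total = sum(row[task.sum_column] for row in result)
            result = [{f"sum({task.sum_column})": total}]
        if task.limit is not None:
            result = result[: task.limit]
        return result


# Sample table with 1000 rows (made-up data).
node = StorageNode([{"id": i, "amount": i * 10} for i in range(1000)])

# Filter + Aggregation: only one result row is returned.
agg_task = PushdownTask(predicate=lambda r: r["amount"] >= 9900, sum_column="amount")
agg_result = node.execute(agg_task)

# Filter + Limit: at most three rows are returned.
limit_task = PushdownTask(predicate=lambda r: r["amount"] < 100, limit=3)
limit_result = node.execute(limit_task)

print(len(node.scan_all()), len(agg_result), len(limit_result))
```

In this sketch, the run without pushdown transfers all 1000 rows to the compute node, whereas the pushed-down tasks return one row and three rows respectively. The real components additionally handle file formats, task scheduling, and process management, which are not modeled here.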
