Architecture

Most big data engines implement their operators in Java or Scala, which cannot fully exploit CPU capabilities. These operators are also poorly suited to heterogeneous computing and do not make full use of hardware computing performance. OmniOperator implements operators in native code to make full use of the hardware, especially in heterogeneous computing scenarios.

OmniOperator provides fixed interfaces for distributed tasks. For example, you can submit an SQL task to a Spark cluster. The cluster management node splits the task into subtasks and distributes them to multiple compute nodes for execution.

User code invokes OmniOperator only within a single subtask, and OmniOperator does not interact with other subtasks. Figure 1 shows its architecture.

Figure 1 Software architecture of OmniOperator

OmniOperator performs the following functions:

Implements high-performance operators in native code. These operators fully exploit hardware computing capabilities, especially heterogeneous computing power. Compared with Java and Scala operators, OmniOperator greatly improves compute engine performance.
Provides an efficient data organization mode. It defines a language-independent, column-oriented storage format and implements it as OmniVec in off-heap memory. Data can be read with zero copy and without serialization overhead, so users can process in-memory data more efficiently.
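
The idea of a column-oriented vector held in off-heap memory and read with zero copy can be illustrated with a minimal Java sketch. This is not the OmniVec API: the class OffHeapIntColumn and its methods are hypothetical names chosen for illustration, and the sketch uses only the standard java.nio.ByteBuffer direct buffer as a stand-in for off-heap storage.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; this is NOT the OmniVec API.
// One column of int values stored contiguously outside the Java heap.
final class OffHeapIntColumn {
    private final ByteBuffer buffer; // direct buffer: memory outside the Java heap
    private final int size;          // number of rows in this column

    OffHeapIntColumn(int size) {
        this.size = size;
        this.buffer = ByteBuffer.allocateDirect(size * Integer.BYTES)
                                .order(ByteOrder.nativeOrder());
    }

    // Wraps an existing buffer without copying its contents.
    private OffHeapIntColumn(ByteBuffer shared, int size) {
        this.buffer = shared;
        this.size = size;
    }

    int size() {
        return size;
    }

    void set(int row, int value) {
        buffer.putInt(row * Integer.BYTES, value);
    }

    int get(int row) {
        return buffer.getInt(row * Integer.BYTES);
    }

    // Zero-copy view over rows [from, from + length): shares the same memory.
    OffHeapIntColumn slice(int from, int length) {
        ByteBuffer dup = buffer.duplicate();
        dup.position(from * Integer.BYTES);
        dup.limit((from + length) * Integer.BYTES);
        // slice() resets the byte order, so set it again on the view.
        ByteBuffer view = dup.slice().order(ByteOrder.nativeOrder());
        return new OffHeapIntColumn(view, length);
    }

    // Sequential scan over contiguous values, the access pattern columnar layouts favor.
    long sum() {
        long total = 0;
        for (int row = 0; row < size; row++) {
            total += get(row);
        }
        return total;
    }

    public static void main(String[] args) {
        OffHeapIntColumn column = new OffHeapIntColumn(6);
        for (int row = 0; row < column.size(); row++) {
            column.set(row, row * 10);
        }
        OffHeapIntColumn view = column.slice(2, 3); // rows 2..4, no data copied
        System.out.println("sum of view = " + view.sum());
        view.set(0, 99);                            // write through the view
        System.out.println("column[2]   = " + column.get(2));
    }
}
```

Because the view shares the underlying memory, a write through the view is visible in the original column, which is the property that zero-copy reads rely on. A native operator could, in principle, work on the same memory region without a serialization step, although how OmniOperator actually does this is not shown here.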
