Architecture
OmniOperator provides unified interfaces for distributed tasks. For example, when you submit an SQL task to a Spark cluster, the cluster management node schedules the task by distributing its subtasks to multiple compute nodes for execution.
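As an illustration of the flow above (the query, table, and cluster settings are placeholders, not part of OmniOperator), such a task could be submitted through Spark's standard SQL command-line client:

```shell
# Hypothetical example: submit a SQL query to a Spark cluster managed by YARN.
# The driver hands the query to the cluster management node, which breaks it
# into stages and schedules the resulting subtasks onto the compute nodes.
spark-sql --master yarn \
  -e "SELECT dept, COUNT(*) FROM employees GROUP BY dept"
```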
Most big data engines use Java or Scala operators, which rarely achieve full CPU utilization. In addition, their support for heterogeneous computing resources is limited, which prevents hardware performance from being fully leveraged. OmniOperator instead implements operators in native code to make full use of the hardware, especially in heterogeneous computing environments.
OmniOperator performs the following functions:
- Implements high-performance Omni operators using native code. It fully exploits the performance potential of hardware, particularly in heterogeneous computing environments. Compared with Java and Scala operators, Omni operators enhance the execution efficiency of compute engines.
- Provides an efficient data organization mode. It defines a language-independent, column-oriented storage format and uses off-heap memory to implement OmniVec, which reads data with zero copies and no serialization overhead, allowing users to process data more efficiently.
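The zero-copy idea behind the second point can be sketched in plain Java. The class below is a hypothetical simplification, not the actual OmniVec API: one column's values live contiguously in off-heap memory (`ByteBuffer.allocateDirect`), and readers receive a view of that memory rather than a copy, so no serialization or heap allocation is needed to hand the data to a native operator.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical sketch in the spirit of OmniVec: a column of int values
// stored contiguously in native (off-heap) memory.
final class OffHeapIntColumn {
    private final ByteBuffer buffer;  // native memory, outside the JVM heap
    private final int capacity;
    private int size = 0;

    OffHeapIntColumn(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves off-heap memory; the bytes are never
        // duplicated onto the GC-managed heap.
        this.buffer = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                                .order(ByteOrder.nativeOrder());
    }

    void append(int value) {
        if (size >= capacity) throw new IllegalStateException("column full");
        buffer.putInt(size * Integer.BYTES, value);
        size++;
    }

    int get(int row) {
        return buffer.getInt(row * Integer.BYTES);
    }

    // "Zero copy": expose the same underlying memory as a read-only view.
    // A consumer (e.g. a native operator) reads the identical bytes
    // directly, with no serialization step in between.
    IntBuffer asReadOnlyView() {
        ByteBuffer view = buffer.asReadOnlyBuffer().order(ByteOrder.nativeOrder());
        view.limit(size * Integer.BYTES);
        return view.asIntBuffer();
    }
}

public class ColumnDemo {
    public static void main(String[] args) {
        OffHeapIntColumn col = new OffHeapIntColumn(4);
        col.append(10);
        col.append(20);
        col.append(30);

        IntBuffer view = col.asReadOnlyView();
        long sum = 0;
        while (view.hasRemaining()) sum += view.get();
        System.out.println("sum=" + sum);  // prints "sum=60"
    }
}
```

Because the column is a single contiguous byte range, its address and length are all that need to be passed across the JNI boundary for a native operator to scan it.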
OmniOperator is invoked by user code within a single task and does not interact with other subtasks. Figure 1 shows the software architecture of OmniOperator.
