OmniOperator

Popular big data compute engines are written in high-level programming languages such as Java and Scala, and their operators are memory-intensive. However, Java code runs slower than native code because of overhead at the JVM layer. In addition, because of its language semantics, Java offers limited support for single instruction, multiple data (SIMD) instructions, which prevents CPUs from delivering their full computing power. Moreover, most existing big data engines perform in-memory computing on a row-oriented data format and therefore cannot fully leverage the vector instructions of the CPU.

The OmniRuntime operator acceleration feature, called OmniOperator, implements big data SQL operators in native code (C/C++). First, OmniOperator uses a column-oriented in-memory data format, OmniVec, for in-memory computing. Data in the same column is stored contiguously in memory, which improves data access performance and makes the data well suited to vectorized processing. Second, operators are implemented in C++ with vectorized execution: they process data in batches instead of row by row to better utilize CPU capabilities. Vectorized operators have more opportunities to use CPU SIMD instructions and achieve a higher CPU cache hit ratio, improving query performance and CPU utilization. In addition, for core functions such as hash calculation and aggregate value calculation, operators explicitly invoke Kunpeng NEON instructions and Kunpeng libraries for acceleration. A single instruction can then operate on multiple data elements at the same time, further improving operator processing performance.
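The difference between row-by-row and columnar batch processing can be illustrated with a minimal C++ sketch. It is not the OmniVec or OmniOperator API: the Row and ColumnBatch types and the function names are hypothetical and exist only for this illustration. The sketch assumes three integer columns and shows why a contiguous column gives the compiler a simple loop that it may auto-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-oriented layout: all fields of one record are stored together,
// so reading one field means striding over the others.
struct Row {
    int64_t id;
    int64_t price;
    int64_t quantity;
};

// Column-oriented batch (hypothetical stand-in, not the OmniVec API):
// each column is one contiguous array.
struct ColumnBatch {
    std::vector<int64_t> id;
    std::vector<int64_t> price;
    std::vector<int64_t> quantity;
};

// Row-at-a-time aggregation: only 8 of every 24 bytes loaded are used.
int64_t SumPriceRows(const std::vector<Row>& rows) {
    int64_t sum = 0;
    for (const Row& r : rows) {
        sum += r.price;
    }
    return sum;
}

// Batch aggregation over one contiguous column: a unit-stride loop
// that a compiler can typically turn into SIMD instructions.
int64_t SumPriceColumn(const ColumnBatch& batch) {
    const int64_t* price = batch.price.data();
    const std::size_t n = batch.price.size();
    int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += price[i];
    }
    return sum;
}

// Element-wise batch expression (price * quantity) over two columns.
// Assumes both columns have the same length.
std::vector<int64_t> Revenue(const ColumnBatch& batch) {
    const std::size_t n = batch.price.size();
    std::vector<int64_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = batch.price[i] * batch.quantity[i];
    }
    return out;
}
```

Both aggregation functions return the same result. The columnar versions touch only the data they need and run as tight loops over contiguous memory, which is the property that vectorized operators rely on. Whether a given loop is actually vectorized depends on the compiler and its optimization flags.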

OmniOperator can also be used together with OmniShuffle to further improve the performance of big data compute engines. For details about the performance results, see OmniShuffle.

Figure 1 OmniOperator acceleration principle

OmniOperator improves the computing performance of Spark by more than 30% on average, as measured by the 99 TPC-DS benchmark queries.

Figure 2 OmniOperator TPC-DS benchmark result

OmniOperator improves the computing performance of Hive by more than 20% on average, as measured by the 99 TPC-DS benchmark queries.

Figure 3 OmniOperator TPC-DS benchmark result
