Combining with OmniOperator

OmniOperator improves operator execution efficiency, and OmniShuffle optimizes the data exchange during shuffle. Used together, they improve the end-to-end query performance of the engine.

Prerequisites

Before using the combination feature, install OmniOperator. For details, see Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.

Procedure

To run Spark services with the OmniShuffle and OmniOperator combination feature, start the Spark SQL CLI as described in the following steps.

Obtain the OmniShuffle and OmniOperator combination software package. For Spark 3.3, the software package is BoostKit-omnishuffle-spark-3.3.1-1.6.0-aarch64.zip. Upload the software package to /home/ockadmin and decompress it to obtain the ock-omniop-shuffle-manager-24.0.0-for-spark-3.3.jar file. Move the decompressed JAR file to the ${OCK_HOME}/jars directory.

In the ock_spark.conf file, add the OmniOperator configuration and shuffle manager configuration. For example:

spark.shuffle.manager org.apache.spark.shuffle.ock.OckColumnarShuffleManager
# (Optional) RSS mode
spark.shuffle.ock.mode rss

spark.sql.orc.columnarReaderBatchSize 10000
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 28g
spark.driverEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
spark.executorEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
spark.executorEnv.OMNI_CONNECTED_ENGINE Spark
spark.executorEnv.OMNI_HOME /opt/operator1.5.0
spark.driverEnv.OMNI_HOME /opt/operator1.5.0
spark.executorEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
spark.driverEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
spark.sql.extensions com.huawei.boostkit.spark.ColumnarPlugin
spark.sql.join.columnar.preferShuffledHashJoin true
spark.sql.orc.impl native
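
Before starting the CLI, it can help to confirm that the keys above were actually written to ock_spark.conf. The following is a minimal sketch, not part of BoostKit: conf_value is a hypothetical helper name, and it assumes that each entry is a key and a value separated by whitespace, as in the example above.

```shell
# conf_value is a hypothetical helper (not a Spark or BoostKit command).
# It prints the value of one key from a whitespace-separated properties file.
# Usage: conf_value <properties-file> <key>
conf_value() {
    awk -v key="$2" '
        /^[[:space:]]*#/ { next }        # skip comment lines
        $1 == key {
            $1 = ""                      # drop the key, keep the rest of the line
            sub(/^[[:space:]]+/, "")     # trim the separator left behind
            print
            exit
        }
    ' "$1"
}
```

For example, conf_value /home/ock_spark.conf spark.shuffle.manager is expected to print the shuffle manager class name if the entry is present, and nothing if it is missing.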

In the startup command of the Spark SQL CLI, add the paths of the JAR files on which OmniOperator and the SparkExtension plugin depend. For example:

spark-sql --deploy-mode client --driver-cores 8 \
                               --driver-memory 40G \
                               --num-executors 24 \
                               --executor-cores 12 \
                               --executor-memory 25g \
                               --master yarn \
                               --conf spark.sql.codegen.wholeStage=false \
                               --jars "/home/ockadmin/opt/ock/jars/*,/opt/operator1.5.0/jars/*" \
                               --properties-file /home/ock_spark.conf \
                               --database tpcds_bin_partitioned_orc_3
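
The --jars option takes a single comma-separated list of JAR paths. A minimal sketch of assembling such a list from the two JAR directories used in the example is shown below. The helper name jars_arg is hypothetical and is not part of Spark or BoostKit.

```shell
# jars_arg is a hypothetical helper (not a Spark or BoostKit command).
# It prints a comma-separated list of the *.jar files found in the
# directories given as arguments, suitable as the value of --jars.
jars_arg() {
    result=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            # Skip the unexpanded pattern when a directory has no JAR files.
            [ -e "$jar" ] || continue
            if [ -z "$result" ]; then
                result="$jar"
            else
                result="$result,$jar"
            fi
        done
    done
    printf '%s\n' "$result"
}
```

The result can then be passed as one argument, for example --jars "$(jars_arg /home/ockadmin/opt/ock/jars /opt/operator1.5.0/jars)".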

Check whether the combination feature is enabled.
1. Execute a TPC-DS SQL task. If the shuffle operator is replaced with OmniColumnarShuffleExchange in the execution flowchart on the Spark History UI, OmniOperator has taken effect.
  Figure 1 Execution flowchart
2. If "Shuffle initialize success" is displayed in the driver log, OmniShuffle has taken effect.
  Figure 2 Driver log
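
The driver log check can also be scripted. The sketch below assumes that the driver output has been saved to a file, for example by appending 2>&1 | tee driver.log to the startup command. Both the file name driver.log and the helper name shuffle_initialized are hypothetical.

```shell
# shuffle_initialized is a hypothetical helper (not a BoostKit command).
# It succeeds (exit status 0) if the saved driver log contains the
# message that indicates OmniShuffle has taken effect.
# Usage: shuffle_initialized <driver-log-file>
shuffle_initialized() {
    grep -q "Shuffle initialize success" "$1"
}
```

For example, shuffle_initialized driver.log && echo "OmniShuffle enabled" prints the confirmation only when the message is found.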

Replace /opt/operator1.5.0/, the example installation directory of OmniOperator, with the actual one.
For details about the configuration items added to the ock_spark.conf file, see Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.
The spark.shuffle.ock.mode parameter indicates the shuffle mode, which is the same as the OmniShuffle deployment mode.
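
Replacing the example installation directory can be done in one pass over the configuration file. The following is a sketch only: with_omni_home is a hypothetical helper name, and it assumes that the actual directory does not contain the characters | or &, which are special in the sed replacement used here.

```shell
# with_omni_home is a hypothetical helper (not a BoostKit command).
# It prints the configuration file with every occurrence of the example
# directory /opt/operator1.5.0 replaced by the actual installation directory.
# Usage: with_omni_home <properties-file> <actual-directory>
with_omni_home() {
    sed "s|/opt/operator1\.5\.0|$2|g" "$1"
}
```

The helper writes to standard output, so the original file stays unchanged until the result has been reviewed and saved.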
