Combining with OmniOperator
OmniOperator improves operator execution efficiency, and OmniShuffle optimizes the data exchange process. Used together, they improve the end-to-end query performance of the engine.
Prerequisites
Before using the combination feature, install OmniOperator. For details, see Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.
Procedure
To use the OmniShuffle and OmniOperator combination feature to execute Spark services, you need to start the Spark SQL CLI.
- Obtain the OmniShuffle and OmniOperator combination software package. For Spark 3.3, the software package is BoostKit-omnishuffle-spark-3.3.1-1.6.0-aarch64.zip. Upload the software package to /home/ockadmin and decompress it to obtain the ock-omniop-shuffle-manager-24.0.0-for-spark-3.3.jar file. Move the decompressed JAR file to the ${OCK_HOME}/jars directory.
- In the ock_spark.conf file, add the OmniOperator configuration and shuffle manager configuration. For example:
spark.shuffle.manager org.apache.spark.shuffle.ock.OckColumnarShuffleManager
# (Optional) RSS
spark.shuffle.ock.mode rss
spark.sql.orc.columnarReaderBatchSize 10000
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 28g
spark.driverEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
spark.executorEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
spark.executorEnv.OMNI_CONNECTED_ENGINE Spark
spark.executorEnv.OMNI_HOME /opt/operator1.5.0
spark.driverEnv.OMNI_HOME /opt/operator1.5.0
spark.executorEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
spark.driverEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
spark.sql.extensions com.huawei.boostkit.spark.ColumnarPlugin
spark.sql.join.columnar.preferShuffledHashJoin true
spark.sql.orc.impl native
- Add the paths of the JAR files on which OmniOperator depends to the startup command so that the SparkExtension plugin is loaded. For example:
spark-sql --deploy-mode client --driver-cores 8 \
  --driver-memory 40G \
  --num-executors 24 \
  --executor-cores 12 \
  --executor-memory 25g \
  --master yarn \
  --conf spark.sql.codegen.wholeStage=false \
  --jars /home/ockadmin/opt/ock/jars/* \
  --jars /opt/operator1.5.0/jars/* \
  --properties-file /home/ock_spark.conf \
  --database tpcds_bin_partitioned_orc_3
- Check whether the combination feature is enabled.
- Execute a TPC-DS SQL task. If the shuffle operator is replaced with OmniColumnarShuffleExchange in the execution flowchart on the Spark History UI, OmniOperator has taken effect.
Figure 1 Execution flowchart
- If "Shuffle initialize success" is displayed in the driver log, OmniShuffle has taken effect.
Figure 2 Driver log
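The driver-log check described above can also be scripted. The following is a minimal sketch, assuming the driver output has been saved to a file; the function name check_omnishuffle_log and the file name driver.log are hypothetical and not part of OmniShuffle. Only the message text "Shuffle initialize success" comes from this guide.

```shell
# Hypothetical helper (not shipped with OmniShuffle): report whether a saved
# driver log contains the OmniShuffle initialization message.
check_omnishuffle_log() {
    local log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi
    # -F treats the message as a fixed string; -q suppresses grep output.
    if grep -qF "Shuffle initialize success" "$log_file"; then
        echo "OmniShuffle has taken effect"
    else
        echo "OmniShuffle initialization message not found"
        return 1
    fi
}
```

For example, if the console output of the spark-sql session was redirected to driver.log (an assumed file name), run check_omnishuffle_log driver.log. A return status of 0 means the message was found, 1 means it was not found, and 2 means the file could not be read.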
- Replace /opt/operator1.5.0/, the example installation directory of OmniOperator, with the actual one.
- For details about the configuration items added in step 2, see Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.
- The spark.shuffle.ock.mode parameter indicates the shuffle mode, which is the same as the OmniShuffle deployment mode.
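Replacing the example installation directory by hand is error-prone because /opt/operator1.5.0 appears in several configuration items. The following is a minimal sketch of doing the replacement with sed; the function name retarget_omni_home is hypothetical, and the sketch assumes the actual path does not contain the characters | or &.

```shell
# Hypothetical helper: print a copy of a Spark properties file in which the
# example OmniOperator directory is replaced with the actual installation path.
# Assumption: the new path contains neither '|' nor '&' (both special to sed).
retarget_omni_home() {
    local conf_file="$1"
    local omni_home="${2%/}"   # drop one trailing slash, if present
    # '|' is used as the sed delimiter because the paths contain '/'.
    sed "s|/opt/operator1\.5\.0|${omni_home}|g" "$conf_file"
}
```

The function writes the result to standard output and leaves the input file unchanged, so the rewritten configuration can be reviewed before it replaces the original, for example: retarget_omni_home ock_spark.conf /actual/install/dir > ock_spark.conf.new (both the directory and the output file name are placeholders).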