Combining with OmniOperator

OmniOperator improves operator execution efficiency, and OmniShuffle optimizes the shuffle data exchange process. Used together, they improve the end-to-end query performance of the engine.

Prerequisites

Before using the combined feature, install OmniOperator. For details, see the Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.
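The configuration and startup command in this section reference files inside the OmniOperator installation directory. After installing OmniOperator, a quick check such as the following can confirm that those files exist. This is a minimal sketch that assumes the example directory /opt/operator1.5.0 used in this section; check_omni_home is a hypothetical helper, not a BoostKit tool.

```shell
# Hypothetical preflight helper (not part of BoostKit).
# Checks that an OmniOperator installation directory contains the files
# referenced in this section:
#   <dir>/lib/libjemalloc.so.2  (set in LD_PRELOAD)
#   <dir>/jars/*.jar            (passed to spark-sql through --jars)
check_omni_home() {
  local dir="${1:-/opt/operator1.5.0}" jar   # example path; use the actual one
  if [ ! -f "$dir/lib/libjemalloc.so.2" ]; then
    echo "missing: $dir/lib/libjemalloc.so.2" >&2
    return 1
  fi
  # If no file matches, the pattern stays unexpanded and -e fails.
  for jar in "$dir"/jars/*.jar; do
    if [ -e "$jar" ]; then
      echo "OmniOperator files found under $dir"
      return 0
    fi
  done
  echo "no JAR files found in $dir/jars" >&2
  return 1
}
```

For example, run `check_omni_home /opt/operator1.5.0` after replacing the path with the actual installation directory.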

Procedure

To run Spark services with OmniShuffle and OmniOperator combined, perform the following steps and start the Spark SQL CLI.

  1. Obtain the combined OmniShuffle and OmniOperator software package. For Spark 3.3, the package is BoostKit-omnishuffle-spark-3.3.1-1.6.0-aarch64.zip. Upload it to /home/ockadmin, decompress it to obtain the ock-omniop-shuffle-manager-24.0.0-for-spark-3.3.jar file, and move the JAR file to the ${OCK_HOME}/jars directory.
  2. Add the OmniOperator configuration and the shuffle manager configuration to the ock_spark.conf file. For example:
    spark.shuffle.manager org.apache.spark.shuffle.ock.OckColumnarShuffleManager
    # (Optional) Shuffle mode. rss indicates RSS mode.
    spark.shuffle.ock.mode rss
    
    spark.sql.orc.columnarReaderBatchSize 10000
    spark.memory.offHeap.enabled true
    spark.memory.offHeap.size 28g
    spark.driverEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
    spark.executorEnv.LD_PRELOAD /opt/operator1.5.0/lib/libjemalloc.so.2
    spark.executorEnv.OMNI_CONNECTED_ENGINE Spark
    spark.executorEnv.OMNI_HOME /opt/operator1.5.0
    spark.driverEnv.OMNI_HOME /opt/operator1.5.0
    spark.executorEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
    spark.driverEnv.LD_LIBRARY_PATH /opt/operator1.5.0/lib/:/usr/local/lib/HMPP:$LD_LIBRARY_PATH
    spark.sql.extensions com.huawei.boostkit.spark.ColumnarPlugin
    spark.sql.join.columnar.preferShuffledHashJoin true
    spark.sql.orc.impl native
  3. Start the Spark SQL CLI with the SparkExtension plugin, adding the paths of the JAR files that OmniShuffle and OmniOperator depend on. Pass all JAR paths in a single comma-separated --jars option (if --jars is specified more than once, only the last value takes effect), and quote the value so that the shell does not expand the wildcards. For example:
    spark-sql --deploy-mode client \
        --driver-cores 8 \
        --driver-memory 40G \
        --num-executors 24 \
        --executor-cores 12 \
        --executor-memory 25g \
        --master yarn \
        --conf spark.sql.codegen.wholeStage=false \
        --jars "/home/ockadmin/opt/ock/jars/*,/opt/operator1.5.0/jars/*" \
        --properties-file /home/ock_spark.conf \
        --database tpcds_bin_partitioned_orc_3
  4. Verify that the combined feature has taken effect.
    1. Run a TPC-DS SQL query. If the shuffle operator in the execution flowchart on the Spark History UI has been replaced with OmniColumnarShuffleExchange, OmniOperator has taken effect.
      Figure 1 Execution flowchart
    2. If "Shuffle initialize success" is displayed in the driver log, OmniShuffle has taken effect.
      Figure 2 Driver log
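The two checks in step 4 can also be run from the command line. The following is a minimal sketch, assuming that the driver log has been saved to a local file (its location depends on your log4j configuration) and, optionally, that the text of a query's physical plan has been saved to another file and uses the same operator name as the Spark History UI. check_combination is a hypothetical helper; it only searches for the "Shuffle initialize success" and OmniColumnarShuffleExchange strings described above.

```shell
# Hypothetical verification helper (not part of BoostKit).
#   $1: driver log saved to a local file
#   $2: optional file containing the physical plan text of a TPC-DS query
check_combination() {
  local driver_log="$1" plan_file="${2:-}" rc=0
  # OmniShuffle prints this marker to the driver log when it starts.
  if grep -qF "Shuffle initialize success" "$driver_log"; then
    echo "OmniShuffle: enabled"
  else
    echo "OmniShuffle: marker not found in $driver_log"
    rc=1
  fi
  # OmniOperator replaces the shuffle operator with OmniColumnarShuffleExchange.
  if [ -n "$plan_file" ]; then
    if grep -qF "OmniColumnarShuffleExchange" "$plan_file"; then
      echo "OmniOperator: OmniColumnarShuffleExchange found"
    else
      echo "OmniOperator: OmniColumnarShuffleExchange not found in $plan_file"
      rc=1
    fi
  fi
  return "$rc"
}
```

The function returns a non-zero status if either marker is missing, so it can be used in a script after a test query. It complements, rather than replaces, checking the execution flowchart on the Spark History UI.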
  • /opt/operator1.5.0/ is an example OmniOperator installation directory. Replace it with the actual one.
  • For details about the configuration items added in step 2, see the Kunpeng BoostKit 24.0.RC5 Big Data OmniRuntime Feature Guide.
  • The spark.shuffle.ock.mode parameter specifies the shuffle mode. Its value must match the OmniShuffle deployment mode.
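Because ock_spark.conf is loaded through --properties-file, it follows the Java properties format: a line is a comment only if its first non-whitespace character is # or !, so text written after a value on the same line becomes part of that value. The following minimal sketch, assuming whitespace-separated key/value pairs as in step 2, reads values from the file and checks that spark.shuffle.manager and spark.shuffle.ock.mode are set as expected. get_conf_value and check_ock_conf are hypothetical helpers, not part of OmniShuffle.

```shell
# Hypothetical helper: print the value of key $2 in properties file $1.
# Assumes "key value" lines; the last occurrence wins, as in Java properties.
get_conf_value() {
  awk -v key="$2" '
    /^[ \t]*$/   { next }                      # skip blank lines
    /^[ \t]*[#!]/ { next }                     # skip comment lines
    $1 == key {
      sub(/^[ \t]*[^ \t]+[ \t]+/, "")          # drop the key and separator
      sub(/[ \t]+$/, "")                       # trim trailing whitespace
      value = $0
    }
    END { if (value != "") print value }
  ' "$1"
}

# Hypothetical check: $1 = conf file, $2 = expected OmniShuffle deployment mode
# (empty if spark.shuffle.ock.mode is intentionally not set).
check_ock_conf() {
  local conf="$1" expected_mode="$2" manager mode
  manager=$(get_conf_value "$conf" spark.shuffle.manager)
  mode=$(get_conf_value "$conf" spark.shuffle.ock.mode)
  if [ "$manager" != "org.apache.spark.shuffle.ock.OckColumnarShuffleManager" ]; then
    echo "unexpected spark.shuffle.manager: '$manager'" >&2
    return 1
  fi
  if [ "$mode" != "$expected_mode" ]; then
    # An inline comment such as "rss  # ..." ends up inside the value.
    echo "spark.shuffle.ock.mode is '$mode', expected '$expected_mode'" >&2
    return 1
  fi
  echo "ock_spark.conf is consistent (mode=${mode:-unset})"
}
```

For an RSS deployment, for example, run `check_ock_conf /home/ock_spark.conf rss`. If the optional spark.shuffle.ock.mode is not set, pass an empty string as the expected mode.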