OmniShuffle

To run Spark services with OmniShuffle, create an OmniShuffle configuration file and start the Spark SQL CLI with it, as described in the following steps.

  1. Create an ock_spark.conf file in the /home directory. For details about the parameters in the file, see spark.conf.
    1. Create the file and open it in the vi editor.
      vi /home/ock_spark.conf
    2. Press i to enter insert mode and add the following content to the file:
      spark.master yarn
      spark.task.cpus 1
      spark.shuffle.compress true
      spark.shuffle.spill.compress true
      spark.rdd.compress true
      spark.executor.extraClassPath     /home/ockadmin/opt/ock/jars/*
      spark.driver.extraClassPath       /home/ockadmin/opt/ock/jars/*
      spark.driver.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/jars
      spark.executor.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/jars
      spark.driver.extraLibraryPath   /home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/jars:.
      spark.executor.extraLibraryPath /home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/23.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/jars:.
      spark.shuffle.manager              org.apache.spark.shuffle.ock.OCKShuffleManager
      spark.shuffle.ock.manager true
      spark.blacklist.enabled true
      spark.files.fetchFailure.unRegisterOutputOnHost true
      spark.shuffle.service.enabled  false
      spark.blacklist.application.fetchFailure.enabled true
      spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
      spark.driver.maxResultSize 2g
      spark.serializer                        org.apache.spark.serializer.KryoSerializer
      spark.shuffle.ock.home /home/ockadmin/opt/ock
      spark.shuffle.ock.version 23.0.0
      spark.shuffle.ock.binaryType linux-aarch64
      spark.sql.broadcastTimeout 3000
      spark.sql.extensions           org.apache.spark.sql.execution.adaptive.ock.BoostTuningExtension
      spark.sql.ock.autoConfig.enabled true
      spark.sql.ock.autoConfig.history true
      spark.sql.ock.autoConfig.globalRuntimePartition false
      spark.sql.ock.autoConfig.sample false
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
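
A typo in this file is easy to miss, so a quick consistency check before launching can help. The following is a minimal sketch, not part of the OmniShuffle tooling: conf_get and check_ock_conf are hypothetical helper names, and the script assumes the whitespace-separated "key value" layout shown above. It checks that the OCK shuffle manager is selected, that the external shuffle service is disabled, and that the version in spark.shuffle.ock.version also appears in the executor library path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with OmniShuffle) for inspecting a
# Spark properties file that uses whitespace-separated "key value" lines.

# Print the value of a key, or nothing if the key is absent.
conf_get() {
  local key="$1" file="$2"
  awk -v k="$key" '$1 == k { $1 = ""; sub(/^[ \t]+/, ""); print; exit }' "$file"
}

# Return 0 if the settings this procedure relies on are present and consistent.
check_ock_conf() {
  local file="$1" version
  [ "$(conf_get spark.shuffle.manager "$file")" = "org.apache.spark.shuffle.ock.OCKShuffleManager" ] || return 1
  [ "$(conf_get spark.shuffle.service.enabled "$file")" = "false" ] || return 1
  version="$(conf_get spark.shuffle.ock.version "$file")"
  [ -n "$version" ] || return 1
  # The native library path should point at the same version directory.
  case "$(conf_get spark.executor.extraLibraryPath "$file")" in
    *"/ucache/$version/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, after sourcing the functions, check_ock_conf /home/ock_spark.conf && echo "ock_spark.conf looks consistent" prints the message only when all three checks pass.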
  2. Start the Spark SQL CLI.

    For reference, the following is an example startup command for native Spark SQL, that is, without the OmniShuffle configuration file. Adjust the values of the configuration items to suit your environment.

    /usr/local/spark/bin/spark-sql --deploy-mode client --driver-cores 8 --driver-memory 40g --num-executors 30 --executor-cores 6 --executor-memory 35g --master yarn --conf spark.task.cpus=1 --conf spark.default.parallelism=600 --conf spark.sql.broadcastTimeout=500 --conf spark.sql.shuffle.partitions=600 --conf spark.sql.adaptive.enabled=true --database tpcds_bin_partitioned_orc_3
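    To start Spark SQL with the SparkExtension plugin, run the following command, which loads the /home/ock_spark.conf file created in step 1: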
    spark-sql --deploy-mode client \
      --driver-cores 8 \
      --driver-memory 40G \
      --num-executors 24 \
      --executor-cores 12 \
      --executor-memory 25g \
      --master yarn \
      --conf spark.sql.codegen.wholeStage=false \
      --jars /home/ockadmin/opt/ock/jars/* \
      --properties-file /home/ock_spark.conf \
      --database tpcds_bin_partitioned_orc_3
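
One detail of the command above deserves care: spark-submit documents --jars as a single comma-separated list, whereas an unquoted wildcard is expanded by the shell into separate arguments when more than one file matches. If the jars directory holds several JAR files, a helper like the following sketch can build the list explicitly. join_jars is a hypothetical name and not part of Spark or OmniShuffle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .jar files in a directory as one
# comma-separated string, the format that the --jars option expects.
join_jars() {
  local dir="$1" jar list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when nothing matches
    list="${list:+$list,}$jar"  # add a comma only between entries
  done
  printf '%s\n' "$list"
}
```

With the function sourced, the option could then be written as --jars "$(join_jars /home/ockadmin/opt/ock/jars)", leaving the other options unchanged.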
  3. Check whether OmniShuffle has taken effect.

    If the command output contains "Connected to meta rpc server<**.**.**.**> successfully", OmniShuffle has taken effect.
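
When the CLI output is long, searching for the message is easier than reading for it. The sketch below assumes the output was captured to a file, for example by appending 2>&1 | tee /tmp/spark-sql.log to the startup command (the path is only an illustration). omnishuffle_connected is a hypothetical helper name, and the pattern assumes the message layout quoted above, with the server address in angle brackets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the captured CLI output contains the
# OmniShuffle connection message quoted in this procedure.
omnishuffle_connected() {
  local log="$1"
  # Match any address between the angle brackets; allow an optional space
  # before "<" in case the real log line is spaced differently.
  grep -Eq 'Connected to meta rpc server *<[^>]*> successfully' "$log"
}
```

For example, omnishuffle_connected /tmp/spark-sql.log && echo "OmniShuffle is active" prints the message only if the connection line was logged.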