Executing Spark Services

To run Spark services with OmniShuffle, create a Spark configuration file for OmniShuffle and then start the Spark SQL CLI with that file.

  1. Create an ock_spark.conf file in the /home directory of the master node. For details about the parameters in the file, see spark.conf.
    1. Create a file.
      vi /home/ock_spark.conf
    2. Press i to enter the insert mode and add the following content to the file:
      spark.task.cpus 1
      spark.shuffle.compress true
      spark.shuffle.spill.compress true
      spark.rdd.compress true
      spark.executor.extraClassPath     /home/ockadmin/opt/ock/jars/*
      spark.driver.extraClassPath       /home/ockadmin/opt/ock/jars/*
      
      spark.driver.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl
      spark.executor.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl
      spark.driver.extraLibraryPath   /home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl:.
      spark.executor.extraLibraryPath /home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl:.
      
      spark.shuffle.manager              org.apache.spark.shuffle.ock.OCKRemoteShuffleManager
      spark.shuffle.ock.manager true
      spark.shuffle.ock.home /home/ockadmin/opt/ock
      spark.shuffle.ock.version 24.0.0
      spark.shuffle.ock.binaryType linux-aarch64
      spark.executorEnv.HCOM_CONNECTION_RECV_TIMEOUT_SEC 30
      
      spark.blacklist.enabled true
      spark.files.fetchFailure.unRegisterOutputOnHost true
      spark.shuffle.service.enabled  false
      spark.blacklist.application.fetchFailure.enabled true
      spark.serializer                        org.apache.spark.serializer.KryoSerializer
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    • The ock_spark.conf file can be stored in any directory. Specify this configuration file when starting Spark SQL.
    • Replace /home/ockadmin/opt/ock with the actual installation directory.
    • The spark.shuffle.manager parameter is mandatory. In RSS mode, set it to org.apache.spark.shuffle.ock.OCKRemoteShuffleManager. In ESS mode, set it to org.apache.spark.shuffle.ock.OCKShuffleManager.
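Because the installation directory, version, and binary type recur in several entries, it can help to derive those entries from a few variables instead of editing each path by hand. The following is a minimal sketch under that assumption. The variable names (OCK_HOME, OCK_VERSION, OCK_BINARY_TYPE, OCK_MODE) and the function ock_conf_entries are illustrative only and are not part of OmniShuffle. The script prints just the installation-dependent entries; the remaining entries from the file above still need to be added unchanged.

```shell
#!/bin/sh
# Sketch: print the installation-dependent entries of ock_spark.conf.
# OCK_HOME, OCK_VERSION, OCK_BINARY_TYPE, OCK_MODE and ock_conf_entries are
# illustrative names chosen for this example, not part of OmniShuffle.
OCK_HOME=/home/ockadmin/opt/ock      # replace with the actual installation directory
OCK_VERSION=24.0.0
OCK_BINARY_TYPE=linux-aarch64
OCK_MODE=RSS                         # RSS or ESS

# The four native library directories share one root.
LIB_ROOT="${OCK_HOME}/ucache/${OCK_VERSION}/${OCK_BINARY_TYPE}/lib"
OCK_LIB_PATH="${LIB_ROOT}/common:${LIB_ROOT}/datakit:${LIB_ROOT}/mf:${LIB_ROOT}/common/openssl"

# Pick the shuffle manager class for the selected mode.
case "$OCK_MODE" in
  RSS) OCK_MANAGER=org.apache.spark.shuffle.ock.OCKRemoteShuffleManager ;;
  ESS) OCK_MANAGER=org.apache.spark.shuffle.ock.OCKShuffleManager ;;
  *)   echo "unknown mode: $OCK_MODE" >&2; exit 1 ;;
esac

# Emit the entries; "*" is not glob-expanded inside a here-document.
ock_conf_entries() {
  cat <<EOF
spark.executor.extraClassPath ${OCK_HOME}/jars/*
spark.driver.extraClassPath ${OCK_HOME}/jars/*
spark.driver.extraJavaOptions -Djava.library.path=${OCK_LIB_PATH}
spark.executor.extraJavaOptions -Djava.library.path=${OCK_LIB_PATH}
spark.driver.extraLibraryPath ${OCK_LIB_PATH}:.
spark.executor.extraLibraryPath ${OCK_LIB_PATH}:.
spark.shuffle.manager ${OCK_MANAGER}
spark.shuffle.ock.home ${OCK_HOME}
spark.shuffle.ock.version ${OCK_VERSION}
spark.shuffle.ock.binaryType ${OCK_BINARY_TYPE}
EOF
}

ock_conf_entries
```

The output can be redirected to a file and merged with the fixed entries, so that a change of installation directory or mode is made in one place.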
  2. Start the Spark SQL CLI.

    Ensure that the database is running properly.

    • The following is an example of the native Spark SQL startup command. You can adjust the values of the configuration items based on your requirements.
      /usr/local/spark/bin/spark-sql --deploy-mode client \
                                     --driver-cores 8 \
                                     --driver-memory 40g \
                                     --num-executors 30 \
                                     --executor-cores 6 \
                                     --executor-memory 35g \
                                     --master yarn \
                                     --conf spark.task.cpus=1 \
                                     --conf spark.default.parallelism=600 \
                                     --conf spark.sql.broadcastTimeout=500 \
                                     --conf spark.sql.shuffle.partitions=600 \
                                     --conf spark.sql.adaptive.enabled=true \
                                     --database tpcds_bin_partitioned_orc_3
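    • The following is an example of the startup command with the SparkExtension plugin enabled. The --properties-file option points to the ock_spark.conf file created in step 1.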
      spark-sql --deploy-mode client --driver-cores 8 \
                                     --driver-memory 40G \
                                     --num-executors 24 \
                                     --executor-cores 12 \
                                     --executor-memory 25g \
                                     --master yarn \
                                     --jars /home/ockadmin/opt/ock/jars/* \
                                     --properties-file /home/ock_spark.conf \
                                     --database tpcds_bin_partitioned_orc_3 \
                                     1> /home/ockadmin/logs/sql.res 2>/home/ockadmin/tpcds/logs/sql.log
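Since spark.shuffle.manager is mandatory, a quick check of the properties file before launching can catch a missing entry early. The following is a minimal sketch; the function name has_shuffle_manager is hypothetical and is not part of Spark or OmniShuffle. It only checks that the file is readable and sets the key, not that the value is a valid class.

```shell
#!/bin/sh
# has_shuffle_manager CONF_FILE
# Hypothetical helper: returns 0 if CONF_FILE is readable and contains an
# uncommented spark.shuffle.manager entry, 1 if the entry is missing,
# and 2 if the file cannot be read.
has_shuffle_manager() {
  conf_file="$1"
  [ -r "$conf_file" ] || { echo "cannot read $conf_file" >&2; return 2; }
  # Spark properties files accept whitespace, "=" or ":" between key and value.
  grep -Eq '^[[:space:]]*spark\.shuffle\.manager[[:space:]=:]+[^[:space:]]' "$conf_file"
}
```

For example, has_shuffle_manager /home/ock_spark.conf && echo "ready to start spark-sql" prints the message only when the entry is present.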
  3. Check whether OmniShuffle has taken effect.

    If the message "Shuffle initialize success" is displayed, OmniShuffle has taken effect.

    Figure 1 Command output
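When the CLI output is redirected to files, as in the example command in step 2, the check can also be done by searching the log. The following is a minimal sketch that assumes the success message is written to the file that receives the redirected output; which file that is depends on your setup. The function name check_omnishuffle is hypothetical.

```shell
#!/bin/sh
# check_omnishuffle LOG_FILE
# Hypothetical helper: returns 0 if LOG_FILE contains the OmniShuffle success
# message, 1 if the message is absent, and 2 if the file cannot be read.
check_omnishuffle() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read log file: $log_file" >&2
    return 2
  fi
  if grep -q "Shuffle initialize success" "$log_file"; then
    echo "OmniShuffle has taken effect"
  else
    echo "success message not found in $log_file" >&2
    return 1
  fi
}
```

For example, with the redirection used in step 2, run check_omnishuffle /home/ockadmin/tpcds/logs/sql.log after the CLI has started.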
  4. Execute an SQL task.
    Figure 2 Executing an SQL task
    Figure 3 Command output