Executing Spark Services

To execute Spark services with the OmniShuffle feature, start the Spark SQL CLI with an OmniShuffle configuration file.

Procedure

  1. Create an ock_spark.conf file in the /home directory of the master node. For details about the parameters in the file, see spark.conf.
    1. Create a file.
      vi /home/ock_spark.conf
    2. Press i to enter the insert mode and add the following content to the file:
      spark.task.cpus 1
      spark.shuffle.compress true
      spark.shuffle.spill.compress true
      spark.rdd.compress true
      spark.executor.extraClassPath     /home/ockadmin/opt/ock/jars/*
      spark.driver.extraClassPath       /home/ockadmin/opt/ock/jars/*
      
      spark.driver.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl
      spark.executor.extraJavaOptions -Djava.library.path=/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl
      spark.driver.extraLibraryPath   /home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl:.
      spark.executor.extraLibraryPath /home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/datakit:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/mf:/home/ockadmin/opt/ock/ucache/24.0.0/linux-aarch64/lib/common/openssl:.
      
      spark.shuffle.manager              org.apache.spark.shuffle.ock.OCKRemoteShuffleManager
      spark.shuffle.ock.manager true
      spark.shuffle.ock.home /home/ockadmin/opt/ock
      spark.shuffle.ock.version 24.0.0
      spark.shuffle.ock.binaryType linux-aarch64
      spark.executorEnv.HCOM_CONNECTION_RECV_TIMEOUT_SEC 30
      
      spark.blacklist.enabled true
      spark.files.fetchFailure.unRegisterOutputOnHost true
      spark.shuffle.service.enabled  false
      spark.blacklist.application.fetchFailure.enabled true
      spark.serializer                        org.apache.spark.serializer.KryoSerializer
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    • The ock_spark.conf file can be stored in any directory. Specify this configuration file when starting Spark SQL.
    • Replace /home/ockadmin/opt/ock with the actual installation directory.
    • The spark.shuffle.manager parameter is mandatory. In RSS mode, set it to org.apache.spark.shuffle.ock.OCKRemoteShuffleManager. In ESS mode, set it to org.apache.spark.shuffle.ock.OCKShuffleManager.
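The long library paths and the mode-dependent spark.shuffle.manager value are easy to mistype, so a small pre-check can help. The following is a minimal sketch, not part of OmniShuffle: the function names ock_library_path and shuffle_mode are hypothetical, the directory layout (home/ucache/version/binaryType/lib) is inferred from the sample values above, and the parser assumes whitespace-separated "key value" lines as used in this file.

```shell
# Hypothetical helper: build the native library path from the OCK home,
# version, and binary type (layout inferred from the sample configuration).
ock_library_path() {
    ock_home="$1"
    ock_version="$2"
    binary_type="$3"
    lib_root="$ock_home/ucache/$ock_version/$binary_type/lib"
    echo "$lib_root/common:$lib_root/datakit:$lib_root/mf:$lib_root/common/openssl"
}

# Hypothetical helper: report which mode a properties file selects.
# Assumes "key value" lines separated by whitespace; the last entry wins.
shuffle_mode() {
    conf_file="$1"
    manager=$(awk '$1 == "spark.shuffle.manager" { value = $2 } END { print value }' "$conf_file")
    case "$manager" in
        org.apache.spark.shuffle.ock.OCKRemoteShuffleManager) echo "RSS" ;;
        org.apache.spark.shuffle.ock.OCKShuffleManager) echo "ESS" ;;
        "") echo "missing"; return 1 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example usage (paths taken from the sample configuration):
#   ock_library_path /home/ockadmin/opt/ock 24.0.0 linux-aarch64
#   shuffle_mode /home/ock_spark.conf
```

Generating the path once and pasting it into the four library-path entries keeps them identical; the extraLibraryPath entries in the sample additionally end with ":." and that suffix would still need to be appended by hand.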
  2. Start the Spark SQL CLI.

    Ensure that the database is running properly.

    • The following is an example of the open-source Spark SQL startup command. You can adjust the values of the configuration items based on your requirements.
      /usr/local/spark/bin/spark-sql --deploy-mode client \
                                     --driver-cores 8 \
                                     --driver-memory 40g \
                                     --num-executors 30 \
                                     --executor-cores 6 \
                                     --executor-memory 35g \
                                     --master yarn \
                                     --conf spark.task.cpus=1 \
                                     --conf spark.default.parallelism=600 \
                                     --conf spark.sql.broadcastTimeout=500 \
                                     --conf spark.sql.shuffle.partitions=600 \
                                     --conf spark.sql.adaptive.enabled=true \
                                     --database tpcds_bin_partitioned_orc_3
    • Start the SparkExtension plugin.
      spark-sql --deploy-mode client --driver-cores 8 \
                                     --driver-memory 40G \
                                     --num-executors 24 \
                                     --executor-cores 12 \
                                     --executor-memory 25g \
                                     --master yarn \
                                     --jars /home/ockadmin/opt/ock/jars/* \
                                     --properties-file /home/ock_spark.conf \
                                     --database tpcds_bin_partitioned_orc_3 \
                                     1> /home/ockadmin/logs/sql.res 2>/home/ockadmin/tpcds/logs/sql.log
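The command above redirects standard output and standard error to files in two different directories. The shell opens redirect targets before it runs the command, so the launch fails if either parent directory is missing. The sketch below creates the directories first; the function name ensure_parent_dirs is hypothetical, and the paths in the usage comment are the ones from the sample command.

```shell
# Hypothetical helper: create the parent directory of every file path given,
# so that later "1> file" and "2> file" redirections can open their targets.
ensure_parent_dirs() {
    for target in "$@"; do
        # mkdir -p succeeds when the directory already exists.
        mkdir -p "$(dirname "$target")" || return 1
    done
}

# Example usage before starting Spark SQL:
#   ensure_parent_dirs /home/ockadmin/logs/sql.res /home/ockadmin/tpcds/logs/sql.log
```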
  3. Check whether OmniShuffle has taken effect.

    If the message "Shuffle initialize success" is displayed, OmniShuffle has taken effect.

    Figure 1 Command output
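When the output is redirected to files as in the sample command, the check can also be scripted. This is a minimal sketch under one assumption: that the "Shuffle initialize success" message ends up in the captured log file (this section only states that the message is displayed). The function name omnishuffle_active is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the given log file
# contains the OmniShuffle success message.
omnishuffle_active() {
    log_file="$1"
    # -q suppresses output; -F matches the message as a fixed string.
    grep -qF "Shuffle initialize success" "$log_file"
}

# Example usage (log path taken from the sample command; adjust as needed):
#   if omnishuffle_active /home/ockadmin/tpcds/logs/sql.log; then
#       echo "OmniShuffle has taken effect"
#   fi
```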
  4. Execute an SQL task.
    Figure 2 Executing an SQL task
    Figure 3 Command output