Modifying the Spark Configuration Files

This section describes how to configure the integration environment for Spark, Hadoop, and Hive to ensure stable operation, with support for event logging, job history queries, and Hive data access.

All Spark configuration files are stored in the $SPARK_HOME/conf directory.

  1. Switch to the Spark configuration directory.
    cd $SPARK_HOME/conf
    
  2. Modify the spark-env.sh file.
    1. Create a copy of spark-env.sh.template and name it spark-env.sh.
      cp spark-env.sh.template spark-env.sh
      
    2. Open the spark-env.sh file.
      vi spark-env.sh
      
    3. Press i to enter the insert mode. Set JAVA_HOME to the absolute path of the JDK, and specify the Hadoop installation and configuration directories, the Scala installation directory, and the host name (or IP address) and port number of the Spark master node.
      export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
      export HADOOP_HOME=/usr/local/hadoop
      export SCALA_HOME=/usr/local/scala
      export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
      export SPARK_MASTER_IP=server1
      export SPARK_MASTER_PORT=7077
      
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
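The same variables can also be appended without an interactive editor. The sketch below is one way to do it; the SPARK_CONF_DIR variable is introduced here for illustration and defaults to a scratch directory so the sketch is safe to try, while pointing it at $SPARK_HOME/conf applies the change for real.

```shell
# Sketch: build spark-env.sh non-interactively instead of editing with vi.
# SPARK_CONF_DIR is a helper variable for this sketch; it defaults to a
# scratch directory so the commands can be run safely.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$(mktemp -d)}"
cat >> "$SPARK_CONF_DIR/spark-env.sh" <<'EOF'
export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=server1
export SPARK_MASTER_PORT=7077
EOF
grep -c '^export ' "$SPARK_CONF_DIR/spark-env.sh"   # prints 6
```

The quoted heredoc delimiter ('EOF') keeps the variable references from being expanded at write time, so the lines land in spark-env.sh exactly as shown.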
  3. Modify the spark-defaults.conf file.
    1. Create a copy of spark-defaults.conf.template and name it spark-defaults.conf.
      cp spark-defaults.conf.template spark-defaults.conf
    2. Add configuration items.
      echo "spark.master                     yarn" >> spark-defaults.conf
      echo "spark.eventLog.enabled           true" >> spark-defaults.conf
      echo "spark.eventLog.dir               hdfs://server1:9000/spark2-history" >> spark-defaults.conf
      echo "spark.eventLog.compress          true" >> spark-defaults.conf
      echo "spark.history.fs.logDirectory    hdfs://server1:9000/spark2-history" >> spark-defaults.conf
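Note that the echo commands above append unconditionally, so running the step twice leaves duplicate keys in the file. A hedged alternative sketch, using a small helper function (the add_once name and the CONF variable are introduced here, not part of the original procedure), only adds a key when it is not already present:

```shell
# Sketch: append each configuration key only once, so the step is safe to
# re-run. CONF defaults to a scratch file; point it at spark-defaults.conf
# for real use.
CONF="${CONF:-$(mktemp)}"
add_once() {
  # Skip the append when a line already starts with this key.
  grep -q "^$1 " "$CONF" 2>/dev/null || printf '%-32s %s\n' "$1" "$2" >> "$CONF"
}
add_once spark.master                  yarn
add_once spark.eventLog.enabled        true
add_once spark.eventLog.dir            hdfs://server1:9000/spark2-history
add_once spark.eventLog.compress       true
add_once spark.history.fs.logDirectory hdfs://server1:9000/spark2-history
add_once spark.master                  yarn   # duplicate call: no-op
wc -l < "$CONF"                               # 5 lines, not 6
```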
      
  4. Create an HDFS directory to store the Spark event logs.
    hdfs dfs -mkdir /spark2-history
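This directory must match both spark.eventLog.dir and spark.history.fs.logDirectory from the previous step, or the history server will not find the logs that jobs write. A small consistency check can be sketched as follows; the CONF variable is a stand-in introduced here, defaulting to a scratch copy of the two entries so the sketch is runnable on its own.

```shell
# Sketch: confirm the event-log writer and the history server point at the
# same HDFS path. CONF defaults to a scratch file seeded with the entries
# from the previous step; point it at spark-defaults.conf for real use.
CONF="${CONF:-$(mktemp)}"
[ -s "$CONF" ] || cat > "$CONF" <<'EOF'
spark.eventLog.dir               hdfs://server1:9000/spark2-history
spark.history.fs.logDirectory    hdfs://server1:9000/spark2-history
EOF
event_dir=$(awk '$1=="spark.eventLog.dir"{print $2}' "$CONF")
hist_dir=$(awk '$1=="spark.history.fs.logDirectory"{print $2}' "$CONF")
[ "$event_dir" = "$hist_dir" ] && echo "paths match: $event_dir"
```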
    
  5. Copy the Hadoop configuration files core-site.xml and hdfs-site.xml to the Spark configuration directory.
    cp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/spark/conf
    cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/spark/conf
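The two copies can also be done in a loop that fails loudly when a source file is missing. In this sketch, HADOOP_CONF and SPARK_CONF are helper variables introduced here; they default to scratch directories (with empty stand-in files created by the touch line) so the sketch is safe to try, while the real paths are /usr/local/hadoop/etc/hadoop and /usr/local/spark/conf.

```shell
# Sketch: copy both Hadoop client configs and abort if one is absent.
HADOOP_CONF="${HADOOP_CONF:-$(mktemp -d)}"
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"
# Scratch stand-ins so the sketch runs on its own; drop this line when
# running against a real installation.
touch "$HADOOP_CONF/core-site.xml" "$HADOOP_CONF/hdfs-site.xml"
for f in core-site.xml hdfs-site.xml; do
  [ -f "$HADOOP_CONF/$f" ] || { echo "missing $HADOOP_CONF/$f" >&2; exit 1; }
  cp "$HADOOP_CONF/$f" "$SPARK_CONF/"
done
ls "$SPARK_CONF"
```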
    
  6. Copy the Hive database configuration.
    • If Hive data access is required, modify the Hive configuration file and provide the matching mariadb-java-client driver package.
    • If the mariadb-java-client package is not present on the server, download it and upload it to the /usr/local/spark/jars directory.
    1. Open the hive-site.xml file.
      vim ${HIVE_HOME}/conf/hive-site.xml 
      
    2. Press i to enter the insert mode, and add or update hive.metastore.uris as follows:
      <property>
          <name>hive.metastore.uris</name>
          <value>thrift://server1:9083</value>
          <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
      </property>
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Copy the Hive dependency package and configuration.
      cp ${HIVE_HOME}/lib/mariadb-java-client-2.3.0.jar /usr/local/spark/jars
      cp ${HIVE_HOME}/conf/hive-site.xml /usr/local/spark/conf/
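
After all of the steps above, a quick sanity check that every file Spark needs from Hadoop and Hive is in place can be sketched as below. SPARK_CONF and SPARK_JARS are helper variables introduced for this sketch; they default to scratch directories populated with empty stand-ins so the check is runnable on its own, while a real installation would use /usr/local/spark/conf and /usr/local/spark/jars (and drop the touch line).

```shell
# Sketch: verify that the integration files from this section are present.
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"
SPARK_JARS="${SPARK_JARS:-$(mktemp -d)}"
# Scratch stand-ins for a self-contained run; remove when checking a real
# installation.
touch "$SPARK_CONF/core-site.xml" "$SPARK_CONF/hdfs-site.xml" \
      "$SPARK_CONF/hive-site.xml" "$SPARK_JARS/mariadb-java-client-2.3.0.jar"
missing=0
for f in "$SPARK_CONF/core-site.xml" "$SPARK_CONF/hdfs-site.xml" \
         "$SPARK_CONF/hive-site.xml" \
         "$SPARK_JARS/mariadb-java-client-2.3.0.jar"; do
  [ -f "$f" ] || { echo "missing: $f" >&2; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all integration files present"
```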