Modifying the Spark Configuration Files

This section describes how to configure the integration environment for Spark, Hadoop, and Hive to ensure stable operation, with support for event logging, job history queries, and Hive data access.

All Spark configuration files are stored in the $SPARK_HOME/conf directory.

  1. Switch to the Spark configuration directory.
    cd $SPARK_HOME/conf
    
  2. Modify the spark-env.sh file.
    1. Create a copy of spark-env.sh.template and name it spark-env.sh.
      cp spark-env.sh.template spark-env.sh
      
    2. Open the spark-env.sh file.
      vi spark-env.sh
      
    3. Press i to enter the insert mode. Set JAVA_HOME to the absolute path of the JDK, and specify the Hadoop installation and configuration directories, the Scala installation directory, and the host name (or IP address) and port number of the Spark master node.
      export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
      export HADOOP_HOME=/usr/local/hadoop
      export SCALA_HOME=/usr/local/scala
      export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
      export SPARK_MASTER_IP=server1
      export SPARK_MASTER_PORT=7077
      
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
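The same variables can also be appended without an interactive editor. The sketch below is one way to do it; the SPARK_CONF_DIR variable is introduced here for illustration and defaults to a scratch directory so the sketch is safe to try, while pointing it at $SPARK_HOME/conf applies the change for real.

```shell
# Sketch: build spark-env.sh non-interactively instead of editing with vi.
# SPARK_CONF_DIR is a helper variable for this sketch; it defaults to a
# scratch directory so the commands can be run safely.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$(mktemp -d)}"
cat >> "$SPARK_CONF_DIR/spark-env.sh" <<'EOF'
export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=server1
export SPARK_MASTER_PORT=7077
EOF
grep -c '^export ' "$SPARK_CONF_DIR/spark-env.sh"   # prints 6
```

The quoted heredoc delimiter ('EOF') keeps the variable references from being expanded at write time, so the lines land in spark-env.sh exactly as shown.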
  3. Modify the spark-defaults.conf file.
    1. Create a copy of spark-defaults.conf.template and name it spark-defaults.conf.
      cp spark-defaults.conf.template spark-defaults.conf
    2. Add configuration items.
      echo "spark.master                     yarn" >> spark-defaults.conf
      echo "spark.eventLog.enabled           true" >> spark-defaults.conf
      echo "spark.eventLog.dir               hdfs://server1:9000/spark2-history" >> spark-defaults.conf
      echo "spark.eventLog.compress          true" >> spark-defaults.conf
      echo "spark.history.fs.logDirectory    hdfs://server1:9000/spark2-history" >> spark-defaults.conf
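Note that the echo commands above append unconditionally, so running the step twice leaves duplicate keys in the file. A hedged alternative sketch, using a small helper function (the add_once name and the CONF variable are introduced here, not part of the original procedure), only adds a key when it is not already present:

```shell
# Sketch: append each configuration key only once, so the step is safe to
# re-run. CONF defaults to a scratch file; point it at spark-defaults.conf
# for real use.
CONF="${CONF:-$(mktemp)}"
add_once() {
  # Skip the append when a line already starts with this key.
  grep -q "^$1 " "$CONF" 2>/dev/null || printf '%-32s %s\n' "$1" "$2" >> "$CONF"
}
add_once spark.master                  yarn
add_once spark.eventLog.enabled        true
add_once spark.eventLog.dir            hdfs://server1:9000/spark2-history
add_once spark.eventLog.compress       true
add_once spark.history.fs.logDirectory hdfs://server1:9000/spark2-history
add_once spark.master                  yarn   # duplicate call: no-op
wc -l < "$CONF"                               # 5 lines, not 6
```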
      
  4. Create an HDFS directory to store the Spark event logs.
    hdfs dfs -mkdir /spark2-history
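This directory must match both spark.eventLog.dir and spark.history.fs.logDirectory from the previous step, or the history server will not find the logs that jobs write. A small consistency check can be sketched as follows; the CONF variable is a stand-in introduced here, defaulting to a scratch copy of the two entries so the sketch is runnable on its own.

```shell
# Sketch: confirm the event-log writer and the history server point at the
# same HDFS path. CONF defaults to a scratch file seeded with the entries
# from the previous step; point it at spark-defaults.conf for real use.
CONF="${CONF:-$(mktemp)}"
[ -s "$CONF" ] || cat > "$CONF" <<'EOF'
spark.eventLog.dir               hdfs://server1:9000/spark2-history
spark.history.fs.logDirectory    hdfs://server1:9000/spark2-history
EOF
event_dir=$(awk '$1=="spark.eventLog.dir"{print $2}' "$CONF")
hist_dir=$(awk '$1=="spark.history.fs.logDirectory"{print $2}' "$CONF")
[ "$event_dir" = "$hist_dir" ] && echo "paths match: $event_dir"
```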
    
  5. Copy the Hadoop configuration files core-site.xml and hdfs-site.xml to the Spark configuration directory.
    cp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/spark/conf
    cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/spark/conf
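The two copies can also be done in a loop that fails loudly when a source file is missing. In this sketch, HADOOP_CONF and SPARK_CONF are helper variables introduced here; they default to scratch directories (with empty stand-in files created by the touch line) so the sketch is safe to try, while the real paths are /usr/local/hadoop/etc/hadoop and /usr/local/spark/conf.

```shell
# Sketch: copy both Hadoop client configs and abort if one is absent.
HADOOP_CONF="${HADOOP_CONF:-$(mktemp -d)}"
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"
# Scratch stand-ins so the sketch runs on its own; drop this line when
# running against a real installation.
touch "$HADOOP_CONF/core-site.xml" "$HADOOP_CONF/hdfs-site.xml"
for f in core-site.xml hdfs-site.xml; do
  [ -f "$HADOOP_CONF/$f" ] || { echo "missing $HADOOP_CONF/$f" >&2; exit 1; }
  cp "$HADOOP_CONF/$f" "$SPARK_CONF/"
done
ls "$SPARK_CONF"
```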
    
  6. Copy the Hive database configuration.
    • If Hive data access is required, modify the Hive configuration file and provide the matching mariadb-java-client driver package.
    • If the mariadb-java-client package is not present on the server, download it and upload it to the /usr/local/spark/jars directory.
    1. Open the hive-site.xml file.
      vim ${HIVE_HOME}/conf/hive-site.xml 
      
    2. Press i to enter the insert mode, and add or update hive.metastore.uris as follows:
      <property>
          <name>hive.metastore.uris</name>
          <value>thrift://server1:9083</value>
          <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
      </property>
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Copy the Hive dependency package and configuration.
      cp ${HIVE_HOME}/lib/mariadb-java-client-2.3.0.jar /usr/local/spark/jars
      cp ${HIVE_HOME}/conf/hive-site.xml /usr/local/spark/conf/
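
After all of the steps above, a quick sanity check that every file Spark needs from Hadoop and Hive is in place can be sketched as below. SPARK_CONF and SPARK_JARS are helper variables introduced for this sketch; they default to scratch directories populated with empty stand-ins so the check is runnable on its own, while a real installation would use /usr/local/spark/conf and /usr/local/spark/jars (and drop the touch line).

```shell
# Sketch: verify that the integration files from this section are present.
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"
SPARK_JARS="${SPARK_JARS:-$(mktemp -d)}"
# Scratch stand-ins for a self-contained run; remove when checking a real
# installation.
touch "$SPARK_CONF/core-site.xml" "$SPARK_CONF/hdfs-site.xml" \
      "$SPARK_CONF/hive-site.xml" "$SPARK_JARS/mariadb-java-client-2.3.0.jar"
missing=0
for f in "$SPARK_CONF/core-site.xml" "$SPARK_CONF/hdfs-site.xml" \
         "$SPARK_CONF/hive-site.xml" \
         "$SPARK_JARS/mariadb-java-client-2.3.0.jar"; do
  [ -f "$f" ] || { echo "missing: $f" >&2; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all integration files present"
```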