Modifying the Spark Configuration Files
This section describes how to configure the integration environment for Spark, Hadoop, and Hive to ensure stable operation, while enabling support for event logging, job history query, and Hive data access.
All Spark configuration files are stored in the $SPARK_HOME/conf directory.
- Switch to the Spark configuration directory.
cd $SPARK_HOME/conf
- Modify the spark-env.sh file.
- Create a copy of spark-env.sh.template and name it spark-env.sh.
cp spark-env.sh.template spark-env.sh
- Open the spark-env.sh file.
vi spark-env.sh
- Press i to enter insert mode. Set JAVA_HOME to the absolute path of the JDK, and specify the Hadoop and Scala installation directories, the Hadoop configuration directory, and the host name (or IP address) and port number of the Spark master node.
export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=server1
export SPARK_MASTER_PORT=7077
- Press Esc, type :wq!, and press Enter to save the file and exit.
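If you prefer not to edit the file interactively, the same variables can be appended with a heredoc; a minimal sketch, assuming the example paths and host name shown above (adjust them for your environment):

```shell
# Append the environment variables to spark-env.sh non-interactively.
# The paths and the master host name below mirror the example values
# above and will differ per environment.
cat >> spark-env.sh <<'EOF'
export JAVA_HOME=/usr/local/bisheng-jdk1.8.0_262
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=server1
export SPARK_MASTER_PORT=7077
EOF
```

The quoted `'EOF'` delimiter keeps the variables from being expanded while appending, so the file receives the literal `export` lines.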
- Modify the spark-defaults.conf file.
- Create a copy of spark-defaults.conf.template and name it spark-defaults.conf.
cp spark-defaults.conf.template spark-defaults.conf
- Add configuration items.
echo "spark.master yarn" >> spark-defaults.conf
echo "spark.eventLog.enabled true" >> spark-defaults.conf
echo "spark.eventLog.dir hdfs://server1:9000/spark2-history" >> spark-defaults.conf
echo "spark.eventLog.compress true" >> spark-defaults.conf
echo "spark.history.fs.logDirectory hdfs://server1:9000/spark2-history" >> spark-defaults.conf
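Equivalently, the five entries can be appended in a single heredoc and then counted back as a quick sanity check; a sketch, assuming you are in the directory that holds spark-defaults.conf:

```shell
# Append all five settings at once instead of issuing separate echo
# commands; the HDFS URI mirrors the example values above.
cat >> spark-defaults.conf <<'EOF'
spark.master                     yarn
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://server1:9000/spark2-history
spark.eventLog.compress          true
spark.history.fs.logDirectory    hdfs://server1:9000/spark2-history
EOF
# Count the spark.* entries (5 on a freshly created file).
grep -c '^spark\.' spark-defaults.conf
```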
- Create a directory to store HDFS event logs.
hdfs dfs -mkdir /spark2-history
- Copy the Hadoop configuration files core-site.xml and hdfs-site.xml.
cp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/spark/conf
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/spark/conf
- Configure the Hive metastore connection and copy the Hive configuration.
- Open the hive-site.xml file.
vim ${HIVE_HOME}/conf/hive-site.xml
- Press i to enter the insert mode, and add or update hive.metastore.uris as follows:
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://server1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
- Press Esc, type :wq!, and press Enter to save the file and exit.
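To confirm the configured URI without reopening the editor, the value can be extracted with sed. A self-contained sketch that writes the property fragment to a temporary file (the /tmp path is illustrative) and pulls the value back out:

```shell
# Write the property fragment to a temporary file; in practice you would
# run the sed command against ${HIVE_HOME}/conf/hive-site.xml instead.
cat > /tmp/hive-site-fragment.xml <<'EOF'
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://server1:9083</value>
</property>
EOF
# Extract the text between <value> and </value>.
sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' /tmp/hive-site-fragment.xml
# prints thrift://server1:9083
```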
- Copy the Hive dependency package and configuration.
cp ${HIVE_HOME}/lib/mariadb-java-client-2.3.0.jar /usr/local/spark/jars
cp ${HIVE_HOME}/conf/hive-site.xml /usr/local/spark/conf/