Compiling and Configuring Spark
This section describes how to compile and deploy Spark and configure environment variables to establish an environment for subsequent distributed task processing.
The following uses Spark 3.3.1 as an example to describe the steps for compiling and configuring Spark; the steps also apply to other Spark versions. In the following steps, spark-3.3.1-bin-hadoop3.2 is the name of the Spark installation package. Replace it with the actual package name.
- Compile the Spark installation package. For details, see Spark Porting Guide (CentOS & openEuler).
- Upload the Spark installation package to the server1 node, move it to the /usr/local directory, and decompress it.
```
cd /usr/local/
mv spark-3.3.1-bin-hadoop3.2.tgz /usr/local
tar -zxvf spark-3.3.1-bin-hadoop3.2.tgz
```
- Create a soft link for subsequent version updates.
```
ln -s spark-3.3.1-bin-hadoop3.2 spark
```
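The point of the soft link is that a later upgrade only retargets the link, so paths such as /usr/local/spark in configuration files keep working. A minimal sketch of such an upgrade, using a hypothetical 3.4.0 release name and a scratch directory so it can be tried without touching /usr/local:

```shell
# Simulate the install layout in a scratch directory (hypothetical paths).
workdir=$(mktemp -d)
cd "$workdir"
mkdir spark-3.3.1-bin-hadoop3.2 spark-3.4.0-bin-hadoop3.2

ln -s spark-3.3.1-bin-hadoop3.2 spark     # initial link, as in the step above
ln -sfn spark-3.4.0-bin-hadoop3.2 spark   # -n replaces the link itself, not a path inside it
readlink spark                            # now reports the new release directory
```

Without -n, ln would follow the existing link and create the new link inside the old release directory instead of replacing it.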
- Set Spark environment variables.
- Open the /etc/profile file.
```
vi /etc/profile
```
- Press i to enter the insert mode and add the following lines to the end of the file:
```
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Make the environment variables take effect.
```
source /etc/profile
```
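You can confirm the variables are in effect before moving on. A minimal self-contained sketch that repeats the two export lines so it can be run in any shell:

```shell
# Same two lines that were appended to /etc/profile.
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

echo "$SPARK_HOME"            # prints /usr/local/spark
case ":$PATH:" in             # check that the Spark bin directory is on PATH
  *:"$SPARK_HOME/bin":*) echo "spark bin dir is on PATH" ;;
esac
```

Once the package is actually in place under /usr/local/spark, running spark-submit --version is a further end-to-end check that PATH resolves the Spark binaries.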
Parent topic: Deploying Spark