Installing Spark
The OmniOperator feature supports the Spark engine. Install Spark on the management node and all compute nodes, then install the openEuler dependencies that SparkExtension requires and configure SparkExtension.
- Install Spark. For details, see OS and Software Requirements.
- Download the SparkExtension plugin package and decompress it.
Download boostkit-omniop-spark-3.1.1-1.3.0-aarch64.zip from Obtaining Software and upload it to the /opt/omni-operator/ directory on the management node and all compute nodes. Decompress boostkit-omniop-spark-3.1.1-1.3.0-aarch64.zip to obtain boostkit-omniop-spark-3.1.1-1.3.0-aarch64.jar and dependencies.tar.gz.
- Install the SparkExtension dependencies on openEuler.
Configure a local yum source that matches the OS image on each node, then run the following command to install the dependencies:
yum install lz4-devel zstd-devel snappy-devel protobuf-c-devel protobuf-lite-devel boost-devel cyrus-sasl-devel jsoncpp-devel openssl-devel libatomic -y
- Configure SparkExtension.
- Obtain the ORC, Protobuf, Arrow, HDFS, and Parquet software installation packages from Obtaining Software, decompress them to obtain the liborc.so, libprotobuf.so.24, libarrow.so.1100, libarrow_dataset.so.1100, libarrow_substrait.so.1100, libhdfs.so, and libparquet.so.1100 files, and upload these files to the /opt/omni-operator/lib directory. Change the file permission to 550.
chmod 550 /opt/omni-operator/lib/lib*
- Copy boostkit-omniop-spark-3.1.1-1.3.0-aarch64.jar, obtained by decompressing boostkit-omniop-spark-3.1.1-1.3.0-aarch64.zip, to the /opt/omni-operator/lib directory and set the file permission to 550.
chmod 550 /opt/omni-operator/lib/boostkit-omniop-spark-3.1.1-1.3.0-aarch64.jar
- Decompress the dependencies.tar.gz package extracted from boostkit-omniop-spark-3.1.1-1.3.0-aarch64.zip and copy the resulting dependencies folder to the /opt/omni-operator/lib directory. Recursively set the permission on the folder to 550.
chmod -R 550 /opt/omni-operator/lib/dependencies/
- Add the following environment variable to the ~/.bashrc file on all nodes:
export OMNI_HOME=/opt/omni-operator
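After completing the steps above on a node, it can be useful to confirm that every file landed in the lib directory with the expected permission. The following is a minimal sketch, not part of the OmniOperator package: the function name check_omni_lib is hypothetical, and it assumes a shell with `local` support and a standard `find`. It checks only the file names and the 550 mode stated in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with OmniOperator): verify that the
# files named in this section exist in the lib directory with mode 550.
# Usage: check_omni_lib [LIB_DIR]
# LIB_DIR defaults to $OMNI_HOME/lib, or /opt/omni-operator/lib if
# OMNI_HOME is not set.
check_omni_lib() {
    local lib_dir="${1:-${OMNI_HOME:-/opt/omni-operator}/lib}"
    local problems=0
    local f

    for f in liborc.so libprotobuf.so.24 libarrow.so.1100 \
             libarrow_dataset.so.1100 libarrow_substrait.so.1100 \
             libhdfs.so libparquet.so.1100 \
             boostkit-omniop-spark-3.1.1-1.3.0-aarch64.jar; do
        if [ ! -f "${lib_dir}/${f}" ]; then
            echo "MISSING ${f}"
            problems=$((problems + 1))
        # find prints the path only when the mode is exactly 550.
        elif [ -z "$(find "${lib_dir}/${f}" -prune -perm 550)" ]; then
            echo "BAD_MODE ${f}"
            problems=$((problems + 1))
        fi
    done

    # The dependencies folder comes from dependencies.tar.gz.
    if [ ! -d "${lib_dir}/dependencies" ]; then
        echo "MISSING dependencies/"
        problems=$((problems + 1))
    fi

    # Return success only if no problem was reported.
    [ "${problems}" -eq 0 ]
}
```

Calling check_omni_lib with no argument checks the directory under OMNI_HOME. It prints one line per missing or mis-permissioned item and returns a non-zero status if any were found, so it can be run on each node in turn.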