Deploying the Hive Engine

Planning the Cluster Environment

The environment described in this section consists of seven servers: one task submission node, three compute nodes, and three storage nodes. In the big data cluster, the Hive client serves as the task submission node. The compute nodes are agent1, agent2, and agent3, and the storage nodes are ceph1, ceph2, and ceph3. See Figure 1.

Figure 1 Environment configuration

Table 1 lists the hardware environment of the cluster.

**Table 1** Hardware configurations
Item	Model
Processor	Kunpeng 920 5220
Memory size	384 GB (12 x 32 GB)
Memory frequency	2666 MHz
NIC	25GE for the service network and GE for the management network
Drive	System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD); Management node: 12 x RAID 0 (1 x 4 TB SATA HDD); Service node: 12 x RAID 0 (1 x 4 TB SATA HDD); 1 x 3.2 TB NVMe
RAID controller card	LSI SAS3508

Table 2 lists the required software versions.

**Table 2** Software configurations
Item	Version
OS	openEuler 20.03 LTS SP1
JDK	BiSheng JDK-8u262
Hadoop	3.2.0
Spark	3.0.0
Hive	3.1.0
ZooKeeper	3.6.2
Ceph	14.2.8

Installing the Hive Engine

During the installation, select /opt/hive/boostkit as the software installation directory and place all JAR packages that Hive depends on in this directory, as shown in Table 3.

**Table 3** Installation directory
Installation Node	Installation Directory	Component	How to Obtain
Server (server1)	/opt/hive/boostkit	aws-java-sdk-bundle-1.11.375.jar	Download it from the Kunpeng Community.
		bcpkix-jdk15on-1.68.jar	Download it from the Kunpeng Community.
		bcprov-jdk15on-1.68.jar	Download it from the Kunpeng Community.
		boostkit-omnidata-server-1.3.0-aarch64.jar	Download it from the Huawei Support website.
		boostkit-omnidata-hive-exec-3.1.0-1.3.0.jar	Download it from the Kunpeng Community or use the source code for compilation.
		guava-31.1-jre.jar	Download it from the Kunpeng Community.
		hadoop-aws-3.2.0.jar	Download it from the Kunpeng Community.
		kryo-shaded-4.0.2.jar	Download it from the Kunpeng Community.
		haf-jni-call-1.2.0.jar	Download it from the Huawei Support website.
		hdfs-ceph-3.2.0.jar	Download it from the Kunpeng Community.
		hetu-transport-1.6.1.jar	Download it from the Kunpeng Community.
		jackson-annotations-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-core-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-databind-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-datatype-guava-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-datatype-jdk8-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-datatype-joda-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-datatype-jsr310-2.12.4.jar	Download it from the Kunpeng Community.
		jackson-module-parameter-names-2.12.4.jar	Download it from the Kunpeng Community.
		jasypt-1.9.3.jar	Download it from the Kunpeng Community.
		jol-core-0.2.jar	Download it from the Kunpeng Community.
		joni-2.1.5.3.jar	Download it from the Kunpeng Community.
		log-0.193.jar	Download it from the Kunpeng Community.
		perfmark-api-0.23.0.jar	Download it from the Kunpeng Community.
		presto-main-1.6.1.jar	Download it from the Kunpeng Community.
		presto-spi-1.6.1.jar	Download it from the Kunpeng Community.
		protobuf-java-3.12.0.jar	Download it from the Kunpeng Community.
		slice-0.38.jar	Download it from the Kunpeng Community.

The aws-java-sdk-bundle-1.11.375.jar, hadoop-aws-3.2.0.jar, and hdfs-ceph-3.2.0.jar packages are required only when Ceph is used as the storage system. An HDFS environment does not need these packages.

Create the /opt/hive/boostkit directory.

mkdir -p /opt/hive/boostkit

On the task submission node (server1), copy the boostkit-omnidata-server-1.3.0-aarch64.jar package obtained in "Obtaining Software" to the /opt/hive/boostkit directory. The package is located in BoostKit-omnidata_1.3.0.zip\BoostKit-omnidata_1.3.0.tar.gz\boostkit-omnidata-server-1.3.0-aarch64.tar.gz\omnidata\lib.

cp boostkit-omnidata-server-1.3.0-aarch64.jar /opt/hive/boostkit

Copy the haf-jni-call-1.2.0.jar package obtained in "Obtaining Software" to the /opt/hive/boostkit directory. The package is located in BoostKit-haf_1.2.0.zip\haf-1.2.0.tar.gz\haf-host-1.2.0.tar.gz\lib\jar.

cp haf-jni-call-1.2.0.jar /opt/hive/boostkit

If Ceph is used as the storage system, copy hdfs-ceph-3.2.0.jar obtained in "Obtaining Software", together with aws-java-sdk-bundle-1.11.375.jar and hadoop-aws-3.2.0.jar from boostkit-omnidata-server-1.3.0-aarch64-lib.zip, to the /opt/hive/boostkit directory. Skip this step if HDFS is used as the storage system.

cp aws-java-sdk-bundle-1.11.375.jar /opt/hive/boostkit
cp hadoop-aws-3.2.0.jar /opt/hive/boostkit
cp hdfs-ceph-3.2.0.jar /opt/hive/boostkit

Use an FTP tool to upload the boostkit-omnidata-hive-exec-3.1.0-1.3.0.zip package to the installation environment, and then decompress the package.

unzip boostkit-omnidata-hive-exec-3.1.0-1.3.0.zip

Copy the JAR packages extracted from boostkit-omnidata-hive-exec-3.1.0-1.3.0.zip to the /opt/hive/boostkit directory.

cd boostkit-omnidata-hive-exec-3.1.0-1.3.0
cp *.jar /opt/hive/boostkit

To compile boostkit-omnidata-hive-exec-3.1.0-1.3.0.jar manually instead, follow the instructions in README.md.
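
Once the copy steps above are complete, it can be useful to confirm that the JAR packages listed in Table 3 are actually present. The following is a minimal sketch of such a check, not part of BoostKit: the function name missing_jars is hypothetical, and the example call lists only a few of the file names from Table 3.

```shell
# Hypothetical helper (not shipped with BoostKit): print every expected JAR
# that is absent from the given directory. Returns non-zero if any is missing.
missing_jars() {
    local dir="$1" jar status=0
    shift
    for jar in "$@"; do
        if [ ! -f "$dir/$jar" ]; then
            echo "$jar"
            status=1
        fi
    done
    return "$status"
}

# Example call with a few names from Table 3 (extend the list as needed):
# missing_jars /opt/hive/boostkit \
#     boostkit-omnidata-server-1.3.0-aarch64.jar \
#     boostkit-omnidata-hive-exec-3.1.0-1.3.0.jar \
#     haf-jni-call-1.2.0.jar
```

An empty output with a zero exit status means that every listed file was found.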

Create a tez-ndp directory, obtain the tez.tar.gz package from HDFS (default path: /apps/tez/tez.tar.gz), and decompress it.

cd /opt/hive/
mkdir tez-ndp
cd tez-ndp
hdfs dfs -get /apps/tez/tez.tar.gz .
tar -zxvf tez.tar.gz

Copy the /opt/hive/boostkit directory into the tez-ndp directory, delete the original tez.tar.gz package, compress the directory contents into a new tez.tar.gz package, and upload it to HDFS to replace the existing package.

cd /opt/hive/tez-ndp
cp -r /opt/hive/boostkit .
rm -rf tez.tar.gz
tar -zcvf tez.tar.gz *
hdfs dfs -rm -r /apps/tez/tez.tar.gz
hdfs dfs -put tez.tar.gz /apps/tez/
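
Before or after the upload, it may be worth checking that the repacked archive really contains the boostkit directory at its top level, because the Tez classpath setting later in this section refers to that directory. The sketch below assumes the archive was created with `tar -zcvf tez.tar.gz *` as shown above, so entries carry no leading `./`. The function name archive_has_dir is hypothetical.

```shell
# Hypothetical helper: succeed if a gzip-compressed tar archive contains
# entries under the given top-level directory name.
archive_has_dir() {
    tar -tzf "$1" | grep "^$2/" >/dev/null
}

# Example calls:
# archive_has_dir /opt/hive/tez-ndp/tez.tar.gz boostkit && echo "boostkit is packaged"
# hdfs dfs -ls /apps/tez/tez.tar.gz    # confirm that the new package is on HDFS
```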

Add the following information to the end of the /usr/local/hive/conf/hive-env.sh file:

export BOOSTKIT_HOME=/opt/hive/boostkit
for f in ${BOOSTKIT_HOME}/*.jar; do
  HIVE_CONF_DIR=${HIVE_CONF_DIR}:$f
done
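
The loop above appends each JAR in BOOSTKIT_HOME to HIVE_CONF_DIR as a colon-separated entry, presumably so that the JARs end up on the classpath that the Hive launcher builds from this variable. To preview the resulting value without editing hive-env.sh, a sketch like the following can be used. The function name preview_conf_dir is hypothetical, and the starting value in the example call is only an assumption about the local setup.

```shell
# Hypothetical helper: echo the value that the hive-env.sh loop would produce,
# given a starting value and a directory containing JAR files.
preview_conf_dir() {
    local value="$1" dir="$2" f
    for f in "$dir"/*.jar; do
        [ -e "$f" ] || continue   # skip the literal pattern when no JAR matches
        value="$value:$f"
    done
    echo "$value"
}

# Example call (the first argument is an assumed current HIVE_CONF_DIR):
# preview_conf_dir /usr/local/hive/conf /opt/hive/boostkit
```

Unlike the original loop, this sketch skips the unexpanded `*.jar` pattern when the directory is empty, which makes an empty directory easy to spot in the output.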

Add the following properties to the /usr/local/tez/conf/tez-site.xml file, inside the <configuration> element and before its closing </configuration> tag:

<property>
    <name>tez.user.classpath.first</name>
    <value>true</value>
</property>
<property>
    <name>tez.cluster.additional.classpath.prefix</name>
    <value>$PWD/tezlib/boostkit/*</value>
</property>
<property>
    <name>tez.task.launch.env</name>
    <value>PATH=/home/omm/haf-host/bin:$PATH,LD_LIBRARY_PATH=/home/omm/haf-host/lib:$LD_LIBRARY_PATH,CLASS_PATH=/home/omm/haf-host/lib/jar/haf-jni-call-1.2.0.jar:$CLASS_PATH,HAF_CONFIG_PATH=/home/omm/haf-host/</value>
</property>
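
After editing the file, a quick text search can confirm that all three properties are in place. This is only a sketch: the function name missing_tez_properties is hypothetical, and the check looks for the property names as plain text rather than validating the XML.

```shell
# Hypothetical helper: print the property names from the snippet above that
# do not appear in the given tez-site.xml. Returns non-zero if any is missing.
missing_tez_properties() {
    local file="$1" name status=0
    for name in tez.user.classpath.first \
                tez.cluster.additional.classpath.prefix \
                tez.task.launch.env; do
        if ! grep -qF "<name>$name</name>" "$file"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Example call:
# missing_tez_properties /usr/local/tez/conf/tez-site.xml
```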

Copy the .so files in the HAF installation directory /home/omm/haf-install/haf-host/lib to the local /usr/local/hadoop/lib/native directory, and then run the scp command to copy them to the same directory on each node where Hadoop is installed.

cd /home/omm/haf-install/haf-host/lib
cp lib* /usr/local/hadoop/lib/native
scp lib* {Hadoop_node}:/usr/local/hadoop/lib/native

For example, for the compute nodes agent1, agent2, and agent3:

scp lib* agent1:/usr/local/hadoop/lib/native
scp lib* agent2:/usr/local/hadoop/lib/native
scp lib* agent3:/usr/local/hadoop/lib/native
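
When there are many Hadoop nodes, the per-node scp commands can be generated instead of typed by hand. The following sketch only prints the commands so that they can be reviewed first. The function name print_scp_commands is hypothetical, and the host names in the example call are the compute nodes from this section.

```shell
# Hypothetical helper: print one scp command per host without running it.
# Arguments: source directory, destination directory, then the host names.
print_scp_commands() {
    local src="$1" dest="$2" host
    shift 2
    for host in "$@"; do
        echo "scp ${src}/lib* ${host}:${dest}"
    done
}

# Example call:
# print_scp_commands /home/omm/haf-install/haf-host/lib \
#     /usr/local/hadoop/lib/native agent1 agent2 agent3
```

After reviewing the output, the commands can be executed by piping the output to sh.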
