
Cluster Environment

The cluster consists of one client, one controller node, and three compute nodes. Figure 1 shows the networking diagram. The controller node functions as the server of the big data cluster, and the compute nodes function as agent1, agent2, and agent3. In POC test scenarios, the client can be deployed on the controller node.

Figure 1 Networking diagram
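The topology in Figure 1 is typically reflected in the /etc/hosts file on every node so that nodes can resolve each other by hostname. A minimal sketch, using illustrative placeholder IP addresses (the actual addresses depend on your network plan):

```shell
# /etc/hosts entries on every node (example addresses; adjust to your network plan)
192.168.1.100  server    # controller node (also hosts the client in POC scenarios)
192.168.1.101  agent1    # compute node 1
192.168.1.102  agent2    # compute node 2
192.168.1.103  agent3    # compute node 3
```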

Cluster Hardware

Table 1 shows the hardware configurations used by all nodes in the cluster.

Table 1 Hardware configurations

| Item | Requirement |
| --- | --- |
| Processor | Kunpeng 920 processor |
| Memory size | 384 GB (12 x 32 GB) |
| Memory frequency | 2666 MHz |
| NIC | 10GE for the service network; GE for the management network |
| Drive | System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD); Data drive: 12 x RAID 0 (1 x 4 TB SATA HDD) |
| RAID controller card | LSI SAS3508 |

Cluster Software

Table 2 lists the required software versions.

Table 2 Recommended software configurations in the cluster environment

| Item | Node Type | Requirement |
| --- | --- | --- |
| OS | All nodes | openEuler 20.03 LTS SP1 |
| JDK | All nodes | BiSheng JDK 1.8.0_262 |
| ZooKeeper | Compute nodes | 3.6.2 |
| Hadoop | All nodes | 3.1.1 |
| Spark | All nodes | Spark 2.3.2, Spark 2.4.5, Spark 2.4.6, or Spark 3.1.1 |

  • The algorithm library supports openEuler 20.03 LTS SP1 and CentOS 7.6. This document uses openEuler 20.03 LTS SP1 as an example.
  • For details about cluster deployment, see Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03). The Spark deployment mode is Spark on Yarn.
  • The machine learning algorithm library applies to Spark 2.3.2, Spark 2.4.5, Spark 2.4.6, and Spark 3.1.1. Of these, Spark 3.1.1 supports only a subset of the algorithms (see Constraints). For security, later versions are recommended. Compatibility with other platforms has not been verified.
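After installing the components in Table 2, a quick sanity check on each node can confirm that the installed versions match the requirements. The sketch below is illustrative: the `installed_*` values are hard-coded stand-ins, and the commented commands are the usual ways to query each component in a live cluster.

```shell
#!/bin/sh
# Required versions from Table 2 (adjust required_spark to the release you deploy)
required_jdk="1.8.0_262"
required_hadoop="3.1.1"
required_spark="2.4.5"

# In a live cluster these would come from the tools themselves, e.g.:
#   java -version 2>&1 | awk -F '"' '/version/ {print $2}'
#   hadoop version | awk 'NR==1 {print $2}'
#   spark-submit --version 2>&1 | grep -o 'version [0-9.]*' | head -1
installed_jdk="1.8.0_262"
installed_hadoop="3.1.1"
installed_spark="2.4.5"

check() {
  # $1 = component name, $2 = installed version, $3 = required version
  if [ "$2" = "$3" ]; then
    echo "$1 OK ($2)"
  else
    echo "$1 MISMATCH (installed $2, required $3)" >&2
  fi
}

check JDK    "$installed_jdk"    "$required_jdk"
check Hadoop "$installed_hadoop" "$required_hadoop"
check Spark  "$installed_spark"  "$required_spark"
```

Running the script on each node (for example via ssh from the controller) makes it easy to spot a node whose JDK or Hadoop build drifted from the recommended configuration.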