Cluster Environment

A cluster consists of one client, one controller node, and three compute nodes, as shown in Figure 1. The controller node acts as the server, and the three compute nodes are agent1, agent2, and agent3 of the big data cluster. In proof-of-concept (POC) test scenarios, the client can be deployed on the controller node.

Figure 1 Networking diagram
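The node roles above can be captured as data and used to generate cluster configuration files. The following is a minimal sketch, assuming that the compute nodes (agent1 to agent3) run the Hadoop worker daemons and are therefore listed in Hadoop 3's etc/hadoop/workers file. The controller hostname "server1" and the names CLUSTER and workers_file are hypothetical and used for illustration only.

```python
# Hypothetical sketch: the cluster topology from the text, expressed as data.
# agent1-agent3 come from the text; "server1" is an ASSUMED controller hostname.
CLUSTER = {
    "controller": "server1",  # assumed hostname; acts as the server
    "compute": ["agent1", "agent2", "agent3"],
}


def workers_file(cluster: dict) -> str:
    """Return content for Hadoop 3's etc/hadoop/workers file:
    one worker hostname per line (assumed here to be the compute nodes)."""
    return "\n".join(cluster["compute"]) + "\n"


if __name__ == "__main__":
    # Prints agent1, agent2, agent3 on separate lines.
    print(workers_file(CLUSTER), end="")
```

The controller is deliberately left out of the workers list, on the assumption that it hosts only master services. Adjust the list if your deployment also runs worker daemons on the controller.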

Cluster Hardware

Table 1 shows the hardware configurations used by all nodes in the cluster.

**Table 1** Hardware configurations
Item	Requirement
Processor	Kunpeng 920 processor/Kunpeng 920 high-performance processor (80 cores)
Memory size	384 GB (12 x 32 GB)
Memory frequency	2666 MHz
NIC	10GE for the service network and GE for the management network
System drive	1 x RAID 0 (1 x 1.2 TB SAS HDD)
Data drive	12 x RAID 0 (each with 1 x 4 TB SATA HDD)
RAID controller card	LSI SAS3508
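The per-node figures in Table 1 imply some simple capacity arithmetic. The sketch below works it through, assuming that HDFS data is stored only on the data drives of the three compute nodes and that HDFS keeps its default replication factor (dfs.replication = 3). Because each data drive is its own single-drive RAID 0 array, RAID provides no redundancy, so data protection comes from HDFS replication. The results are estimates, not measured values.

```python
# Worked arithmetic for Table 1. ASSUMPTIONS: HDFS data lives only on the
# compute nodes' data drives, and dfs.replication is left at its default of 3.
DIMM_COUNT, DIMM_GB = 12, 32           # 12 x 32 GB DIMMs per node
DATA_DRIVES, DATA_DRIVE_TB = 12, 4     # 12 single-drive RAID 0 arrays of 4 TB
COMPUTE_NODES = 3                      # agent1, agent2, agent3
HDFS_REPLICATION = 3                   # Hadoop's default dfs.replication

memory_gb = DIMM_COUNT * DIMM_GB                              # per node
raw_data_tb_per_node = DATA_DRIVES * DATA_DRIVE_TB            # per compute node
raw_data_tb_cluster = raw_data_tb_per_node * COMPUTE_NODES    # all compute nodes
# Upper bound on unique HDFS data; ignores filesystem overhead,
# reserved space, and the TB vs. TiB difference.
usable_hdfs_tb = raw_data_tb_cluster / HDFS_REPLICATION

print(memory_gb, raw_data_tb_per_node, raw_data_tb_cluster, usable_hdfs_tb)
# prints: 384 48 144 48.0
```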

Cluster Software

Table 2 lists the recommended software versions.

**Table 2** Recommended software configurations in the cluster environment
Item	Node Type	Requirement
OS	All nodes	openEuler 22.03 LTS SP1
JDK	All nodes	BiSheng JDK 1.8.0_342
ZooKeeper	Compute nodes	3.6.2
Hadoop	All nodes	3.2.0
Spark	All nodes	3.3.1
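Before deployment, it can help to compare the versions installed on each node against Table 2. The following is a hypothetical helper, not a tool shipped with the algorithm library. How the versions are collected (for example, by parsing the output of java -version or hadoop version) is left out, and exact string matching is a simplifying assumption.

```python
# Hypothetical helper: compare versions detected on a node with Table 2.
# Collecting the versions is out of scope; exact string equality is ASSUMED.
RECOMMENDED = {
    "os": "openEuler 22.03 LTS SP1",
    "jdk": "1.8.0_342",   # BiSheng JDK
    "hadoop": "3.2.0",
    "spark": "3.3.1",
}
ZOOKEEPER_VERSION = "3.6.2"  # required on compute nodes only


def mismatches(detected: dict, is_compute_node: bool) -> dict:
    """Return {component: (expected, found)} for each component whose
    detected version differs from Table 2; missing components map to None."""
    expected = dict(RECOMMENDED)
    if is_compute_node:
        expected["zookeeper"] = ZOOKEEPER_VERSION
    return {
        name: (want, detected.get(name))
        for name, want in expected.items()
        if detected.get(name) != want
    }


if __name__ == "__main__":
    node = {"os": "openEuler 22.03 LTS SP1", "jdk": "1.8.0_342",
            "hadoop": "3.2.0", "spark": "3.3.1"}
    # ZooKeeper is missing, which matters only on a compute node.
    print(mismatches(node, is_compute_node=True))
    # prints: {'zookeeper': ('3.6.2', None)}
```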

For details about cluster deployment, see Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03). Spark is deployed in Spark on YARN mode.
The algorithm library is compatible with Spark 3.3.1 and supports only some of the algorithms (see Constraints). For security purposes, later versions are recommended. Compatibility with other platforms has not been verified.
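In Spark on YARN mode, applications are submitted with spark-submit using --master yarn. The sketch below builds such a command line. The jar path and main class are hypothetical placeholders rather than names from the algorithm library, and the default of three executors (one per compute node) is an assumption. spark-submit also needs HADOOP_CONF_DIR or YARN_CONF_DIR to point to the Hadoop client configuration so that it can locate the YARN ResourceManager.

```python
# Sketch: build a spark-submit command for Spark on YARN.
# The jar path and main class used below are HYPOTHETICAL placeholders.
import shlex


def spark_on_yarn_cmd(app_jar: str, main_class: str,
                      deploy_mode: str = "client",
                      num_executors: int = 3) -> list[str]:
    """Return spark-submit argv targeting YARN.
    num_executors=3 assumes one executor per compute node."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",                     # run on the YARN cluster
        "--deploy-mode", deploy_mode,           # where the driver runs
        "--num-executors", str(num_executors),  # YARN/Kubernetes-only option
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    cmd = spark_on_yarn_cmd("/path/to/app.jar", "com.example.Main")
    print(shlex.join(cmd))
```

Client mode is used as the default because the client may run on the controller node in POC scenarios. Cluster mode instead runs the driver inside a YARN container on one of the compute nodes.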
