Cluster Environment

The cluster consists of one client, one controller node, and three compute nodes. Figure 1 shows the networking diagram. The controller node acts as the server, and the compute nodes act as agent1, agent2, and agent3 of the big data cluster. In proof-of-concept (POC) test scenarios, the client can be deployed on the controller node.

Figure 1 Networking diagram

Cluster Hardware

Table 1 shows the hardware configurations used by all nodes in the cluster.

Table 1 Hardware configurations

| Item | Requirement |
| --- | --- |
| Processor | Kunpeng 920 processor/Kunpeng 920 high-performance processor (80 cores) |
| Memory size | 384 GB (12 x 32 GB) |
| Memory frequency | 2666 MHz |
| NIC | 10GE for the service network and GE for the management network |
| Drive | System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD); data drive: 12 x RAID 0 (1 x 4 TB SATA HDD) |
| RAID controller card | LSI SAS3508 |

Cluster Software

Table 2 lists the recommended software versions.

Table 2 Recommended software configurations in the cluster environment

| Item | Node Type | Requirement |
| --- | --- | --- |
| OS | All nodes | openEuler 22.03 LTS SP1 |
| JDK | All nodes | BiSheng JDK 1.8.0_342 |
| ZooKeeper | Compute node | 3.6.2 |
| Hadoop | All nodes | 3.2.0 |
| Spark | All nodes | 3.3.1 |

  • For details about cluster deployment, see the Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03). Spark is deployed in Spark on YARN mode.
  • The algorithm library is compatible with Spark 3.3.1 but supports only some of the algorithms on that version (see Constraints). For security reasons, later versions are recommended. Compatibility with other platforms has not been verified.
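
The software requirements in Table 2 can be written down as data and compared against what is actually installed on each node. The following is a minimal Python sketch of that idea, not part of the product. The node name "server" for the controller node, the helper names expected_software and find_mismatches, and the shape of the inventory dictionary are assumptions made for illustration. The client is left out because the text does not name it, and the sketch does not collect versions itself; it only compares values you supply.

```python
# Required versions, copied from Table 2.
# Scope "all" means every node; "compute" means compute nodes only.
REQUIRED = {
    "OS": ("all", "openEuler 22.03 LTS SP1"),
    "JDK": ("all", "BiSheng JDK 1.8.0_342"),
    "ZooKeeper": ("compute", "3.6.2"),
    "Hadoop": ("all", "3.2.0"),
    "Spark": ("all", "3.3.1"),
}

# One controller node and three compute nodes, as described in the text.
# The name "server" for the controller node is an assumption.
NODES = {
    "server": "controller",
    "agent1": "compute",
    "agent2": "compute",
    "agent3": "compute",
}


def expected_software(role):
    """Return {item: version} that a node with the given role should run."""
    return {
        item: version
        for item, (scope, version) in REQUIRED.items()
        if scope in ("all", role)
    }


def find_mismatches(inventory):
    """Compare a hand-collected inventory against Table 2.

    inventory maps node name -> {item: installed version string}.
    Returns a list of (node, item, expected, found) tuples; found is
    None when the item is missing from the inventory.
    """
    problems = []
    for node, role in NODES.items():
        found = inventory.get(node, {})
        for item, version in expected_software(role).items():
            if found.get(item) != version:
                problems.append((node, item, version, found.get(item)))
    return problems


if __name__ == "__main__":
    # Hypothetical inventory in which agent1 runs a different Spark version.
    inventory = {node: dict(expected_software(role)) for node, role in NODES.items()}
    inventory["agent1"]["Spark"] = "3.1.1"
    for node, item, expected, found in find_mismatches(inventory):
        print(f"{node}: {item} expected {expected}, found {found}")
```

Keeping the requirements in one dictionary means a version change in Table 2 needs to be updated in only one place in the sketch.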