Environment
Physical Networking
A cluster consists of one client, one controller node, and three compute nodes. Figure 1 shows the networking diagram. The controller node acts as the server of the big data cluster, and the three compute nodes act as its agents: agent1, agent2, and agent3. In proof-of-concept (POC) test scenarios, the client can be deployed on the controller node.
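To make the role layout concrete, the sketch below models the cluster as a role-to-host inventory. The host names `server` and `client` and all helper functions are hypothetical. Only the agent names and the rule that a POC client may share the controller node come from the text above. The compute-host list is rendered one host per line, the format Hadoop 3.x reads from `etc/hadoop/workers`.

```python
from typing import Dict, List

# Hypothetical host names; replace them with the real ones in your environment.
CONTROLLER = "server"                    # controller node, acts as the cluster server
AGENTS = ["agent1", "agent2", "agent3"]  # compute nodes
DEDICATED_CLIENT = "client"              # separate client host (non-POC layout)


def build_inventory(poc: bool = False) -> Dict[str, List[str]]:
    """Map each cluster role to the hosts that carry it.

    In a POC test the client role is co-located on the controller node,
    so no separate client host is used.
    """
    client = CONTROLLER if poc else DEDICATED_CLIENT
    return {"controller": [CONTROLLER], "compute": list(AGENTS), "client": [client]}


def distinct_hosts(inventory: Dict[str, List[str]]) -> List[str]:
    """Return each physical host once, in first-seen order."""
    seen: List[str] = []
    for hosts in inventory.values():
        for host in hosts:
            if host not in seen:
                seen.append(host)
    return seen


def workers_file(inventory: Dict[str, List[str]]) -> str:
    """Render the compute hosts one per line, the layout of Hadoop 3's
    etc/hadoop/workers file."""
    return "\n".join(inventory["compute"]) + "\n"
```

With the default layout, `distinct_hosts(build_inventory())` lists five hosts. With `poc=True` it lists four, because the client shares the controller node.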
Hardware Requirements
Table 1 lists the hardware requirements.
| Item | Description |
|---|---|
| Processor | Kunpeng 920 5250 |
| Memory size | 384 GB (12 x 32 GB) |
| Memory frequency | 2666 MHz |
| Network | 10GE for the service network and GE for the management network |
| Drive | System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD); data drives: 12 x RAID 0 (1 x 4 TB SATA HDD each) |
| RAID controller card | LSI SAS3508 |
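The capacity figures implied by Table 1 can be checked with a few lines of arithmetic. This sketch assumes that each of the three compute nodes uses exactly this configuration, which the table does not state outright. The figures are raw capacity in decimal units, before any file-system or HDFS overhead. `NodeSpec` and `cluster_raw_data_tb` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware taken from Table 1."""
    dimm_count: int
    dimm_gb: int
    data_drive_count: int  # each data drive is configured as its own one-disk RAID 0
    data_drive_tb: float

    @property
    def memory_gb(self) -> int:
        return self.dimm_count * self.dimm_gb

    @property
    def raw_data_tb(self) -> float:
        # A one-disk RAID 0 volume exposes the whole drive, so raw data
        # capacity is simply the sum of the data drives.
        return self.data_drive_count * self.data_drive_tb


TABLE_1_NODE = NodeSpec(dimm_count=12, dimm_gb=32, data_drive_count=12, data_drive_tb=4.0)
COMPUTE_NODES = 3  # agent1, agent2, agent3 (assumed to share the Table 1 configuration)


def cluster_raw_data_tb(node: NodeSpec, nodes: int) -> float:
    """Raw data-drive capacity across `nodes` identical nodes."""
    return node.raw_data_tb * nodes
```

Under these assumptions, each node has 384 GB of memory and 48 TB of raw data capacity. That gives 144 TB across the three compute nodes.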
OS and Software Requirements
Table 2 lists the OS and software requirements.
| Item | Description |
|---|---|
| OS | openEuler 22.03 LTS SP1 |
| JDK | BiSheng JDK 1.8.0_342 |
| ZooKeeper | 3.6.2 |
| Hadoop | 3.2.0 |
| Spark | 3.3.1 |
- The machine learning algorithm library is adapted to Spark 3.1.1 and supports the SVM, DBSCAN, DTB, and Word2Vec algorithms.
- The library can also be adapted to Spark 2.x and 3.x.
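Before deploying, it can help to compare the installed components against Table 2. The sketch below is a hypothetical pre-flight check:

- `REQUIRED` mirrors the JDK, ZooKeeper, Hadoop, and Spark rows.
- `spark_major_supported` encodes the note that the library can also be adapted to Spark 2.x and 3.x.
- `SUPPORTED_ALGORITHMS` lists the four algorithms named above.

The code assumes you collect the `installed` versions yourself, for example from `hadoop version` or `spark-submit --version` output, and does not cover that step.

```python
from typing import Dict, Iterable, List, Tuple

# Versions from Table 2 (the OS row is omitted here).
REQUIRED: Dict[str, str] = {
    "jdk": "1.8.0_342",
    "zookeeper": "3.6.2",
    "hadoop": "3.2.0",
    "spark": "3.3.1",
}

# Algorithms the note says the library supports.
SUPPORTED_ALGORITHMS = frozenset({"SVM", "DBSCAN", "DTB", "Word2Vec"})


def mismatches(installed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (component, required, found) for each component that differs
    from Table 2; an absent component is reported with found='missing'."""
    result = []
    for name, wanted in REQUIRED.items():
        found = installed.get(name, "missing")
        if found != wanted:
            result.append((name, wanted, found))
    return result


def spark_major_supported(version: str) -> bool:
    """True for any Spark 2.x or 3.x version string such as '3.1.1'."""
    return version.split(".", 1)[0] in {"2", "3"}


def unsupported_algorithms(requested: Iterable[str]) -> List[str]:
    """Return the requested algorithms the library does not list, sorted."""
    return sorted(set(requested) - SUPPORTED_ALGORITHMS)
```

The exact-string comparison in `mismatches` is deliberately strict. Relaxing it to a major-version check, as `spark_major_supported` does for Spark, is a judgment call for each component.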
