Environment
Physical Networking
The physical environment consists of one management node and three data nodes. The management network and the service network are deployed separately: the management network communicates over GE electrical ports, and the service network communicates over 10GE optical ports. Figure 1 shows the networking diagram.
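The link speeds described above can be spot-checked on each node. The following is a minimal sketch, assuming a Linux host that exposes the negotiated speed in Mbit/s under `/sys/class/net/<iface>/speed`. The interface names `eth0` and `eth1` and the helper names are hypothetical and must be replaced with the names used in your environment.

```python
from pathlib import Path

# Expected link speeds in Mbit/s: GE for management, 10GE for service.
EXPECTED_MBPS = {"management": 1000, "service": 10000}


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated speed of iface in Mbit/s, or None if unavailable."""
    try:
        value = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report speed.
        return None
    return value if value > 0 else None


def check_links(roles, sysfs_root="/sys/class/net"):
    """Map each network role to True if its interface runs at the expected speed."""
    return {
        role: link_speed_mbps(iface, sysfs_root) == EXPECTED_MBPS[role]
        for role, iface in roles.items()
    }


if __name__ == "__main__":
    # Hypothetical interface names, used for illustration only.
    print(check_links({"management": "eth0", "service": "eth1"}))
```

A `False` value for a role means the interface is missing, the link is down, or the port negotiated a speed other than the one this section describes.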
Hardware Requirements
Table 1 lists the hardware requirements.
| Item | Description |
|---|---|
| Processor | Kunpeng 920 5220 |
| Memory size | 384 GB (12 x 32 GB) |
| Memory frequency | 2666 MHz |
| Network | 10GE for the service network and GE for the management network |
| Drive | System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD). Data drive: 12 x RAID 0 (1 x 4 TB SATA HDD). |
| RAID controller card | LSI SAS3508 |
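As a quick sanity check on the figures in Table 1, the per-node totals can be worked out as follows. This is a sketch that assumes each of the three data nodes matches the Table 1 configuration. The capacities are raw values, before any file system or replication overhead.

```python
# Figures from Table 1; the data node count comes from the networking description.
DIMMS_PER_NODE = 12
DIMM_SIZE_GB = 32
DATA_DRIVES_PER_NODE = 12
DATA_DRIVE_SIZE_TB = 4
DATA_NODES = 3  # assumption: every data node uses the Table 1 configuration

memory_per_node_gb = DIMMS_PER_NODE * DIMM_SIZE_GB
raw_data_per_node_tb = DATA_DRIVES_PER_NODE * DATA_DRIVE_SIZE_TB
raw_data_cluster_tb = raw_data_per_node_tb * DATA_NODES

print(f"Memory per node: {memory_per_node_gb} GB")                      # 384 GB
print(f"Raw data capacity per node: {raw_data_per_node_tb} TB")         # 48 TB
print(f"Raw data capacity, three data nodes: {raw_data_cluster_tb} TB")  # 144 TB
```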
OS and Software Requirements
Table 2 lists the OS and software requirements.
| Item | Description |
|---|---|
| OS | openEuler 20.03 LTS SP1 |
| JDK | BiSheng JDK 1.8.0_262 |
| ZooKeeper | 3.6.2 |
| Hadoop | 3.1.1 |
| Spark | 2.3.2, 2.4.5, 2.4.6, or 3.1.1 |
- The machine learning algorithm library supports Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6, and provides the same APIs as the native algorithm library.
- The algorithm library is compatible with Spark 3.1.1, but limited to the following algorithms: ALS, LDA, KNN, PrefixSpan, DBSCAN, Word2Vec, Decision Tree, DTB, Random Forest, and GBDT.
- Spark 2.4.5 and Spark 2.4.6 use the JAR package of Spark 2.4.6.
- The algorithm library can also be adapted to other Spark 2.x and 3.x versions.
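
The Spark version notes above can be expressed as a small lookup, which may help when scripting deployments. This is an illustrative sketch only: the function and constant names are hypothetical and not part of the algorithm library. The notes state only that Spark 2.4.5 shares the Spark 2.4.6 JAR package, so the sketch assumes every other listed version uses a package built for that same version.

```python
from typing import FrozenSet, Optional

# Spark versions listed in Table 2.
SUPPORTED_SPARK_VERSIONS = ("2.3.2", "2.4.5", "2.4.6", "3.1.1")

# Algorithms the notes list as available on Spark 3.1.1.
SPARK_3_1_1_ALGORITHMS = frozenset({
    "ALS", "LDA", "KNN", "PrefixSpan", "DBSCAN",
    "Word2Vec", "Decision Tree", "DTB", "Random Forest", "GBDT",
})


def _require_listed(spark_version: str) -> None:
    if spark_version not in SUPPORTED_SPARK_VERSIONS:
        raise ValueError(f"Spark {spark_version} is not listed in Table 2")


def jar_build_for(spark_version: str) -> str:
    """Return the Spark version whose JAR package should be used."""
    _require_listed(spark_version)
    # Stated in the notes: Spark 2.4.5 uses the JAR package of Spark 2.4.6.
    if spark_version == "2.4.5":
        return "2.4.6"
    # Assumption: the remaining versions use a package built for themselves.
    return spark_version


def algorithm_limit(spark_version: str) -> Optional[FrozenSet[str]]:
    """Return the restricted algorithm set, or None if the notes state no limit."""
    _require_listed(spark_version)
    return SPARK_3_1_1_ALGORITHMS if spark_version == "3.1.1" else None


if __name__ == "__main__":
    print(jar_build_for("2.4.5"))                 # 2.4.6
    print("KNN" in algorithm_limit("3.1.1"))      # True
    print(algorithm_limit("2.3.2"))               # None
```

Versions outside Table 2 raise `ValueError` here rather than guessing, because adapting the library to other 2.x or 3.x releases is a separate porting step.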
