Environment

Physical Networking

The physical environment consists of one management node and three data nodes. The management and service networks are deployed separately: the management network uses GE electrical ports, and the service network uses 10GE optical ports. Figure 1 shows the networking diagram.

Figure 1 Physical networking

Hardware Requirements

Table 1 lists the hardware requirements.

Table 1 Hardware requirements

Item                  Description
Processor             Kunpeng 920 5220
Memory size           384 GB (12 x 32 GB)
Memory frequency      2666 MHz
Network               10GE for the service network and GE for the management network
Drive                 System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD)
                      Data drive: 12 x RAID 0 (1 x 4 TB SATA HDD)
RAID controller card  LSI SAS3508
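
The capacity figures in Table 1 can be sanity-checked with a short calculation. The following Python sketch is illustrative only: the class and function names are hypothetical, it assumes that the Table 1 configuration applies to each of the three data nodes, and the usable-capacity estimate assumes the default HDFS replication factor of 3, which this document does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures copied from Table 1 (hypothetical helper class)."""
    dimm_count: int = 12         # 12 x 32 GB memory modules
    dimm_size_gb: int = 32
    data_drive_count: int = 12   # 12 x 4 TB SATA HDDs, each a single-drive RAID 0
    data_drive_size_tb: int = 4

    @property
    def memory_gb(self) -> int:
        # Total memory per node.
        return self.dimm_count * self.dimm_size_gb

    @property
    def raw_data_tb(self) -> int:
        # Raw data-drive capacity per node, before any replication.
        return self.data_drive_count * self.data_drive_size_tb


def cluster_raw_data_tb(node: NodeSpec, data_nodes: int = 3) -> int:
    # Assumption: every data node uses the Table 1 configuration.
    return node.raw_data_tb * data_nodes


def hdfs_usable_tb(raw_tb: float, replication: int = 3) -> float:
    # Assumption: default HDFS replication factor of 3.
    # Ignores file system overhead and reserved space.
    return raw_tb / replication


node = NodeSpec()
print(node.memory_gb, node.raw_data_tb, cluster_raw_data_tb(node))
```

Under these assumptions, each node has 384 GB of memory and 48 TB of raw data-drive capacity, and the three data nodes together provide 144 TB of raw capacity.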

OS and Software Requirements

Table 2 lists the OS and software requirements.

Table 2 OS and software requirements

Item       Description
OS         openEuler 20.03 LTS SP1
JDK        BiSheng JDK 1.8.0_262
ZooKeeper  3.6.2
Hadoop     3.1.1
Spark      Spark 2.3.2, Spark 2.4.5, Spark 2.4.6, or Spark 3.1.1

  • The machine learning algorithm library supports Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6, and provides the same APIs as the native algorithm library.
  • On Spark 3.1.1, the algorithm library supports only the following algorithms: ALS, LDA, KNN, PrefixSpan, DBSCAN, Word2Vec, Decision Tree, DTB, Random Forest, and GBDT.
  • Spark 2.4.5 and Spark 2.4.6 both use the JAR package built for Spark 2.4.6.
  • The algorithm library can also be adapted to other Spark 2.x and 3.x versions.
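
The Spark compatibility notes above can be expressed as a small lookup. The following Python sketch is illustrative only: the function and constant names are hypothetical and are not part of the algorithm library. It transcribes the rules stated in the notes, and it assumes that versions other than Spark 2.4.5 use a JAR package built for their own version, which the notes do not say explicitly.

```python
# Versions on which the library provides the same APIs as the native library.
FULL_API_VERSIONS = {"2.3.2", "2.4.5", "2.4.6"}

# Algorithms available on Spark 3.1.1, as listed in the notes above.
SPARK_3_1_1_ALGORITHMS = {
    "ALS", "LDA", "KNN", "PrefixSpan", "DBSCAN", "Word2Vec",
    "Decision Tree", "DTB", "Random Forest", "GBDT",
}

TESTED_VERSIONS = FULL_API_VERSIONS | {"3.1.1"}


def is_supported(spark_version: str, algorithm: str) -> bool:
    """Return True if the notes list this version/algorithm pair as supported."""
    if spark_version in FULL_API_VERSIONS:
        # The notes do not enumerate algorithms for these versions,
        # so any algorithm name is accepted here.
        return True
    if spark_version == "3.1.1":
        return algorithm in SPARK_3_1_1_ALGORITHMS
    # Other 2.x and 3.x versions require adaptation and are not covered.
    return False


def jar_spark_version(spark_version: str) -> str:
    """Return the Spark version whose JAR package should be used."""
    if spark_version not in TESTED_VERSIONS:
        raise ValueError(f"Spark {spark_version} is not listed in Table 2")
    # Spark 2.4.5 shares the JAR package of Spark 2.4.6 (stated in the notes).
    # Assumption: every other listed version uses its own JAR package.
    return "2.4.6" if spark_version == "2.4.5" else spark_version
```

For example, is_supported("3.1.1", "KMeans") returns False because KMeans is not in the Spark 3.1.1 list, and jar_spark_version("2.4.5") returns "2.4.6".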