Constraints

This section describes the constraints on the machine learning algorithm library.

Impact on the System

The machine learning algorithm library has no impact on the system.

Usage Restrictions

Table 1 describes the usage restrictions.

Table 1 Constraints

Item

Description

OSs

  • CentOS 7.6
  • openEuler 20.03 LTS SP1
  • openEuler 22.03 LTS

Components

  • The library supports Spark 2.3.2, Spark 2.4.5, and Spark 2.4.6, and provides the same APIs as the native algorithm library.
  • The library is also adapted to Spark 3.1.1, but only for the following algorithms: ALS, LDA, KNN, PrefixSpan, DBSCAN, Word2Vec, Decision Tree, DTB, Random Forest, and GBDT.
  • Spark 2.4.5 and Spark 2.4.6 use the same JAR package.
  • The library can also be adapted to other Spark 2.x and 3.x versions.

Hardware

Kunpeng servers

Mixed deployment

  • A Spark cluster cannot contain both Kunpeng servers and servers of another type.
  • The machine learning and graph analysis algorithms cannot be used in the same task as other open source algorithms.

Performance metrics

Based on specific datasets and product parameters, the machine learning algorithm library on Kunpeng 920 5220 processors delivers more than 20% higher computing performance than the native MLlib algorithms on x86 5318 processors. For details, see the Big Data Machine Learning Algorithm Library Acceptance Test Guide.

The Kunpeng BoostKit machine learning algorithm library for Spark 2.4.6 uses the same core code as the native algorithm library for Spark 2.3.2, so the two produce the same algorithm execution results. The results may differ from those of the native algorithm library for Spark 2.4.6 (for example, for DTB) if an algorithm changed functionally between open source Spark 2.3.2 and Spark 2.4.6.
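
The Components and Mixed deployment rows of Table 1 can be expressed as a simple pre-flight check. The following is a minimal sketch, not part of the library: the function and constant names are hypothetical, the version and algorithm lists are copied from Table 1, and the CPU architecture string (as reported by, for example, uname -m) is assumed to be an adequate stand-in for "server type".

```python
from typing import Dict

# Spark versions that Table 1 lists without any per-algorithm restriction.
LISTED_SPARK_2_VERSIONS = {"2.3.2", "2.4.5", "2.4.6"}

# Algorithms that Table 1 lists as adapted to Spark 3.1.1.
SPARK_3_1_1_ALGORITHMS = {
    "ALS", "LDA", "KNN", "PrefixSpan", "DBSCAN", "Word2Vec",
    "Decision Tree", "DTB", "Random Forest", "GBDT",
}


def is_listed_combination(spark_version: str, algorithm: str) -> bool:
    """Return True if Table 1 explicitly lists this version/algorithm pair.

    Hypothetical helper. For the listed 2.x versions the algorithm name is
    not validated, because Table 1 does not restrict algorithms there.
    Other 2.x/3.x versions are described only as adaptable, so they are
    reported as not explicitly listed.
    """
    if spark_version in LISTED_SPARK_2_VERSIONS:
        return True
    if spark_version == "3.1.1":
        return algorithm in SPARK_3_1_1_ALGORITHMS
    return False


def is_single_architecture(node_archs: Dict[str, str]) -> bool:
    """Return True if every node reports the same CPU architecture.

    node_archs maps a host name to its architecture string. This is an
    assumed proxy for the mixed-deployment rule: it detects a mix of
    architectures but cannot confirm that a node is a Kunpeng server.
    """
    return len(set(node_archs.values())) == 1
```

For example, is_listed_combination("3.1.1", "DBSCAN") returns True, while is_single_architecture({"node1": "aarch64", "node2": "x86_64"}) returns False and would indicate a mixed cluster.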

Feature Interactions

The machine learning algorithm library does not interact with other features.