Machine Learning Algorithm Library
The machine learning algorithm library optimizes the MLlib library of open source Spark based on algorithm principles and chip characteristics, improving performance by 50% over the open source version.
The machine learning algorithm library currently optimizes the following algorithms. More algorithms will be added in later versions.
Classification and regression: Random Forest, GBDT, SVM, Logistic Regression, Linear Regression, Decision Tree, XGBoost, and KNN
Clustering: K-means, DBSCAN, and LDA
Feature engineering: PCA, SPCA, SVD, Pearson, Covariance, Spearman, IDF, DTB, and Word2Vec
Pattern mining: ALS, PrefixSpan, and SimRank
Table 1 lists the common application scenarios of the algorithms.
| Algorithm Name | Carrier | Finance | Transportation |
| Random Forest | | | |
| GBDT | | | |
| SVM | | | |
| Logistic Regression | | | |
| Linear Regression | | | |
| Decision Tree | | | |
| XGBoost | | | |
| KNN | | | |
| K-means | | | |
| DBSCAN | | | |
| LDA | | | |
| PCA | | | |
| SVD | | | |
| Pearson | | | |
| Covariance | | | |
| Spearman | | | |
| DTB | | | |
| Word2Vec | | | |
| ALS | | | |
| PrefixSpan | | | |
The machine learning algorithm library provides the same APIs as Spark MLlib, so existing applications can use it without any modification.
For details about how to deploy the machine learning algorithm library, see Machine Learning Algorithm Library Feature Guide.
When processing public datasets, Huawei Kunpeng 920 5250 processors running the machine learning algorithm library deliver over 50% higher computing performance than peer vendors' processors running the open source Spark algorithms.