Machine Learning Algorithm Library

Based on algorithm principles and the characteristics of Kunpeng chips, Kunpeng optimizes the MLlib library of open-source Spark, improving performance by 50% over the open-source version.

The machine learning algorithm library optimizes the algorithms below. More algorithms will be added in later versions.

Classification and regression: Random Forest, GBDT, SVM, Logistic Regression, Linear Regression, Decision Tree, XGBoost, and KNN
Clustering: K-means, DBSCAN, and LDA
Feature engineering: PCA, SPCA, SVD, Pearson, Covariance, Spearman, IDF, DTB, and Word2Vec
Pattern mining: ALS, PrefixSpan, and SimRank

Table 1 lists the common application scenarios of the algorithms.

**Table 1** Common application scenarios
Algorithm Name	Carrier	Finance	Transportation
Random Forest	High-value customer segmentation; Terminal life cycle analysis; Analysis of subscriber device change behaviors	Insurance fraud identification; Online transaction fraud detection; Credit risk assessment; Debt risk rating and warning	Street racing analysis; Ticket scalper analysis; Traffic signal timing optimization
GBDT	Identification of other-network high-value customers; Full-frequency and dual-SIM terminal analysis; Non-compliant terminal device sales	Customer credit assessment; Credit risk assessment; Debt risk rating and warning; Post-loan risk rating; Customer financial profile; Insurance customer risk analysis; Insurance customer churn analysis; Marketing strategy development of insurance enterprises	Traffic incident detection; Traffic violation vehicle identification
SVM	Identification and attraction of high-value customers; Identification and escalation of customers for upsell	Price forecast for the international carbon financial market; Enterprise bankruptcy prediction; Vehicle insurance pricing	Recognition of vehicles with cloned or fake license plates; Traffic flow prediction for road networks; Traffic flow prediction; Street racing analysis
Logistic Regression	Fraud warning; Risk evaluation; Intelligent energy consumption prediction	Credit risk analysis of Internet finance P2P services; Post-loan risk analysis; Identification of large-amount foreign exchange fund transactions; Customer credit assessment; Credit rating of listed companies; Warning of extreme risks in the financial market	Traffic flow prediction for road networks; Driving safety index modeling; Road traffic capability evaluation; Recognition of vehicles with cloned or fake license plates; Traffic flow prediction; Street racing analysis
Linear Regression	International toll call and roaming service analysis; Credit rating	Identification of financial report fraud of listed companies; Warning of commercial bank financial risk; Customer credit risk factor assessment; Small and medium-sized enterprise credit risk assessment; Supply chain financial risk assessment	Road traffic capability evaluation; Recognition of vehicles with cloned or fake license plates; Traffic flow prediction for road networks; Traffic situation analysis
Decision Tree	Warning of broadband subscriber churn; Warning of expired broadband subscribers	Customer classification for Internet finance precision marketing; Customer classification for commercial bank telemarketing; Quantitative investment strategy development; Credit card approval; Post-loan risk rating	Street racing analysis; Ticket scalper analysis; Traffic incident detection
XGBoost	Segmentation of mobile number portability (MNP) port-in subscribers; Port-out subscriber prediction; Intelligent O&M: fault detection and prediction; Intelligent energy consumption management: base station/server energy consumption prediction	Debt risk rating and warning; Online transaction fraud detection; User consumption behavior prediction and risk analysis; Fund return forecast; Forecast of top holdings within a portfolio; Insurance customer risk analysis; Insurance customer churn analysis; Marketing strategy development of insurance enterprises	Traffic congestion analysis; Traffic signal timing optimization; Travel mode recommendation; Vehicle checkpoint deployment; Personal profiling/holographic archiving (analysis of residence, age, gender, consumption level, occupation, etc.); Object trajectory prediction
KNN	Terminal app insight; Campus marketing; Resident compound identification	Credit card fraud risk monitoring; Financial data exception monitoring; Medical insurance review	Abnormal traffic behavior analysis; Companion vehicle discovery
K-means	Reactivation of inactive subscribers; Targeted tariff design; Subscriber package adaptation	Plan for financial IC card promotion in cities; Classification of de facto exchange rate systems; Insurance customer credit analysis; Analysis of consumers' willingness to buy insurance on the Internet	Vehicle origin-destination (OD) analysis; Checkpoint data governance; High-risk area identification
DBSCAN	Customer family group identification; Identification and attraction of campus customers; Identification and attraction of customers from other networks; Customer group distribution	Segmentation of commercial bank customer values; Bank loan risk management; Insurance fraud monitoring; Identification of business risks among small- and medium-sized banks; CRM customer segmentation model for the insurance industry	Thermal analysis of rail transportation sites; Thermal analysis of rail transportation groups; Analysis of commuting lines; Parking location analysis
LDA	Improper information governance; Content recommendation	Stock clustering for financial knowledge services; Analysis of the relationship between financial and technology media sentiment and the Internet loan market; Knowledge acquisition of financial decision-making support; Knowledge findings in corporate annual reports; Financial time information extraction; Medical insurance fraud monitoring	Traffic hotspot identification; Digitalization of traffic law enforcement cases
PCA	Extraction of key subscriber features; Subscriber identification; Subscriber credit investigation characteristics; Data engineering of recommendation model; Data engineering of risk assessment model	Data engineering of motor vehicle insurance fraud identification; Data engineering of supply chain financial credit risk assessment model; Warning of overdue repayment of borrowing companies	Traffic sign image recognition; Road safety prediction; Cause analysis of traffic accidents and association analysis; Urban traffic intersection correlation analysis
SVD	Abnormal order traffic detection; Network poisoning attack detection and location; Network cloud transmission data compression; Supplier selection; Supplier evaluation methods	Efficiency analysis of financial support for strategic emerging industries (data engineering) and commercial bank customer value segmentation (data engineering); Factor dimension reduction of quantitative investment stock selection; Equity portfolio recommendation	Traffic data preprocessing; Extraction of vehicle travel behavior characteristics; Traffic data compression; Periodic traffic characteristics extraction
Pearson	Mobile station location; Companion vehicle discovery; Abnormal order traffic detection; Identification and attraction of migrated customers; User matching policy	Market risk management; Asset risk value model analysis; Insurance claim analysis	Road pass time prediction; Multi-sensor vehicle information convergence; Intelligent order dispatching; Detection of abnormal traffic trajectory
Covariance	User loyalty analysis; User preference analysis; User churn analysis; Illegal sales of voucher cards; Channel standby card	Stock correlation analysis; Investment portfolio analysis; Asset configuration analysis; Asset risk value model analysis	Road condition prediction; Congestion propagation analysis; Trajectory matching analysis; Intelligent order dispatching; Detection of abnormal traffic trajectory
Spearman	User matching policy; Benefits-preferred users; User churn analysis; Users whose FBB services are promoted by MBB	Credit card registration recommendation; Customer benefits recommendation; Fraud gang analysis; Insurance customer profiling	Passenger flow prediction and analysis; Mining of congested urban areas; Detection of abnormal traffic trajectory; Intelligent order dispatching
DTB	Valued user mining; User package recommendation; Mobile site selection recommendation	Credit card approval; Quality customer recommendation; Precise advertisement push	Traffic light optimization; Dangerous driving behavior detection; Congested road prediction
Word2Vec	Content recommendation; Campus marketing; User preference analysis	Customer financial profile; Credit risk assessment; Financial data exception monitoring; Asset risk value model analysis	Traffic hotspot identification; Similar route recommendation
ALS	Port-in subscriber product adaptation; Campus/Return-to-home marketing; Level-1 e-channel precision marketing; Travel services; Identification and escalation of customers for upsell; Business recommendation; Content recommendation	Intelligent app recommendation; Dividend-paying life insurance pricing; Structural difference analysis of life insurance demands; Investor sentiment measure; American option pricing simulation	Dangerous driving behavior detection; Similar route recommendation
PrefixSpan	Intelligent O&M: fault detection and prediction; Intelligent energy consumption management: base station/server energy consumption prediction	Debt risk rating and warning; Online transaction fraud detection; User consumption behavior prediction and risk analysis; Fund return forecast; Forecast of top holdings within a portfolio	Traffic congestion analysis; Traffic signal timing optimization; Travel mode recommendation; Vehicle checkpoint deployment

The machine learning algorithm library provides the same APIs as Spark MLlib, so existing customer applications can use it without any code modification.

For details about how to deploy the machine learning algorithm library, see the Machine Learning Algorithm Library Feature Guide.

On publicly available datasets, Kunpeng 920 processors running the machine learning algorithm library deliver over 50% higher computing performance than peer vendors' platforms running Spark algorithms.

**Figure 1** Algorithm library performance comparison
