Machine Learning Algorithm Library

The machine learning algorithm library optimizes the MLlib library of open source Spark based on algorithm principles and chip characteristics, improving performance by 50% compared with the open source version.

The machine learning algorithm library optimizes the algorithms below. More algorithms will be added in later versions.

  • Classification and regression: Random Forest, GBDT, SVM, Logistic Regression, Linear Regression, Decision Tree, XGBoost, and KNN
  • Clustering: K-means, DBSCAN, and LDA
  • Feature engineering: PCA, SPCA, SVD, Pearson, Covariance, Spearman, IDF, DTB, and Word2Vec
  • Pattern mining: ALS, PrefixSpan, and SimRank

Table 1 lists the common application scenarios of the algorithms.

Table 1 Common application scenarios

Algorithm Name

Carrier

Finance

Transportation

Random Forest

  • High-value customer segmentation
  • Terminal life cycle analysis
  • Analysis of subscriber device change behaviors
  • Insurance fraud identification
  • Online transaction fraud detection
  • Credit risk assessment
  • Debt risk rating and warning
  • Street racing analysis
  • Ticket scalper analysis
  • Traffic signal timing optimization

GBDT

  • Identification of other-network high-value customers
  • Full-frequency and dual-SIM terminal analysis
  • Non-compliant terminal device sales
  • Customer credit assessment
  • Credit risk assessment
  • Debt risk rating and warning
  • Post-loan risk rating
  • Customer financial profile
  • Insurance customer risk analysis
  • Insurance customer churn analysis
  • Marketing strategy development of insurance enterprises
  • Traffic incident detection
  • Traffic violation vehicle identification

SVM

  • Identification and attraction of high-value customers
  • Identification and escalation of customers for upsell
  • Price forecast for the international carbon financial market
  • Enterprise bankruptcy prediction
  • Vehicle insurance pricing
  • Recognition of vehicles with cloned or fake license plates
  • Traffic flow prediction for road networks
  • Traffic flow prediction
  • Street racing analysis

Logistic Regression

  • Fraud warning
  • Risk evaluation
  • Intelligent energy consumption prediction
  • Credit risk analysis of Internet finance P2P services
  • Post-loan risk analysis
  • Identification of large-amount foreign exchange fund transactions
  • Customer credit assessment
  • Credit rating of listed companies
  • Warning of extreme risks in the financial market
  • Traffic flow prediction for road networks
  • Driving safety index modeling
  • Road traffic capability evaluation
  • Recognition of vehicles with cloned or fake license plates
  • Traffic flow prediction
  • Street racing analysis

Linear Regression

  • International toll call and roaming service analysis
  • Credit rating
  • Identification of financial report fraud of listed companies
  • Warning of commercial bank financial risk
  • Customer credit risk factor assessment
  • Small and medium-sized enterprise credit risk assessment
  • Supply chain financial risk assessment
  • Road traffic capability evaluation
  • Recognition of vehicles with cloned or fake license plates
  • Traffic flow prediction for road networks
  • Traffic situation analysis

Decision Tree

  • Warning of broadband subscriber churn
  • Warning of expired broadband subscribers
  • Customer classification for Internet finance precision marketing
  • Customer classification for commercial bank telemarketing
  • Quantitative investment strategy development
  • Credit card approval
  • Post-loan risk rating
  • Street racing analysis
  • Ticket scalper analysis
  • Traffic incident detection

XGBoost

  • Segmentation of mobile number portability (MNP) port-in subscribers
  • Port-out subscriber prediction
  • Intelligent O&M: fault detection and prediction
  • Intelligent energy consumption management: base station/server energy consumption prediction
  • Debt risk rating and warning
  • Online transaction fraud detection
  • User consumption behavior prediction and risk analysis
  • Fund return forecast
  • Forecast of top holdings within a portfolio
  • Insurance customer risk analysis
  • Insurance customer churn analysis
  • Marketing strategy development of insurance enterprises
  • Traffic congestion analysis
  • Traffic signal timing optimization
  • Travel mode recommendation
  • Vehicle checkpoint deployment
  • Personal profiling/holographic archiving (analysis of residence, age, gender, consumption level, occupation, etc.)
  • Object trajectory prediction

KNN

  • Terminal app insight
  • Campus marketing
  • Resident compound identification
  • Credit card fraud risk monitoring
  • Financial data exception monitoring
  • Medical insurance review
  • Abnormal traffic behavior analysis
  • Accompanying person analysis

K-means

  • Reactivation of inactive subscribers
  • Targeted tariff design
  • Subscriber package adaptation
  • Plan for financial IC card promotion in cities
  • Classification of de facto exchange rate systems
  • Insurance customer credit analysis
  • Analysis of consumers' willingness to buy insurance on the Internet
  • Vehicle origin-destination (OD) analysis
  • Checkpoint data governance
  • High-risk area identification

DBSCAN

  • Customer family group identification
  • Identification and attraction of campus customers
  • Identification and attraction of customers from other networks
  • Customer group distribution
  • Segmentation of commercial bank customer values
  • Bank loan risk management
  • Insurance fraud monitoring
  • Identification of business risks among small- and medium-sized banks
  • CRM customer segmentation model for insurance industry
  • Thermal analysis of rail transportation sites
  • Thermal analysis of rail transportation groups
  • Analysis of commuting lines
  • Parking location analysis

LDA

  • Improper information governance
  • Content recommendation
  • Stock clustering for financial knowledge services
  • Analysis of the relationship between financial and technology media sentiment and the Internet loan market
  • Knowledge acquisition of financial decision-making support
  • Knowledge findings in corporate annual reports
  • Financial time information extraction
  • Medical insurance fraud monitoring
  • Traffic hotspot identification
  • Digitalization of traffic law enforcement cases

PCA

  • Extraction of key subscriber features
  • Subscriber identification
  • Subscriber credit investigation characteristics
  • Data engineering of recommendation model
  • Data engineering of risk assessment model
  • Data engineering of motor vehicle insurance fraud identification
  • Data engineering of supply chain financial credit risk assessment model
  • Warning of overdue repayment of borrowing companies
  • Traffic sign image recognition
  • Road safety prediction
  • Cause analysis of traffic accidents and association analysis
  • Urban traffic intersection correlation analysis

SVD

  • Abnormal order traffic detection
  • Network poisoning attack detection and location
  • Network cloud transmission data compression
  • Supplier selection
  • Supplier evaluation methods
  • Efficiency analysis of financial support for strategic emerging industries (data engineering) and commercial bank customer value segmentation (data engineering)
  • Factor dimension reduction of quantitative investment stock selection
  • Equity portfolio recommendation
  • Traffic data preprocessing
  • Extraction of vehicle travel behavior characteristics
  • Traffic data compression
  • Periodic traffic characteristics extraction

Pearson

  • Mobile station location
  • Accompanying person analysis
  • Abnormal order traffic detection
  • Identification and attraction of migrated customers
  • User matching policy
  • Market risk management
  • Asset risk value model analysis
  • Insurance claim analysis
  • Road pass time prediction
  • Multi-sensor vehicle information convergence
  • Intelligent order dispatching
  • Detection of abnormal traffic trajectory

Covariance

  • User loyalty analysis
  • User preference analysis
  • User churn analysis
  • Illegal sales of voucher cards
  • Channel standby card
  • Stock correlation analysis
  • Investment portfolio analysis
  • Asset configuration analysis
  • Asset risk value model analysis
  • Road condition prediction
  • Congestion propagation analysis
  • Trajectory matching analysis
  • Intelligent order dispatching
  • Detection of abnormal traffic trajectory

Spearman

  • User matching policy
  • Benefits-preferred users
  • User churn analysis
  • Users whose FBB services are promoted by MBB
  • Credit card registration recommendation
  • Customer benefits recommendation
  • Fraud gang analysis
  • Insurance customer profiling
  • Passenger flow prediction and analysis
  • Mining of congested urban areas
  • Detection of abnormal traffic trajectory
  • Intelligent order dispatching

DTB

  • Valued user mining
  • User package recommendation
  • Mobile site selection recommendation
  • Credit card approval
  • Quality customer recommendation
  • Precise advertisement push
  • Traffic light optimization
  • Dangerous driving behavior detection
  • Congested road prediction

Word2Vec

  • Content recommendation
  • Campus marketing
  • User preference analysis
  • Customer financial profile
  • Credit risk assessment
  • Financial data exception monitoring
  • Asset risk value model analysis
  • Traffic hotspot identification
  • Similar route recommendation

ALS

  • Port-in subscriber product adaptation
  • Campus/Return-to-home marketing
  • Level-1 e-channel precision marketing
  • Travel services
  • Identification and escalation of customers for upsell
  • Business recommendation
  • Content recommendation
  • Intelligent app recommendation
  • Dividend-paying life insurance pricing
  • Structural difference analysis of life insurance demands
  • Investor sentiment measure
  • American option pricing simulation
  • Dangerous driving behavior detection
  • Similar route recommendation

PrefixSpan

  • Intelligent O&M: fault detection and prediction
  • Intelligent energy consumption management: base station/server energy consumption prediction
  • Debt risk rating and warning
  • Online transaction fraud detection
  • User consumption behavior prediction and risk analysis
  • Fund return forecast
  • Forecast of top holdings within a portfolio
  • Traffic congestion analysis
  • Traffic signal timing optimization
  • Travel mode recommendation
  • Vehicle checkpoint deployment

The machine learning algorithm library provides the same APIs as Spark MLlib, so customers' applications can use the algorithm library without any modification.
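
As an illustration of what "without any modification" means in practice, the following is a minimal sketch of an ordinary Spark MLlib K-means application written against the open source API. It contains nothing specific to the algorithm library. The object name, the local master setting, and the four sample points are assumptions made for this example, and the sketch uses the DataFrame-based `org.apache.spark.ml` package; this document does not state which MLlib package the optimized algorithms cover.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical application name; any existing MLlib application would do.
object KMeansCompatibilitySketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption for a quick trial; a cluster job would
    // normally receive its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("KMeansCompatibilitySketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample data: two points near the origin, two near (9, 9).
    val points = Array(
      Vectors.dense(0.0, 0.0),
      Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0),
      Vectors.dense(9.1, 9.1)
    )
    val dataset = spark.createDataFrame(points.map(Tuple1.apply)).toDF("features")

    // Standard open source MLlib calls; no vendor-specific classes or flags.
    val kmeans = new KMeans().setK(2).setSeed(1L)
    val model = kmeans.fit(dataset)

    // Print the learned cluster centers.
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}
```

Because the source code depends only on the open source MLlib classes, the choice between the open source implementation and the optimized one would be made at deployment time rather than in the application code.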

For details about how to deploy the machine learning algorithm library, see the Machine Learning Algorithm Library Feature Guide.

When processing publicly available datasets on Huawei Kunpeng 920 5250 processors, the machine learning algorithm library delivers over 50% higher computing performance than peer vendors' processors running Spark algorithms.

Figure 1 Algorithm library performance comparison