Classification and Regression

Scenarios

Classification and regression analysis is a predictive modeling technique that explores the relationship between labels and features, where labels are treated as dependent variables and features as independent variables. Classification and regression algorithms are typically applied to predictive analysis and regression modeling.

Specifically, Linear Regression and Logistic Regression are used for credit risk analysis of Internet finance P2P services and for traffic flow prediction on road networks; SVM is used for price forecasting in the international carbon financial market and for traffic flow prediction; and GBDT and XGBoost are used for debt risk rating and warning and for travel mode recommendation.

During model training, regression algorithms iterate repeatedly until their predictions converge approximately to the label values. The Kunpeng BoostKit big data machine learning algorithm library optimizes these iterative algorithms and fully exploits the high-concurrency capabilities of Kunpeng processors to reduce the number of iterations required during training, improving algorithm performance severalfold.
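As an illustration of iterative convergence toward the label values, the following minimal sketch fits a one-feature linear regression with plain gradient descent. It is not the BoostKit or Spark implementation; the function name fit_linear_regression, the learning rate, the iteration count, and the toy data are all assumptions chosen for this example.

```python
def fit_linear_regression(xs, ys, lr=0.05, iters=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # mean squared error recorded before each update
    for _ in range(iters):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        history.append(sum(r * r for r in residuals) / n)
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Toy data generated from y = 2x + 1 (assumed for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = fit_linear_regression(xs, ys)
print(f"w={w:.4f}, b={b:.4f}, first MSE={history[0]:.4f}, last MSE={history[-1]:.2e}")
```

Each pass over the data evaluates the loss and gradient once, so fewer iterations to reach a given error translates directly into less training time.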

Principles

Support vector machine (SVM) is a generalized linear classifier that performs binary classification on data through supervised learning. Its decision boundary is the maximum-margin hyperplane computed from the training samples. SVM is a sparse and robust classifier that uses the hinge loss function to calculate empirical risk and adds a regularization term to the optimization problem to reduce structural risk. For Spark's LinearSVC algorithm, two optimization policies are introduced: reducing the number of calls to the function f (the distributed computation of the loss and gradient of the objective function) through optimization of the algorithm principle, and accelerating convergence by adding momentum to the parameter updates.
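The ideas above (hinge loss, a regularization term, and momentum in the parameter updates) can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions, not the BoostKit or Spark LinearSVC implementation: it uses subgradient descent with classical momentum, and the names hinge_objective, train_linear_svm, and predict, the hyperparameter values, and the toy data are hypothetical. Here hinge_objective plays the role of the function f, returning the loss and gradient together in one call.

```python
def hinge_objective(w, b, X, y, reg):
    """Return (loss, grad_w, grad_b) for mean hinge loss plus an L2 penalty.

    Labels in y are expected to be +1 or -1.
    """
    n, d = len(X), len(w)
    loss = 0.5 * reg * sum(wj * wj for wj in w)
    grad_w = [reg * wj for wj in w]
    grad_b = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        if margin < 1.0:  # only margin violators contribute to the hinge term
            loss += (1.0 - margin) / n
            for j in range(d):
                grad_w[j] -= yi * xi[j] / n
            grad_b -= yi / n
    return loss, grad_w, grad_b


def train_linear_svm(X, y, reg=0.01, lr=0.1, momentum=0.9, iters=200):
    """Subgradient descent with classical momentum (illustrative only)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    vel_w, vel_b = [0.0] * d, 0.0
    for _ in range(iters):
        _, grad_w, grad_b = hinge_objective(w, b, X, y, reg)  # one call to f
        # The velocity accumulates past gradients and drives the update.
        vel_w = [momentum * v - lr * g for v, g in zip(vel_w, grad_w)]
        vel_b = momentum * vel_b - lr * grad_b
        w = [wj + v for wj, v in zip(w, vel_w)]
        b += vel_b
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Linearly separable toy data (assumed for this example).
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-1.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
initial_loss, _, _ = hinge_objective([0.0, 0.0], 0.0, X, y, 0.01)
final_loss, _, _ = hinge_objective(w, b, X, y, 0.01)
print(predictions, initial_loss, final_loss)
```

In a distributed setting every call to the loss-and-gradient function requires a pass over the partitioned data, which is why reducing the number of such calls and converging in fewer updates both shorten training.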
