Advantages

Comparison Between Popular Solutions

Popular industry solutions for data analysis and prediction include rule-based analysis and open source algorithm libraries. Table 1 compares these solutions with the Kunpeng BoostKit for Big Data algorithm library.

**Table 1** Comparison between popular solutions

| Item | Rule-based Analysis | Open Source Algorithm Library | Kunpeng BoostKit for Big Data Algorithm Library |
|---|---|---|---|
| Usage | Relies on databases; ISVs customize SQL statements or SQL-like analysis technologies | Based on single-node Python algorithm libraries or native Spark algorithms | Improved from Spark distributed algorithms, with more algorithms and better accuracy and performance |
| Advantages | Easy to interpret and understand; easy to use with SQL technology | Supports complex data analysis, such as classification prediction, clustering, and community mining; distributed in-memory computing delivers higher performance than SQL | Wide range of distributed algorithms covering all scenarios; high algorithm accuracy and better performance; supports large-scale dataset analysis |
| Disadvantages | Rules must be customized manually, resulting in low accuracy; long data analysis time; does not support complex analysis such as trend prediction | The limited computing power of single-node algorithms prevents analysis of large-scale datasets; limited distributed algorithms and inadequate scenario coverage | N/A |
| Application Scenario | Small volumes of data; accurate rules available | Medium volumes of data; entry-level Spark scenarios with low performance requirements | Massive volumes of data; scenarios requiring high precision and high performance |

Product Competitiveness of the Algorithm Library

The Kunpeng BoostKit for Big Data algorithm library has the following advantages:

- High performance: Compared with open source algorithms, the algorithm library delivers severalfold performance improvements and supports larger datasets.
  - PCA: 10x higher performance and 1,000x larger feature scale than the open source algorithm (from tens of thousands to tens of millions of features). PCA supports tens of millions of samples and tens of millions of feature dimensions.
  - DBSCAN: 24x higher performance and 5x more feature dimensions than the open source algorithm (from 2 to 10 dimensions). DBSCAN supports samples of up to 20 dimensions.
- Full coverage: The algorithm library includes common algorithms for classification and regression, feature engineering, backbone analysis, clustering, and pattern mining.
- Easy to deploy: The algorithm library uses the same class and interface definitions as the native Spark algorithms, so upper-layer applications require no modification.
