Advantages
Comparison Between Popular Solutions
Popular industry solutions for data analysis and prediction include rule-based analysis and open source algorithm libraries. Table 1 compares these solutions with the Kunpeng BoostKit for Big Data algorithm library.
| Item | Rule-based Analysis | Open Source Algorithm Library | Kunpeng BoostKit for Big Data Algorithm Library |
|---|---|---|---|
| Usage | Relies on databases. ISVs write custom SQL statements or use SQL-like analysis technologies. | Based on a single-node Python algorithm library or native Spark algorithms | Builds on Spark distributed algorithms, adding more algorithms and improving accuracy and performance |
| Advantages | | | |
| Disadvantages | | | N/A |
| Application Scenario | | | |
Product Competitiveness of the Algorithm Library
The Kunpeng BoostKit for Big Data algorithm library has the following advantages:
- High performance: Compared with open source algorithms, the algorithm library delivers several-fold higher performance and supports larger datasets.
  - The PCA algorithm delivers 10x higher performance than the open source algorithm and supports a 1,000x larger feature scale (from tens of thousands to tens of millions of features). It handles tens of millions of samples with tens of millions of feature dimensions.
  - The DBSCAN algorithm delivers 24x higher performance than the open source algorithm and supports 5x more feature dimensions (10 instead of 2). It can process samples with up to 20 dimensions.
- Full coverage: The algorithm library covers common algorithm categories, including classification and regression, feature engineering, backbone analysis, clustering, and pattern mining.
- Easy to deploy: The algorithm library uses the same class and interface definitions as the native Spark algorithms, so upper-layer applications require no modification.
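To make the DBSCAN "feature dimensions" figures concrete, the following is a minimal single-node DBSCAN sketch in Python. It is a naive O(n²) illustration of the algorithm itself, not the Kunpeng BoostKit implementation or its API; the function name `dbscan` and its parameters `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size for a core point) are hypothetical. Each sample is a feature vector whose length is its number of dimensions, so the same code clusters 2-dimensional or 10-dimensional samples; in this naive version only the cost of each distance computation grows with the dimension.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Naive O(n^2) DBSCAN (illustrative sketch, not the BoostKit implementation).

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
    All points must have the same number of dimensions.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # Euclidean neighborhood, including the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    dim = 10  # 10-dimensional samples
    group_a = [[0.0] * dim, [0.1] * dim, [0.2] * dim]
    group_b = [[5.0] * dim, [5.1] * dim, [5.2] * dim]
    outlier = [[20.0] * dim]
    print(dbscan(group_a + group_b + outlier, eps=1.0, min_pts=2))
    # prints [0, 0, 0, 1, 1, 1, -1]
```

A distributed implementation has to partition the data so that neighborhood queries stay local, which this single-node sketch deliberately ignores.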
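The PCA feature-scale figures can be put in perspective with the textbook formulation of PCA: center the data, form the d×d sample covariance matrix, and take its leading eigenvectors. The pure-Python sketch below computes only the first principal component by power iteration; it is an illustration under that assumption, not the library's algorithm, and the names `covariance_matrix`, `top_component`, and `dense_covariance_bytes` are hypothetical. Because the covariance matrix has d² entries, a dense representation for d = 10 million features would hold 10^14 values, or about 800 TB at 8 bytes per value, which shows why scaling the feature dimension is hard for a naive approach.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance of the columns (features); the result is d x d."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov: Matrix, iterations: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """First principal component (unit vector) and its variance, via power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; make the largest entry positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    # Rayleigh quotient v^T C v gives the variance along v (v is unit length).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


def dense_covariance_bytes(d: int, bytes_per_value: int = 8) -> int:
    """Memory needed to store a dense d x d covariance matrix."""
    return d * d * bytes_per_value


if __name__ == "__main__":
    rows = [[float(t), 2.0 * t] for t in range(5)]  # points on the line y = 2x
    v, variance = top_component(covariance_matrix(rows))
    print(v, variance)  # direction close to (1, 2)/sqrt(5), variance close to 12.5
    print(dense_covariance_bytes(10_000_000))  # 8 * 10**14 bytes
```

Practical large-scale PCA implementations typically avoid materializing the full dense matrix; this sketch only shows what quantity is being computed.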