Big Data
What Is the Kunpeng BoostKit Machine Learning Algorithm Library?
The Kunpeng BoostKit machine learning algorithm library is compatible with open-source Spark APIs. It provides optimized machine learning algorithms that greatly improve computing performance in big data algorithm scenarios.
What Are the Optimizations of the Kunpeng BoostKit for Big Data Algorithm Library?
Kunpeng BoostKit for Big Data provides an algorithm library that deeply optimizes the open-source Spark algorithms in terms of algorithm principles and Kunpeng affinity, achieving up to 20 times higher algorithm execution efficiency.
- Kunpeng affinity optimization: To fully exploit the hardware advantages of the Kunpeng architecture, Kunpeng BoostKit optimizes algorithm affinity in terms of sparse memory access and multi-core parallelism. For details, see Machine Learning - Kunpeng Affinity Optimization.
- Algorithm principle optimization: The Kunpeng BoostKit big data algorithm library optimizes algorithm principles to reduce algorithm complexity, greatly improving computing performance with the same computing precision. For details, see Machine Learning - Algorithm Principle Optimization.
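The idea behind algorithm principle optimization can be illustrated with a simple, self-contained sketch (plain Python, not BoostKit code): selecting the top-k elements with a bounded heap gives exactly the same result as a full sort, but at O(n log k) instead of O(n log n) — lower complexity, same precision.

```python
import heapq
import random

def top_k_sorted(values, k):
    # Baseline approach: sort the full dataset, O(n log n).
    return sorted(values, reverse=True)[:k]

def top_k_heap(values, k):
    # Principle-optimized approach: a bounded heap, O(n log k).
    # Same result as the baseline, at lower algorithmic complexity.
    return heapq.nlargest(k, values)

random.seed(0)
data = [random.random() for _ in range(100_000)]
assert top_k_sorted(data, 10) == top_k_heap(data, 10)
```

The same pattern — replacing an algorithmic step with a cheaper one that is provably equivalent — is what lets an optimized library keep identical computing precision while reducing execution time.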
What Is Kunpeng BoostKit for Big Data OmniRuntime?
OmniRuntime is a set of application-acceleration features provided by Kunpeng BoostKit for Big Data. It uses plugins to improve end-to-end performance across data loading, computing, and exchange, thereby accelerating big data analytics. OmniRuntime includes the following features: operator pushdown (OmniData), operator acceleration (OmniOperator), shuffle acceleration (OmniShuffle), and materialized view (OmniMV). In the data loading phase, OmniData implements near-data computing to reduce network data traffic. In the data computing phase, OmniOperator replaces open-source Java operators with high-performance operators to improve operator efficiency. In the data exchange phase, OmniShuffle accelerates data shuffling between nodes. For scenarios with repeated queries or subqueries, OmniMV uses AI algorithms to identify the optimal materialized view, reducing the overhead of repeated subqueries and improving query efficiency.
What Are the Optimizations of OmniOperator?
OmniOperator has made the following two optimizations:
- High-performance Omni operators: OmniOperator fully exploits the computing capabilities of the hardware, especially heterogeneous computing power. Compared with Java and Scala operators, Omni operators greatly improve computing engine performance.
- Efficient data organization: OmniOperator defines a language-independent, column-oriented storage format and implements OmniVec in off-heap memory. OmniVec reads data with zero copy and no serialization overhead, allowing data in memory to be processed more efficiently.
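The zero-copy property described above can be sketched in plain Python (this is an illustrative analogy, not the OmniVec implementation): a column lives in one contiguous buffer, and a `memoryview` exposes a slice of it without copying or serializing any bytes.

```python
from array import array

# A column of int64 values stored contiguously, analogous to a
# column-oriented vector in off-heap memory (names illustrative only).
col = array('q', range(10))

# memoryview exposes a slice of the buffer without copying the data.
view = memoryview(col)[2:5]
assert view.tolist() == [2, 3, 4]

# A write through the view is visible in the underlying buffer:
# evidence that no copy (and hence no serialization) took place.
view[0] = 42
assert col[2] == 42
```

Because the reader and the buffer share the same memory, there is no serialization step between them — which is the property that lets a columnar runtime process in-memory data more efficiently.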
Which Big Data Engines Does OmniOperator Support?
OmniOperator supports the Spark engine.
What Are the Optimizations of OmniMV?
OmniMV uses AI algorithms to recommend the optimal materialized view from historical SQL queries, automatically matches SQL statements against materialized views in Spark, and replaces part of each SQL execution plan with the matched materialized view. This greatly reduces repeated computation and improves query efficiency. Its optimizations include:
- Pre-computes and caches results for batch queries. Compared with querying the base tables directly, OmniMV greatly improves computing engine performance.
- Recommends the optimal materialized view using deep learning and reinforcement learning algorithms.
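A minimal sketch of the materialized-view idea (plain Python, not OmniMV's API; all names here are illustrative): an aggregate over the base table is computed once, and later queries read the cached result instead of re-scanning the table.

```python
from collections import defaultdict

# Base table: (region, amount) rows. Names are illustrative only.
base_table = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]

# "Materialized view": the aggregate is pre-computed and cached once.
mv_sum_by_region = defaultdict(int)
for region, amount in base_table:
    mv_sum_by_region[region] += amount

def total_for(region):
    # The query is rewritten to read the cached aggregate instead of
    # re-scanning the base table on every call.
    return mv_sum_by_region[region]

assert total_for("east") == 17
assert total_for("north") == 3
```

In a real engine, the hard parts are choosing which views to materialize (OmniMV uses AI algorithms for this) and proving that a query's plan can be rewritten to use a view; the cache lookup itself is the easy payoff.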
Which Big Data Engines Does OmniMV Support?
OmniMV supports Spark and ClickHouse.
What Are the Optimizations of OmniData?
OmniData pushes operators with low data selection rates down to the storage nodes. Data is then read and computed locally on the storage nodes, and only the valid result datasets are returned to the compute nodes over the network. This improves network transmission efficiency and optimizes big data computing performance. The following optimizations are made:
- Storage-compute collaboration: Offloads operators to storage nodes to reduce the CPU usage of compute nodes and improve the overall computing efficiency.
- Data filtering: Filters out unnecessary data to reduce the amount of data processed by compute nodes.
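The pushdown idea can be sketched as a toy model (plain Python, not the OmniData implementation; names are illustrative): the filter predicate runs where the data lives, so only the small matching subset crosses the "network" to the compute node.

```python
# Rows living on a simulated storage node.
storage_node_rows = [{"id": i, "score": i % 100} for i in range(1_000)]

def scan_with_pushdown(rows, predicate):
    # Executed on the storage node: filter locally, ship only the
    # rows that survive the predicate.
    return [r for r in rows if predicate(r)]

# A low-selectivity predicate: only ~1% of rows match.
shipped = scan_with_pushdown(storage_node_rows,
                             lambda r: r["score"] >= 99)
assert len(shipped) == 10   # 10 of 1,000 rows cross the network
```

Without pushdown, all 1,000 rows would be transferred and filtered on the compute node; with it, network traffic and compute-node CPU usage both drop roughly in proportion to the selectivity.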
Which Big Data Engines Does OmniData Support?
OmniData supports Spark, Hive, and openLooKeng.