Change Description

The OmniShuffle shuffle acceleration feature of Kunpeng BoostKit for Big Data uses Spark's plugin mechanism to implement the Shuffle Manager and Broadcast Manager plugin interfaces, replacing Spark's native shuffle and broadcast in a non-intrusive manner. This reduces disk I/O, accelerates data exchange between nodes, and improves query efficiency.

New Features

  • OmniShuffle enables in-memory shuffle by implementing the Shuffle Manager plugin interface. The shuffle process is completed in a memory pool using memory semantics.
  • OmniShuffle implements the Broadcast Manager interface to broadcast variables through memory pool sharing, which improves the transmission efficiency of broadcast variables among executors.
  • OmniShuffle automatically adjusts the degree of parallelism of Spark SQL jobs in real time based on historical data. This eliminates the need to tune parallelism manually and reduces spills in the shuffle-reduce process by 90%.
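
To illustrate the plugin mechanism mentioned above, the following minimal sketch builds the Spark property that swaps in a custom shuffle manager. `spark.shuffle.manager` is a standard Spark property that accepts a fully qualified class name. The class name used here is a hypothetical placeholder, not the actual OmniShuffle class, and the helper functions are illustrative only. Refer to the OmniShuffle feature guide for the real class names and any additional required settings.

```python
from typing import Dict, List

# Hypothetical placeholder: the real OmniShuffle class name is not stated in this document.
HYPOTHETICAL_SHUFFLE_MANAGER = "com.example.shuffle.InMemoryShuffleManager"


def shuffle_plugin_conf(manager_class: str) -> Dict[str, str]:
    """Return the Spark property that replaces the native shuffle manager."""
    if not manager_class or "." not in manager_class:
        raise ValueError("expected a fully qualified class name")
    # spark.shuffle.manager accepts the fully qualified name of a ShuffleManager implementation.
    return {"spark.shuffle.manager": manager_class}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as spark-submit --conf arguments, sorted by key."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    conf = shuffle_plugin_conf(HYPOTHETICAL_SHUFFLE_MANAGER)
    print(" ".join(to_submit_args(conf)))
```

Because the replacement is made purely through configuration, the job code itself does not change, which is what makes the approach non-intrusive. The JAR that contains the plugin class must also be available on the driver and executor classpaths.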

Modified Features

None

Removed Features

None