Change Description
OmniShuffle uses Spark's plugin mechanism to implement the Shuffle Manager and Broadcast Manager plugin interfaces, replacing the Shuffle and Broadcast modules of open-source Spark in a non-intrusive manner. The replacement reduces disk I/O, improves the efficiency of data exchange between nodes, and significantly improves query performance.
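As a minimal sketch of what non-intrusive replacement can look like in practice, open-source Spark selects its shuffle implementation through the standard `spark.shuffle.manager` property, so a plugin can be swapped in through configuration alone. The class name below is a hypothetical placeholder, not the actual OmniShuffle class, and the helper functions are illustrative only; see the OmniShuffle installation guide for the real values.

```python
# Sketch: build spark-submit "--conf" arguments that swap in a custom ShuffleManager.
# "spark.shuffle.manager" is a standard Spark property. The class name is a
# HYPOTHETICAL placeholder and must be replaced with the documented OmniShuffle class.
HYPOTHETICAL_SHUFFLE_MANAGER = "com.example.omnishuffle.InMemoryShuffleManager"


def shuffle_plugin_conf(manager_class):
    """Return the Spark properties that select a pluggable shuffle manager."""
    return {"spark.shuffle.manager": manager_class}


def to_submit_args(conf):
    """Flatten a property dict into spark-submit command-line arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    conf = shuffle_plugin_conf(HYPOTHETICAL_SHUFFLE_MANAGER)
    print("spark-submit " + " ".join(to_submit_args(conf)) + " <application>")
```

Because the change is confined to job configuration, the Spark code base and the application code stay untouched, which is what "non-intrusive" means here.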
New Features
- OmniShuffle implements the Shuffle Manager plugin interface to enable in-memory shuffle: the shuffle process is completed in a memory pool using memory semantics.
- OmniShuffle implements the Broadcast Manager plugin interface to broadcast variables through a shared memory pool, improving the efficiency of transmitting broadcast variables among executors.
- OmniShuffle automatically adjusts the degree of parallelism of Spark SQL jobs in real time based on historical data. This eliminates the need to tune the parallelism manually and reduces spills in the shuffle-reduce process by 90%.
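The idea behind history-based parallelism tuning can be illustrated with a simple heuristic: choose enough shuffle partitions that each one is likely to fit in memory, which is what keeps the reduce side from spilling. The sketch below is not OmniShuffle's algorithm. The function name, the 128 MiB per-partition target, and the bounds are all assumptions made for illustration; `spark.sql.shuffle.partitions` is the standard Spark SQL property that such a value would feed.

```python
import math

MIB = 1024 * 1024


def suggest_shuffle_partitions(history_bytes,
                               target_partition_bytes=128 * MIB,
                               min_partitions=1,
                               max_partitions=10000):
    """Suggest a shuffle partition count from past shuffle sizes (illustrative only).

    history_bytes: total shuffle bytes observed in earlier runs of the same query.
    Returns None when there is no history, meaning "keep the current setting".
    """
    if not history_bytes:
        return None
    # Size for the largest shuffle seen so far, so a repeat of the worst case still fits.
    peak = max(history_bytes)
    wanted = math.ceil(peak / target_partition_bytes)
    # Clamp to the allowed range.
    return max(min_partitions, min(max_partitions, wanted))


if __name__ == "__main__":
    past_runs = [1024 * MIB, 2048 * MIB]  # assumed sample history: 1 GiB and 2 GiB
    partitions = suggest_shuffle_partitions(past_runs)
    print(f"spark.sql.shuffle.partitions={partitions}")
```

With the sample history above, the 2 GiB peak divided by the 128 MiB target yields 16 partitions. A real implementation would also have to account for data skew and available executor memory.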
Modified Features
None
Removed Features
None
Parent topic: V1.3.0