Architecture

OmniShuffle runs in big data clusters of the customer's data center as a performance acceleration component of the big data engine Spark. It employs effective features such as unified addressing of the memory pool, data exchange in memory semantics, and converged shuffle to reduce the drive I/O overhead, quicken the data analysis process, and improve cluster resource utilization.

For shuffle-intensive jobs in Spark, a large amount of data needs to be exchanged across nodes after the map process is complete. Statistics show that the Spark shuffle process spends the most time and resources in many scenarios, and even accounts for 50% to 80% of the end-to-end time overhead of Spark services. Spark provides the Shuffle Manager plugin to optimize the shuffle process. You can use the interfaces defined by Shuffle Manager to customize Shuffle Manager and gear shuffle policies, methods, and processes.

As a performance acceleration component of Spark, OmniShuffle uses the plugin mechanism provided by Spark to implement the Shuffle Manager and Broadcast Manager plugin interfaces and replace the native Shuffle and Broadcast of Spark in a non-intrusive manner.

OmniShuffle enables in-memory shuffle by implementing the Shuffle Manager plugin interface. That is, the shuffle process is completed in the memory pool based on memory semantics, reducing shuffle data flushing to drives. The time overhead and computing power overhead caused by data flushing and reading, serialization and deserialization, compression and decompression can be lessened. The Broadcast Manager interface is implemented to enable variable broadcast based on memory pool sharing, improving the transmission efficiency of broadcast variables among executors. In addition, OmniShuffle supports two network modes: Remote Direct Memory Access (RDMA) and TCP. Compared with TCP, RDMA improves transmission efficiency, requires less computing power, and implements efficient data exchange between nodes.

OmniShuffle automatically adjusts the parallelism degree of Spark SQL jobs in real time based on historical data, eliminating the need to manually optimize the parallelism degree and reducing spills in the shuffle-reduce process by 90%. Due to this, OmniShuffle quickens big data cluster jobs while increasing the job throughput.

Overall Solution

OmniShuffle enables in-memory shuffle by implementing the Shuffle Manager plugin interface. Figure 1 shows the overall solution architecture of OmniShuffle. Table 1 describes the subsystems.

Figure 1 Logical architecture of OmniShuffle

**Table 1** Subsystems
Subsystem	Description
Memory pool kit	Provides the distributed shared memory infrastructure and basic memory semantics.
Metadata	Stores basic information about shuffle data and nodes.
Shuffle semantics	Provides APIs for implementing Spark shuffle semantics.
Broadcast semantics	Provides APIs for implementing Spark broadcast semantics.

Service Process

OmniShuffle implements the Spark Shuffle Manager interface to achieve in-memory shuffle. Figure 2 shows the OmniShuffle service process.

Figure 2 Service process

After connecting OmniShuffle to Spark, you can access the Spark CLI to view the cluster status. Administrators and O&M personnel can view the cluster status on the OmniShuffle CLI.

Parent topic: Feature Overview