Error Reported When Viewing the OmniShuffle Log

Symptom

The OmniShuffle log contains the error message "Failed to send sync package, Operation timed out", which indicates that a handshake between two nodes failed. The Spark task nevertheless completes normally.

Key Process and Cause Analysis

This error usually occurs when the peer node has too little free memory or when its memory is severely fragmented. In either case, establishing the connection between the two ends takes longer than the timeout allows, so the timeout error is triggered.
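To check whether a peer node matches this description, you can read two standard Linux interfaces: `/proc/meminfo` (the `MemAvailable` field) and `/proc/buddyinfo` (the number of free blocks at each allocation order). The Python sketch below is an illustrative diagnostic aid and is not part of OmniShuffle. The function names and the order-4 cut-off are assumptions chosen for illustration. If only a small share of free memory sits in higher-order blocks, the memory is likely fragmented.

```python
from pathlib import Path


def mem_available_kb(meminfo_text: str) -> int:
    """Return MemAvailable (in KiB) parsed from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def free_blocks_by_order(buddyinfo_text: str) -> list[int]:
    """Sum free block counts per order over all nodes and zones in /proc/buddyinfo.

    Each line looks like "Node 0, zone   Normal   c0 c1 c2 ...", where ci is
    the number of free blocks of 2**i contiguous pages.
    """
    totals: list[int] = []
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue
        counts = [int(x) for x in parts[4:]]
        if len(totals) < len(counts):
            totals.extend([0] * (len(counts) - len(totals)))
        for order, count in enumerate(counts):
            totals[order] += count
    return totals


def high_order_free_fraction(orders: list[int], min_order: int = 4) -> float:
    """Fraction of free pages that lie in blocks of order >= min_order.

    min_order=4 (blocks of 16 contiguous pages) is an arbitrary illustrative cut-off.
    """
    total = sum(count << order for order, count in enumerate(orders))
    if total == 0:
        return 0.0
    high = sum(count << order for order, count in enumerate(orders) if order >= min_order)
    return high / total


if __name__ == "__main__" and Path("/proc/buddyinfo").exists():
    avail = mem_available_kb(Path("/proc/meminfo").read_text())
    orders = free_blocks_by_order(Path("/proc/buddyinfo").read_text())
    print(f"MemAvailable: {avail // 1024} MiB")
    print(f"Free memory in blocks >= order 4: {high_order_free_fraction(orders):.1%}")
```

Run the script on the peer node named in the error. Low `MemAvailable` points to insufficient memory. A low high-order fraction alongside reasonable `MemAvailable` points to fragmentation.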

Conclusion and Solution

Reduce the memory allocated to OmniShuffle and Spark so that more free memory remains available on each node. If the error still occurs, Spark's built-in retry mechanism reruns the failed task and the job continues to run. The corresponding retry records are displayed in the Spark web UI.
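To show why a transient timeout does not fail the job, the following simplified Python model retries a task until it either succeeds or fails a fixed number of times in a row. The limit mirrors Spark's `spark.task.maxFailures` setting (default 4). The model is only an illustration of the retry semantics and not Spark's implementation. For example, it ignores rescheduling on other executors. The names `run_with_retries`, `TransientTimeout`, and `flaky_task` are hypothetical.

```python
class TransientTimeout(Exception):
    """Hypothetical stand-in for a transient error such as the sync-package timeout."""


def run_with_retries(task, max_failures: int = 4):
    """Run task(attempt) until it succeeds or fails max_failures times in a row.

    Simplified model of spark.task.maxFailures: the job is aborted only after
    one task fails that many consecutive times. Returns (result, attempt_log).
    """
    failures = 0
    log = []
    while True:
        attempt = failures  # attempt number, starting at 0
        try:
            result = task(attempt)
            log.append(f"attempt {attempt}: succeeded")
            return result, log
        except TransientTimeout as exc:
            failures += 1
            log.append(f"attempt {attempt}: failed ({exc})")
            if failures >= max_failures:
                raise RuntimeError(f"task failed {failures} times; aborting job") from exc


def flaky_task(attempt: int) -> str:
    """Hypothetical task whose first attempt times out during the handshake."""
    if attempt == 0:
        raise TransientTimeout("Failed to send sync package, Operation timed out")
    return "done"


result, log = run_with_retries(flaky_task)
for entry in log:
    print(entry)
```

The first attempt fails with the timeout and the second succeeds, so the job finishes. The failed attempt still leaves a record, which corresponds to the retry entries shown in the Spark web UI.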