Error Reported When Viewing the OmniShuffle Log
Symptom
The OmniShuffle log contains the error message "Failed to send sync package, Operation timed out". The handshake fails, but the Spark task still ends normally.
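To confirm how often the symptom appears, the log can be searched for the error text. The following is a minimal sketch. The function name find_sync_timeouts is hypothetical, and the log location is not assumed here because it depends on the deployment.

```python
# Error text quoted in the symptom description.
ERROR_TEXT = "Failed to send sync package, Operation timed out"


def find_sync_timeouts(lines):
    """Return (line_number, line) pairs for every line containing the error text.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_TEXT in line
    ]
```

A possible use is to open the OmniShuffle log file in text mode, pass the file object to find_sync_timeouts, and print the returned pairs.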
Key Process and Cause Analysis
This error generally occurs when the peer system has too little free memory or its memory is severely fragmented. Establishing the connection between the two ends then takes long enough to trigger the timeout.
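One way to check the cause on the peer system, assuming it runs Linux, is to read /proc/meminfo for available memory and /proc/buddyinfo for the number of free blocks per allocation order. Few free high-order blocks suggest fragmentation. The sketch below only parses the text of these two files. The function names are hypothetical, and this article does not define any thresholds for "insufficient" or "severely fragmented".

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo_text):
    """Return MemAvailable / MemTotal as a float between 0 and 1."""
    info = parse_meminfo(meminfo_text)
    return info["MemAvailable"] / info["MemTotal"]


def parse_buddyinfo(text):
    """Parse /proc/buddyinfo-style text.

    Returns {(node, zone): [free block count for order 0, order 1, ...]}.
    """
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <count> <count> ...
        if len(parts) > 4 and parts[0] == "Node" and parts[2] == "zone":
            zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones
```

For example, passing the contents of /proc/meminfo to available_fraction gives the share of memory that is still available, and the last entries of each list returned by parse_buddyinfo show how many large contiguous blocks remain.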
Conclusion and Solution
Reduce the memory configured for OmniShuffle and Spark. Spark has an internal retry mechanism, so when this error occurs the Spark task is retried and continues to run. The corresponding retry log is displayed on the Spark web UI.
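As an illustration of reducing the Spark side, the sketch below scales existing Spark memory settings by a factor and produces --conf arguments for spark-submit. The helper names and the factor are assumptions, since this article does not state how much to reduce. spark.executor.memory and spark.driver.memory are standard Spark properties. OmniShuffle memory parameters are not shown because their names are not given here.

```python
# Size suffixes handled by this sketch, expressed in MiB.
UNITS_MIB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def scale_memory(size, factor):
    """Scale a size string such as "8g" by `factor` and return it in MiB ("6144m")."""
    size = size.strip().lower()
    number, unit = size[:-1], size[-1:]
    if unit not in UNITS_MIB or not number.isdigit():
        raise ValueError(f"unsupported size string: {size!r}")
    mib = int(int(number) * UNITS_MIB[unit] * factor)
    return f"{max(mib, 1)}m"


def reduced_conf_args(current, factor):
    """Build spark-submit --conf arguments with every memory value scaled by `factor`.

    `current` maps Spark property names to their present size strings.
    """
    args = []
    for key, value in current.items():
        args += ["--conf", f"{key}={scale_memory(value, factor)}"]
    return args
```

For example, reduced_conf_args({"spark.executor.memory": "8g"}, 0.75) yields the arguments --conf spark.executor.memory=6144m, which can be appended to the spark-submit command.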