(Offline for 24.0.RC1) Task Occasionally Suspended When a 10 TB Dataset Is Run on Spark 3.1.1 with OmniOperator Enabled

Symptom

When OmniOperator is enabled and SQL services are run against a 10 TB dataset on Spark 3.1.1 with BiSheng JDK 1.8.0.342, a Spark task is occasionally suspended (hangs) and the job cannot finish. The cause is a defect in the JDK loading mechanism that appears in high-concurrency scenarios.

Key Process and Cause Analysis

In high-concurrency scenarios, a thread deadlock may occur when BiSheng JDK 1.8.0.342 loads classes through the JNI interface. This is a JDK defect. For details, see issue JDK-8266310.
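To check whether an executor is stuck in this way, one option is to capture a thread dump of the executor JVM with jstack and look for blocked threads. The following Python sketch lists the threads that a dump reports as BLOCKED, together with their top stack frame. It assumes the deadlocked threads appear in the BLOCKED state, which this document does not confirm. The function name blocked_threads and the sample dump (thread names, com.example classes) are hypothetical and were not captured from a real executor.

```python
import re

# A thread entry in a jstack dump starts with the quoted thread name.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# The line after the header reports the Java thread state.
STATE = re.compile(r'^\s*java\.lang\.Thread\.State:\s*(?P<state>\w+)')
# Stack frames are indented lines that start with "at".
FRAME = re.compile(r'^\s*at\s+(?P<frame>.+?)\s*$')


def blocked_threads(dump):
    """Map each BLOCKED thread name to its top stack frame (None if no frame)."""
    result = {}
    name = None
    state = None
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            name, state = header.group("name"), None
            continue
        if name is None:
            continue  # Still in the preamble before the first thread entry.
        state_line = STATE.match(line)
        if state_line:
            state = state_line.group("state")
            if state == "BLOCKED":
                result[name] = None
            continue
        frame = FRAME.match(line)
        if frame and state == "BLOCKED" and result[name] is None:
            result[name] = frame.group("frame")  # Keep only the first frame.
    return result


if __name__ == "__main__":
    # Hypothetical dump fragment, for illustration only.
    sample = (
        '"worker-1" #21 prio=5 os_prio=0 tid=0x01 nid=0x11 waiting for monitor entry [0x0a]\n'
        '   java.lang.Thread.State: BLOCKED (on object monitor)\n'
        '    at com.example.NativeLoader.load(NativeLoader.java:10)\n'
        '    at com.example.Task.run(Task.java:42)\n'
        '\n'
        '"worker-2" #22 prio=5 os_prio=0 tid=0x02 nid=0x12 runnable [0x0b]\n'
        '   java.lang.Thread.State: RUNNABLE\n'
        '    at com.example.Task.run(Task.java:50)\n'
    )
    for thread, top in blocked_threads(sample).items():
        print(thread, "->", top)
```

If the same threads stay blocked at the same frames across several dumps taken a few seconds apart, that is consistent with a deadlock rather than a slow task.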

Conclusion and Solution

On the Spark web UI, find the executor that is running the suspended task (as shown in the following figure). Then log in to the node that hosts that executor and run the kill command to stop the executor process. Spark reruns the affected tasks on other executors, so this operation does not affect the consistency of task results.
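The step of locating the executor process on the node can be sketched as follows. This Python example parses the output of jps -lm and returns the PID of the executor with a given executor ID. It assumes that the executor JVM's main class name ends with CoarseGrainedExecutorBackend and that its command line contains --executor-id followed by the ID shown on the web UI. The function name find_executor_pid, the host names, and the PIDs in the sample are hypothetical.

```python
def find_executor_pid(jps_output, executor_id):
    """Return the PID of the executor with the given ID, or None if not found."""
    for line in jps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # Not a "<pid> <main class> <args...>" line.
        pid, main_class, args = fields[0], fields[1], fields[2:]
        if not main_class.endswith("CoarseGrainedExecutorBackend"):
            continue  # Not a Spark executor JVM (assumed class name suffix).
        # Look for the pair "--executor-id <id>" among the arguments.
        for flag, value in zip(args, args[1:]):
            if flag == "--executor-id" and value == str(executor_id):
                return int(pid)
    return None


if __name__ == "__main__":
    # Hypothetical `jps -lm` output, for illustration only.
    sample = (
        "101 org.apache.spark.executor.YarnCoarseGrainedExecutorBackend "
        "--driver-url spark://CoarseGrainedScheduler@host-a:40000 "
        "--executor-id 7 --hostname host-b --cores 4\n"
        "103 sun.tools.jps.Jps -lm\n"
    )
    pid = find_executor_pid(sample, 7)
    if pid is not None:
        # Print the command instead of running it, so that the operator
        # can confirm the PID before stopping the process.
        print(f"kill {pid}")
```

The sketch only prints the kill command. Verify that the PID belongs to the executor shown on the web UI before running it.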