(Offline for 24.0.RC1) Task Occasionally Suspended When a 10 TB Dataset Is Run on Spark 3.1.1 with OmniOperator Enabled
Symptom
With BiSheng JDK 1.8.0.342, when OmniOperator is enabled and SQL services are run against a 10 TB dataset on Spark 3.1.1, a Spark task occasionally hangs and the job never finishes. The hang is caused by a defect in the JDK class loading mechanism that shows up in high-concurrency scenarios.

Key Process and Cause Analysis
In high-concurrency scenarios, a thread deadlock may occur when BiSheng JDK 1.8.0.342 loads classes through the JNI interface. This is a JDK defect. For details, see issue JDK-8266310.
Conclusion and Solution
On the Spark web UI, find the executor to which the suspended task belongs (as shown in the following figure). Then log in to the node that hosts this executor and run the kill command from the command line to stop the executor process. This operation does not affect the consistency of task results, because Spark reruns the tasks of the lost executor.
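
The lookup-and-kill step can be sketched in Python as follows. This is a minimal sketch and not part of the product. It assumes that it runs on the node hosting the executor, that the executor JVM command line contains a class name ending in CoarseGrainedExecutorBackend together with the --executor-id and --app-id options, and that the executable is named java. The function names find_executor_pids and kill_executor are hypothetical helpers introduced for illustration.

```python
import os
import signal
import subprocess


def _has_option(fields, name, value):
    """Return True if `name value` appears as two consecutive arguments."""
    return any(a == name and b == value for a, b in zip(fields, fields[1:]))


def find_executor_pids(ps_output, executor_id, app_id=None):
    """Return PIDs of executor JVMs in `ps -eo pid,args` style output.

    Assumption: the executor command line contains a class name that
    includes 'CoarseGrainedExecutorBackend' and the '--executor-id' option.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything without a numeric PID.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Keep only JVM processes, not shell wrappers that launched them.
        if os.path.basename(fields[1]) != "java":
            continue
        if not any("CoarseGrainedExecutorBackend" in f for f in fields):
            continue
        if not _has_option(fields, "--executor-id", executor_id):
            continue
        if app_id is not None and not _has_option(fields, "--app-id", app_id):
            continue
        pids.append(int(fields[0]))
    return pids


def kill_executor(executor_id, app_id=None, sig=signal.SIGTERM):
    """Send `sig` to every matching executor process on this node."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = find_executor_pids(ps_output, executor_id, app_id)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

For example, kill_executor("7", app_id="<application ID>") would signal the executor with ID 7 of that application, where both values are read from the Executors page of the Spark web UI. If the deadlocked JVM does not exit on SIGTERM, passing sig=signal.SIGKILL may be needed.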
