
Known Issues

Trouble Ticket No.

DTS2024060329127

Severity

Minor

Symptom

In a scenario where Spark executes an INSERT statement over a single data partition, 50 consecutive Sort Merge Join (SMJ) operations run in one task. The off-heap memory is exhausted, and when the OmniOperator SMJ operator calls new to allocate vector memory, a core dump occurs.

Root Cause

  1. OmniOperator uses column-based processing, which occupies more memory than the row-based processing of open-source Spark. In addition, the memory that the SMJ operator allocates during computation can be released only after the task completes.
  2. When an INSERT statement is executed and the data has only one partition, Spark generates a single task, so the Sort Merge Join across 50 tables runs entirely within that task. The 50 consecutive SMJ operators exhaust the configured 38 GB of off-heap memory during computation, and the next allocation via new triggers a core dump.

Impact

This problem occurs only under high load. Spark jobs are designed to leverage the concurrent processing capability of large clusters, so in normal scenarios a single task (single thread) does not execute joins across such a large number of tables. The problem has not occurred in actual service scenarios and has little impact on customers.

Workaround

  1. Increase the off-heap memory by raising the value of spark.memory.offHeap.size, and run the service again.
  2. Roll back to open-source Spark and rerun the services.
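As an illustration of the first workaround, the off-heap settings can be passed on the spark-submit command line. This is only a sketch: the 48g value, the entry class, and the JAR name are placeholders, and the size should be tuned to the workload.

```shell
# Enable off-heap memory and raise its limit above the 38 GB that was
# exhausted in this case (48g is an example value, not a recommendation).
# The entry class and JAR name are placeholders for the actual job.
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=48g \
  --class com.example.InsertJob \
  insert-job.jar
```

The same two properties can also be set in spark-defaults.conf so that all jobs on the cluster pick them up without per-job flags.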

Solution

  1. Add this troubleshooting case to the Feature Guide so that customers can prevent or locate the problem.
  2. Resolve this problem in the next commercial release of Kunpeng BoostKit 24.0.0.