Core Dumps Occasionally Occur When Spark Is Executed to Query the Parquet Data Source Based on libhdfs.so of Hadoop 3.2.0
Symptom
When Spark queries a Parquet data source with OmniOperator enabled, and OmniOperator depends on libhdfs.so from Hadoop 3.2.0, a core dump occasionally occurs. The error stack is as follows:
    Stack: [0x00007fb9e8e5d000,0x00007fb9e8f5e000], sp=0x00007fb9e8f5cd40, free space=1023k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [libhdfs.so+0xcd39] hdfsThreadDestructor+0xb9
    ------------------ PROCESS ------------------
    VM state: not at safepoint (normal execution)
    VM Mutex/Monitor currently owned by a thread: ([mutex/lock event])
    [0x00007fbbc00119b0] CodeCache_lock - owner thread: 0x00007fbbc00d9800
    [0x00007fbbc0012ab0] AdapterHandlerLibrary_lock - owner thread: 0x00007fba04451800
    heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
    Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
    Compressed class space size: 1073741824 Address: 0x0000000100000000
Key Process and Cause Analysis
This is a known bug in Hadoop 3.2.0 (https://issues.apache.org/jira/browse/HDFS-15270): if JNIEnv is used to obtain JVM information after the JVM has exited, a core dump occurs. The community fix is incomplete, however, because it does not cover the case where JNIEnv has become a wild (dangling) pointer, so core dumps still occur occasionally.
Conclusion and Solution
The fix is to check in advance whether a JVM still exists in the process. If one does, use the obtained JVM pointer to detach the current thread, which prevents the core dump caused by dereferencing a wild JNIEnv pointer. To avoid this problem, either use the libhdfs.so file provided in this version, or recompile libhdfs.so with the patch provided by the community (https://github.com/apache/hadoop/pull/5955).
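The guard described above can be sketched as follows. This is a minimal, self-contained model of the pattern, not the actual Hadoop source: FakeVM, g_vm, and safe_detach are illustrative stand-ins, where real libhdfs code would call JNI_GetCreatedJavaVMs to find a live JVM and DetachCurrentThread on the returned JavaVM pointer instead of touching the cached JNIEnv.

```c
#include <pthread.h>
#include <stdio.h>

/* Minimal model of the fix: the thread-local destructor (analogue of
 * hdfsThreadDestructor) must not use a cached JNIEnv blindly, because the
 * JVM may already be gone when the destructor runs at thread exit.
 * All names here are illustrative, not the actual Hadoop code. */

typedef struct { int alive; } FakeVM;   /* stands in for the JVM */
static FakeVM *g_vm = NULL;             /* global handle; NULL/dead after "JVM exit" */

/* Analogue of "check whether a JVM exists, then detach via the JVM pointer"
 * (real libhdfs would use JNI_GetCreatedJavaVMs + DetachCurrentThread).
 * Returns 1 if the detach was performed, 0 if it was safely skipped. */
static int safe_detach(void) {
    if (g_vm != NULL && g_vm->alive) {
        /* A JVM is still live: detaching through the JVM pointer is safe. */
        return 1;
    }
    /* No live JVM: skip the detach instead of dereferencing a wild pointer. */
    return 0;
}

/* Thread-specific-data destructor: ignores the possibly stale cached value. */
static void thread_destructor(void *cached_env) {
    (void)cached_env;                   /* never trust this after JVM exit */
    if (safe_detach())
        printf("detached thread from live VM\n");
    else
        printf("VM already gone; detach skipped\n");
}
```

The destructor would be registered with `pthread_key_create(&key, thread_destructor)`, so it runs automatically when each native thread exits; the key point is that it consults the live-VM check rather than the cached per-thread pointer.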