Core Dumps Occasionally Occur When Spark Is Executed to Query the Parquet Data Source Based on libhdfs.so of Hadoop 3.2.0
Symptom
When Spark queries a Parquet data source with OmniOperator enabled, and OmniOperator depends on libhdfs.so from Hadoop 3.2.0, a core dump occasionally occurs. The error stack is as follows:
    Stack: [0x00007fb9e8e5d000,0x00007fb9e8f5e000], sp=0x00007fb9e8f5cd40, free space=1023k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [libhdfs.so+0xcd39] hdfsThreadDestructor+0xb9
    ------------------ PROCESS ------------------
    VM state: not at safepoint (normal execution)
    VM Mutex/Monitor currently owned by a thread: ([mutex/lock event])
    [0x00007fbbc00119b0] CodeCache_lock - owner thread: 0x00007fbbc00d9800
    [0x00007fbbc0012ab0] AdapterHandlerLibrary_lock - owner thread: 0x00007fba04451800
    heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
    Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
    Compressed class space size: 1073741824 Address: 0x0000000100000000
Key Process and Cause Analysis
This is a known bug in Hadoop 3.2.0 (https://issues.apache.org/jira/browse/HDFS-15270): if JNIEnv is used to obtain JVM information after the JVM has exited, a core dump occurs. The community fix is incomplete, however, because it does not cover the case where JNIEnv has become a wild (dangling) pointer, so core dumps still occur occasionally.
Conclusion and Solution
The fix is to check in advance whether a JVM still exists in the process. If one does, use the obtained JVM pointer to detach the current thread, which prevents the core dump caused by dereferencing a wild JNIEnv pointer. To avoid this problem, either use the libhdfs.so file provided in this version, or recompile libhdfs.so with the patch provided by the community (https://github.com/apache/hadoop/pull/5955).
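The guard described above can be sketched as follows. This is a minimal, self-contained model of the pattern, not the actual Hadoop source: FakeVM, g_vm, and safe_detach are illustrative stand-ins, where real libhdfs code would call JNI_GetCreatedJavaVMs to find a live JVM and DetachCurrentThread on the returned JavaVM pointer instead of touching the cached JNIEnv.

```c
#include <pthread.h>
#include <stdio.h>

/* Minimal model of the fix: the thread-local destructor (analogue of
 * hdfsThreadDestructor) must not use a cached JNIEnv blindly, because the
 * JVM may already be gone when the destructor runs at thread exit.
 * All names here are illustrative, not the actual Hadoop code. */

typedef struct { int alive; } FakeVM;   /* stands in for the JVM */
static FakeVM *g_vm = NULL;             /* global handle; NULL/dead after "JVM exit" */

/* Analogue of "check whether a JVM exists, then detach via the JVM pointer"
 * (real libhdfs would use JNI_GetCreatedJavaVMs + DetachCurrentThread).
 * Returns 1 if the detach was performed, 0 if it was safely skipped. */
static int safe_detach(void) {
    if (g_vm != NULL && g_vm->alive) {
        /* A JVM is still live: detaching through the JVM pointer is safe. */
        return 1;
    }
    /* No live JVM: skip the detach instead of dereferencing a wild pointer. */
    return 0;
}

/* Thread-specific-data destructor: ignores the possibly stale cached value. */
static void thread_destructor(void *cached_env) {
    (void)cached_env;                   /* never trust this after JVM exit */
    if (safe_detach())
        printf("detached thread from live VM\n");
    else
        printf("VM already gone; detach skipped\n");
}
```

The destructor would be registered with `pthread_key_create(&key, thread_destructor)`, so it runs automatically when each native thread exits; the key point is that it consults the live-VM check rather than the cached per-thread pointer.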