JVM coredump analysis series (3): analyzing a process core after upgrading JNA from 5.10.0 to 5.12.1
Published 2023/05/25
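Preface
After I recently upgraded JNA from 5.10.0 to 5.12.1, the process started to core dump. The stack in the hs_err log consisted entirely of JDK-internal functions, which made it hard to see any connection to the JNA upgrade.
Here is the stack trace: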
Stack: [0x00007f5602132000,0x00007f5602233000], sp=0x00007f560222f230, free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x5f35ec] Dictionary::find(int, unsigned int, Symbol*, ClassLoaderData*, Handle, Thread*)+0x9c
V [libjvm.so+0xbcd8c2] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x212
V [libjvm.so+0xbd0aa2] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x62
V [libjvm.so+0x57c8f6] ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x266
V [libjvm.so+0x57f894] ConstantPool::resolve_constant_at_impl(constantPoolHandle, int, int, Thread*)+0x7b4
V [libjvm.so+0x5802ba] ConstantPool::resolve_bootstrap_specifier_at_impl(constantPoolHandle, int, Thread*)+0xea
V [libjvm.so+0x932607] LinkResolver::resolve_invokedynamic(CallInfo&, constantPoolHandle, int, Thread*)+0x5f7
V [libjvm.so+0x93b11d] LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle, int, Bytecodes::Code, Thread*)+0x26d
V [libjvm.so+0x7672ae] InterpreterRuntime::resolve_invokedynamic(JavaThread*)+0x28e
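The problem was eventually solved by reading through the source code changes between the two JNA versions. This article walks through the relevant source.
Problem analysis
The problematic code is as follows: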
Memory dbIdPoint = new Memory(Native.getNativeSize(Integer.class));
...
freeMem(dbIdPoint);
// Definition of freeMem
public static void freeMem(Pointer pointer) {
    Native.free(Pointer.nativeValue(pointer));
    Pointer.nativeValue(pointer, 0);
}
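In 5.10.0, Memory frees its memory in finalize(). When a Memory object is garbage-collected, dispose() first checks whether the native pointer peer is 0 (null). If it is not, dispose() frees the memory and sets peer to 0. In other words, the native memory behind a Memory object is released when the object is GC'd.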
protected void finalize() {
    this.dispose();
}
protected synchronized void dispose() {
    if (this.peer != 0L) {
        try {
            free(this.peer);
        } finally {
            this.peer = 0L;
            this.reference.unlink();
        }
    }
}
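The following JDK-only sketch shows why this design never double-frees when user code frees the memory itself. `FakeHeap` and `LegacyMemory` are hypothetical stand-ins, not JNA classes, and a thrown exception replaces glibc's abort. The key point is that `dispose()` reads the same `peer` field that `freeMem` zeroes.

```java
import java.util.HashSet;
import java.util.Set;

public class LegacyDisposeDemo {
    // Hypothetical stand-in for the native heap: tracks live addresses
    // and throws on a double free instead of aborting the process.
    static class FakeHeap {
        private final Set<Long> live = new HashSet<>();
        private long next = 0x1000;

        long malloc() {
            long addr = next;
            next += 0x100;
            live.add(addr);
            return addr;
        }

        void free(long addr) {
            if (!live.remove(addr)) {
                throw new IllegalStateException("double free of 0x" + Long.toHexString(addr));
            }
        }

        boolean isLive(long addr) {
            return live.contains(addr);
        }
    }

    // Hypothetical model of the 5.10.0 Memory: one peer field shared by
    // the explicit-free path and the finalizer path.
    static class LegacyMemory {
        final FakeHeap heap;
        long peer;

        LegacyMemory(FakeHeap heap) {
            this.heap = heap;
            this.peer = heap.malloc();
        }

        // Mirrors the dispose() logic that finalize() calls.
        synchronized void dispose() {
            if (peer != 0L) {
                try {
                    heap.free(peer);
                } finally {
                    peer = 0L;
                }
            }
        }
    }

    // Same shape as the application's freeMem: free, then zero peer.
    static void freeMem(LegacyMemory m) {
        m.heap.free(m.peer);
        m.peer = 0L;
    }

    public static void main(String[] args) {
        FakeHeap heap = new FakeHeap();
        LegacyMemory m = new LegacyMemory(heap);
        long addr = m.peer;
        freeMem(m);   // explicit free zeroes the shared field
        m.dispose();  // what the finalizer does later: sees peer == 0 and skips free
        System.out.println("live after dispose: " + heap.isLive(addr));
    }
}
```

Because there is only one copy of the address, the explicit free and the GC-time free cannot disagree about it.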
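In 5.12.0, JNA removed the finalizers from Memory, CallbackReference and NativeLibrary to improve concurrency performance. It now uses a Cleaner to free memory instead. See the changelog (https://github.com/java-native-access/jna/blob/master/CHANGES.md) for details.
(1) Cleaner is a singleton that maintains a background thread, cleanerThread, and a queue, referenceQueue. The thread takes references off the queue and calls their clean() method.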
public class Cleaner {
    private final ReferenceQueue<Object> referenceQueue;
    private final Thread cleanerThread;
    private Cleaner() {
        referenceQueue = new ReferenceQueue<Object>();
        cleanerThread = new Thread() {
            @Override
            public void run() {
                while(true) {
                    try {
                        Reference<? extends Object> ref = referenceQueue.remove();
                        if(ref instanceof CleanerRef) {
                            ((CleanerRef) ref).clean();
                        }
                    } catch (InterruptedException ex) {
                        Logger.getLogger(Cleaner.class.getName()).log(Level.SEVERE, null, ex);
                        break;
                    } catch (Exception ex) {
                        Logger.getLogger(Cleaner.class.getName()).log(Level.SEVERE, null, ex);
                    }
                }
            }
        };
        cleanerThread.setName("JNA Cleaner");
        cleanerThread.setDaemon(true);
        cleanerThread.start();
    }
}
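The cleaner design above can be reduced to a runnable sketch that uses only the JDK. `MiniCleanerDemo` and its `CleanerRef` are hypothetical, simplified names. The real Cleaner also keeps each CleanerRef reachable via `add(...)`, which is omitted here. A real GC cannot be triggered on demand, so the sketch calls `Reference.enqueue()` to simulate the moment the GC enqueues the reference.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MiniCleanerDemo {
    // Hypothetical reduced CleanerRef: a phantom reference carrying its own cleanup task.
    static class CleanerRef extends PhantomReference<Object> {
        private final Runnable task;

        CleanerRef(Object referent, ReferenceQueue<Object> q, Runnable task) {
            super(referent, q);
            this.task = task;
        }

        void clean() {
            task.run();
        }
    }

    final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    MiniCleanerDemo() {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    // Blocks until the GC (or enqueue()) hands over a reference.
                    Reference<?> ref = queue.remove();
                    if (ref instanceof CleanerRef) {
                        ((CleanerRef) ref).clean();
                    }
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "Mini Cleaner");
        t.setDaemon(true);  // a daemon thread does not keep the JVM alive
        t.start();
    }

    CleanerRef register(Object obj, Runnable task) {
        return new CleanerRef(obj, queue, task);
    }

    public static void main(String[] args) throws InterruptedException {
        MiniCleanerDemo cleaner = new MiniCleanerDemo();
        CountDownLatch done = new CountDownLatch(1);
        Object resource = new Object();
        CleanerRef ref = cleaner.register(resource, done::countDown);
        ref.enqueue();  // simulate what the GC does once `resource` is unreachable
        boolean ran = done.await(5, TimeUnit.SECONDS);
        System.out.println("cleanup ran: " + ran);
    }
}
```

The cleanup runs on the cleaner thread, not on the thread that allocated the memory. That is why the timing of the second free in the production crash is hard to predict.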
(2) When a Memory object is created, it calls Cleaner's register method. It passes in itself together with a MemoryDisposer, the Runnable that frees peer.
public Memory(long size) {
    // ...
    cleanable = Cleaner.getCleaner().register(this, new MemoryDisposer(peer));
}
private static final class MemoryDisposer implements Runnable {
    private long peer;
    public MemoryDisposer(long peer) {
        this.peer = peer;
    }
    @Override
    public synchronized void run() {
        try {
            free(peer);
        } finally {
            allocatedMemory.remove(peer);
            peer = 0;
        }
    }
}
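Java passes `long` by value, so `new MemoryDisposer(peer)` stores an independent copy of the address. The minimal sketch below makes this visible. `Ptr` and `Disposer` are hypothetical stand-ins for Pointer and MemoryDisposer, not JNA code.

```java
public class DisposerCopyDemo {
    // Hypothetical stand-in for Pointer: owns a peer field.
    static class Ptr {
        long peer;
        Ptr(long peer) { this.peer = peer; }
    }

    // Hypothetical stand-in for MemoryDisposer: copies peer at construction time.
    static class Disposer implements Runnable {
        long peer;
        Disposer(long peer) { this.peer = peer; }

        @Override
        public synchronized void run() {
            System.out.println("would free 0x" + Long.toHexString(peer));
            peer = 0;
        }
    }

    public static void main(String[] args) {
        Ptr p = new Ptr(0xCAFEL);
        Disposer d = new Disposer(p.peer);
        p.peer = 0;  // what Pointer.nativeValue(pointer, 0) does to the Pointer only
        System.out.println("pointer peer = " + p.peer
                + ", disposer peer = 0x" + Long.toHexString(d.peer));
    }
}
```

Zeroing the Pointer's field leaves the disposer's copy untouched, so the disposer still holds the stale address.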
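(3) register creates a CleanerRef, which is a PhantomReference registered with referenceQueue. The key property of a PhantomReference is how it behaves once its referent obj becomes unreachable. The GC does not silently discard the reference. Instead it places the reference object itself on referenceQueue, and the cleaner thread picks it up from there.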
public synchronized Cleanable register(Object obj, Runnable cleanupTask) {
    // The important side effect is the PhantomReference, that is yielded
    // after the referent is GCed
    return add(new CleanerRef(this, obj, referenceQueue, cleanupTask));
}
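A short JDK-only experiment shows two PhantomReference properties that matter here. First, `get()` always returns null, so a cleanup task can never reach the Memory object and must carry its own copy of peer. Second, the reference is enqueued only after the referent is unreachable, and only if the reference object itself is still strongly reachable, which is what `add(...)` ensures. `System.gc()` is only a hint, so whether the loop below sees the reference depends on the JVM.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        // `ref` stays in a local variable: if the reference object itself became
        // unreachable, it could be collected and would never be enqueued.
        PhantomReference<Object> ref = new PhantomReference<>(obj, queue);

        System.out.println("get() = " + ref.get());  // always null for phantom refs

        obj = null;  // drop the only strong reference to the referent
        Reference<?> polled = null;
        for (int i = 0; i < 10 && polled == null; i++) {
            System.gc();                 // only a hint; collection is not guaranteed
            polled = queue.remove(100);  // wait up to 100 ms for the GC to enqueue it
        }
        System.out.println(polled == ref ? "enqueued by GC" : "not enqueued yet");
    }
}
```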
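Solution approach
Diagramming the complete allocation and release lifecycle of a Memory object:
As the diagram shows, in 5.12.0 the native memory peer can be freed in two places:
(1) Explicitly, in application code, via the freeMem method.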
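(2) When the Memory object is GC'd, the cleaner calls MemoryDisposer#run() to free the memory. MemoryDisposer's peer field is set when the Memory is created. The old freeMem never touches that field; it only zeroes the Pointer's peer. As a result, the same memory is freed a second time after GC, and the process core-dumps.
5.10.0 has no MemoryDisposer. Its finalizer checks the Memory's own peer field, which freeMem has already set to 0, so the memory is never freed twice.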
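The two release paths in 5.12.0 can be modeled the same way as the 5.10.0 sketch. The model has one heap and two separate copies of the address. `FakeHeap` and `NewMemory` are hypothetical names, and a thrown exception stands in for glibc's abort.

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleFreeModel {
    // Hypothetical native heap that reports a double free instead of aborting.
    static class FakeHeap {
        private final Set<Long> live = new HashSet<>();
        private long next = 0x1000;

        long malloc() {
            long addr = next;
            next += 0x100;
            live.add(addr);
            return addr;
        }

        void free(long addr) {
            if (!live.remove(addr)) {
                throw new IllegalStateException("double free of 0x" + Long.toHexString(addr));
            }
        }
    }

    // Hypothetical model of the 5.12.x Memory: the Pointer field and the
    // disposer's captured value are two separate copies of the same address.
    static class NewMemory {
        long peer;                // plays the role of Pointer.peer
        final Runnable disposer;  // plays the role of MemoryDisposer

        NewMemory(FakeHeap heap) {
            long addr = heap.malloc();
            this.peer = addr;
            this.disposer = () -> heap.free(addr);  // captures its own copy of addr
        }
    }

    public static void main(String[] args) {
        FakeHeap heap = new FakeHeap();
        NewMemory m = new NewMemory(heap);

        heap.free(m.peer);  // old-style freeMem: free ...
        m.peer = 0;         // ... and zero only the Pointer's peer

        try {
            m.disposer.run();  // what the cleaner thread does after GC
            System.out.println("no error");
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```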
In 5.12.0, the problem can be fixed along these lines:
// Wrong: only zeroes the peer inside the Pointer
public static void freeMem(Pointer pointer) {
    Native.free(Pointer.nativeValue(pointer));
    Pointer.nativeValue(pointer, 0);
}
// Right: goes through MemoryDisposer#run, zeroing both the Pointer's peer and MemoryDisposer's peer
public static void freeMem(Memory memory) {
    memory.close();
}
The new version calls Memory#close. That call ends up in com.sun.jna.Memory$MemoryDisposer#run, which zeroes both the Pointer's peer and the MemoryDisposer's peer and so ensures the memory is not freed twice.
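The fix can be modeled with the JDK's own `java.lang.ref.Cleaner` (Java 9+) standing in for JNA's Cleaner; treating the two as equivalent is an assumption of this sketch. The JDK documents that `Cleanable.clean()` unregisters the action and runs it at most once. An explicit `close()` therefore frees the memory immediately and also prevents the GC-time run. `SafeMemory` and `FakeHeap` are hypothetical names.

```java
import java.lang.ref.Cleaner;
import java.util.HashSet;
import java.util.Set;

public class CloseFixModel {
    // Hypothetical native heap; synchronized because the Cleaner thread may call free().
    static class FakeHeap {
        private final Set<Long> live = new HashSet<>();
        private long next = 0x1000;

        synchronized long malloc() {
            long addr = next;
            next += 0x100;
            live.add(addr);
            return addr;
        }

        synchronized void free(long addr) {
            if (!live.remove(addr)) {
                throw new IllegalStateException("double free of 0x" + Long.toHexString(addr));
            }
        }

        synchronized boolean isLive(long addr) {
            return live.contains(addr);
        }
    }

    // JDK Cleaner used as a stand-in for JNA's Cleaner (assumption of this sketch).
    static final Cleaner CLEANER = Cleaner.create();

    static class SafeMemory implements AutoCloseable {
        long peer;
        private final Cleaner.Cleanable cleanable;

        SafeMemory(FakeHeap heap) {
            long addr = heap.malloc();
            this.peer = addr;
            // The action captures only heap and addr, never `this`,
            // otherwise the object could never become unreachable.
            this.cleanable = CLEANER.register(this, () -> heap.free(addr));
        }

        @Override
        public synchronized void close() {
            if (peer != 0) {
                try {
                    cleanable.clean();  // frees now and unregisters the GC-time action
                } finally {
                    peer = 0;
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeHeap heap = new FakeHeap();
        SafeMemory m = new SafeMemory(heap);
        m.close();
        m.close();  // second close is a no-op because peer == 0
        System.out.println("closed twice without a double free");
    }
}
```

Routing the explicit free through the cleanable, rather than calling free directly, leaves a single owner of the release. That owner is the registered action, which runs at most once.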
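Verification
With the change above, our process started successfully, and the JNA upgrade was finally complete.
To verify the theory more thoroughly, we also wrote the following demo, which should give a more intuitive picture.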
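import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class JnaDoubleFreeCore {
    // Old-style free: releases the native block but only zeroes the Pointer's peer,
    // so the MemoryDisposer registered with the Cleaner still holds the old address
    public static void freeMem(Memory pointer) {
        Native.free(Pointer.nativeValue(pointer));
        Pointer.nativeValue(pointer, 0);
    }
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            Memory memory = new Memory(1024);
            freeMem(memory);
            // Drop the last reference to the freed Memory so the GC can collect it
            memory = new Memory(1024);
            // Once collected, the old Memory's disposer frees the same address again
            System.gc();
            Thread.sleep(1000);
            memory.setInt(0, 1000);
            System.out.println("----");
        }
    }
}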
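Running the code above on Linux produces the following error, showing that the memory was freed twice and the process core-dumped. After freeMem is fixed, the problem goes away.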
----
free(): double free detected in tcache 2
Aborted (core dumped)
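Some readers may ask why the demo crashes with a double free, while the stack at the beginning of this article is a JVM stack. The reason is that the production program is much more complex and highly concurrent. After the memory is freed, the JVM quickly reallocates it for other purposes, so the second free is not detected as a double free. Instead, it releases memory the JVM is still using, and the resulting invalid memory reads and writes inside the JVM eventually cause the core dump.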
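This difference can be modeled with a hypothetical allocator that hands the most recently freed block to the next request, as many real allocators do. In this sketch the stale free no longer looks like a double free. It silently releases a block that a new owner, standing in for the JVM, is still using. `ReusingHeap` is a hypothetical name, not glibc's actual algorithm.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class AddressReuseModel {
    // Hypothetical allocator with a LIFO free list: the last freed block is reused first.
    static class ReusingHeap {
        private final Deque<Long> freeList = new ArrayDeque<>();
        private final Set<Long> live = new HashSet<>();
        private long next = 0x1000;

        long malloc() {
            Long reused = freeList.pollFirst();
            long addr = (reused != null) ? reused : (next += 0x100);
            live.add(addr);
            return addr;
        }

        void free(long addr) {
            if (!live.remove(addr)) {
                throw new IllegalStateException("double free of 0x" + Long.toHexString(addr));
            }
            freeList.addFirst(addr);
        }

        boolean isLive(long addr) {
            return live.contains(addr);
        }
    }

    public static void main(String[] args) {
        ReusingHeap heap = new ReusingHeap();
        long jnaBlock = heap.malloc();
        heap.free(jnaBlock);            // explicit freeMem
        long jvmBlock = heap.malloc();  // another component grabs memory right away
        heap.free(jnaBlock);            // stale free from the cleaner: no error raised
        System.out.println("same address reused: " + (jvmBlock == jnaBlock));
        System.out.println("JVM block still live: " + heap.isLive(jvmBlock));
    }
}
```

The stale free succeeds because the address is live again, just under a different owner. Any later access through that owner is a use-after-free, which is consistent with a crash inside JVM code rather than a clean double-free report.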