JVM Coredump Analysis Series (5): Analyzing a SIGSEGV Crash When Using netty-tcnative


DevKit

Published 2023/08/16


Problem Background

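While analyzing issues from business teams, the author repeatedly ran into SIGSEGV crashes when netty-tcnative was in use. This article summarizes how to locate the problem and provides a reproduction case, so that similar problems can be diagnosed faster. In this case, the business process used netty's OpenSSL-based TLS implementation and passed -Djdk.tls.ephemeralDHKeySize=3072 as a startup parameter. After the process started, the first requests to the service triggered a SIGSEGV crash with the following stack: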

Stack: [0x00007f822a2d6000,0x00007f822a317000],  sp=0x00007f822a312d38,  free space=243k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x172761]  __strlen_sse2_pminub+0x11
C  [libio_grpc_netty_shaded_netty_tcnative_linux_x86_641710973062976904606.so+0x186a78]
C  [libio_grpc_netty_shaded_netty_tcnative_linux_x86_641710973062976904606.so+0x28f50]
j  io.grpc.netty.shaded.io.netty.internal.tcnative.SSLContext.setTmpDHLength(JI)V+0
j  io.grpc.netty.shaded.io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJI[Ljava/security/cert/Certificate;Lio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZZ)V+532
j  io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslContext.<init>(Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJI[Ljava/security/cert/Certificate;Lio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZ)V+21
j  io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslServerContext.<init>([Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)V+21
j  io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslServerContext.<init>([Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/ApplicationProtocolConfig;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)V+33
j  io.grpc.netty.shaded.io.netty.handler.ssl.SslContext.newServerContextInternal(Lio/grpc/netty/shaded/io/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/ApplicationProtocolConfig;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)Lio/grpc/netty/shaded/io/netty/handler/ssl/SslContext;+152
j  io.grpc.netty.shaded.io.netty.handler.ssl.SslContextBuilder.build()Lio/grpc/netty/shaded/io/netty/handler/ssl/SslContext;+79
... java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7707ba]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe3a
V  [libjvm.so+0x76dd5b]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x28b
V  [libjvm.so+0x76e347]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
V  [libjvm.so+0x7b912b]  thread_entry(JavaThread*, Thread*)+0x7b
V  [libjvm.so+0xc1f583]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xc1f8d8]  JavaThread::run()+0x328
V  [libjvm.so+0xa09702]  java_start(Thread*)+0x112
C  [libpthread.so.0+0x7e15]  start_thread+0xc5

Problem Analysis

(1) The JDK's limits on the system property jdk.tls.ephemeralDHKeySize

The JDK source code shows that the value of jdk.tls.ephemeralDHKeySize must lie in the range [1024, 8192] and must be a multiple of 64.
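The rule above can be written as a small predicate. This is a minimal sketch that re-implements the range-and-multiple check for illustration only. The class and method names are hypothetical, and this is not the JDK's own code.

```java
public class JdkDhKeySizeCheck {
    // Illustrative re-implementation of the rule described above (not JDK code):
    // the size must be within [1024, 8192] and a multiple of 64.
    static boolean isValidJdkDhKeySize(int size) {
        return size >= 1024 && size <= 8192 && size % 64 == 0;
    }

    public static void main(String[] args) {
        int[] candidates = {512, 1024, 2048, 3072, 4096, 8192, 8256, 1500};
        for (int size : candidates) {
            System.out.println(size + " -> " + (isValidJdkDhKeySize(size) ? "accepted" : "rejected"));
        }
    }
}
```

By this rule, 3072 is a perfectly legal value from the JDK's point of view.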

(2) Analyzing the hs_err_pid log file

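The call stack in the hs_err_pid log shows that the SIGSEGV was triggered inside io.grpc.netty.shaded.io.netty.internal.tcnative.SSLContext.setTmpDHLength, which is a native method of netty-tcnative. Its second parameter, length, carries the value configured through the system property jdk.tls.ephemeralDHKeySize. The topmost frame, __strlen_sse2_pminub in libc, also shows that the fault occurred while libc was computing the length of a string.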
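In the frame descriptor `(JI)V`, `J` is a `long` (presumably the native SSL context handle), `I` is the `int` length, and `V` means the method returns void. The sketch below assumes that netty reads the raw property string itself and parses it as an integer. The helper `readDhKeyLength` is hypothetical and simplified. Under that assumption, the JDK's [1024, 8192] check never runs on this path, so 3072 reaches the native call unchanged.

```java
public class DhLengthPropertySketch {
    static final String PROPERTY = "jdk.tls.ephemeralDHKeySize";

    // Hypothetical, simplified helper: parse the raw system property as an int,
    // returning null if it is unset or not numeric. No range check is applied.
    static Integer readDhKeyLength() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "3072");
        Integer length = readDhKeyLength();
        if (length != null) {
            // In netty, a value like this ends up as the int argument of the
            // native SSLContext.setTmpDHLength(long ctx, int length) method.
            System.out.println("setTmpDHLength(ctx, " + length + ")");
        }
    }
}
```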


(3) netty-tcnative's limits on jdk.tls.ephemeralDHKeySize

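The corresponding netty-tcnative source on GitHub shows that this method restricts the key size further. Only 512, 1024, 2048 and 4096 are supported, and any other key size is supposed to cause an Exception to be thrown.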
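The following Java sketch shows the restriction. It assumes the native method behaves like a switch over the four supported lengths. `checkTmpDHLength` is a hypothetical stand-in, not netty-tcnative's C implementation. For 3072 it produces the exception that should have been raised instead of a crash.

```java
public class TcnativeDhLengthSketch {
    // Hypothetical stand-in for the native check: only four lengths are accepted.
    static void checkTmpDHLength(int length) throws Exception {
        switch (length) {
            case 512:
            case 1024:
            case 2048:
            case 4096:
                System.out.println(length + " is a supported DH length");
                return;
            default:
                // Intended behaviour for an unsupported value: a Java exception, not a crash.
                throw new Exception("Unsupported length " + length);
        }
    }

    public static void main(String[] args) {
        for (int length : new int[] {2048, 3072}) {
            try {
                checkTmpDHLength(length);
            } catch (Exception e) {
                System.out.println(e); // java.lang.Exception: Unsupported length 3072
            }
        }
    }
}
```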


(4) Root cause of the crash

With jdk.tls.ephemeralDHKeySize=3072 configured, the expected behavior is an exception, java.lang.Exception: Unsupported length 3072. So why did the process crash instead?

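A closer look at the source shows that, on line 639, the format string passed to tcn_Throw does not match its argument. The argument is an int, but the format specifier is %s. The formatting code therefore treats the integer as a string pointer and reads from it, which is consistent with the __strlen_sse2_pminub frame at the top of the crash stack. The netty community fixed this crash for invalid key sizes in netty-tcnative-boringssl-static 2.0.57.Final; the fix is in commit [1].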
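The mismatch can be illustrated from Java. This sketch makes two assumptions. First, for `%s` the C formatting routine reads the int argument as a pointer. Second, the upper 32 bits of that register happen to be zero. Under those assumptions, 3072 becomes the address 0xc00. That address lies in the first page of the address space, which Linux normally leaves unmapped, so `strlen` faults. The sketch also shows how Java behaves: its `Formatter` checks argument types at run time, so a mismatch such as `%d` with a String raises an exception rather than reading memory.

```java
import java.util.IllegalFormatConversionException;

public class FormatMismatchDemo {
    // Correct pairing: an int argument formatted with an integer specifier.
    static String unsupportedLengthMessage(int length) {
        return String.format("Unsupported length %d", length);
    }

    // The address strlen would dereference if the int were reinterpreted as a
    // char* (assumption: upper 32 bits of the register are zero).
    static String asAddress(int value) {
        return "0x" + Long.toHexString(Integer.toUnsignedLong(value));
    }

    public static void main(String[] args) {
        int length = 3072;
        System.out.println(unsupportedLengthMessage(length)); // Unsupported length 3072
        System.out.println(asAddress(length));                // 0xc00, below 4096, i.e. in the first page

        // Java's Formatter rejects a type mismatch with an exception instead of crashing.
        try {
            String.format("Unsupported length %d", "3072");
        } catch (IllegalFormatConversionException e) {
            System.out.println("Java rejects the mismatch: " + e.getMessage());
        }
    }
}
```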


Reproduction

(1) Maven dependencies

<dependencies>
  <dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.82.Final</version>
  </dependency>
  <dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-tcnative-boringssl-static</artifactId>
    <version>2.0.54.Final</version>
    <classifier>${os.detected.classifier}</classifier>
  </dependency>
</dependencies>
<build>
  <extensions>
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.4.0.Final</version>
    </extension>
  </extensions>
</build>

(2) Generate the private key and certificate

openssl genrsa -out rsa.key 2048
openssl req -new -key rsa.key -subj "/C=CN/ST=Beijing/L=Beijing" -out rsa.csr
openssl x509 -req -days 3650 -in rsa.csr -signkey rsa.key -out rsa.crt
openssl pkcs8 -topk8 -inform PEM -in rsa.key -outform pem -out rsa_enc_pkcs8.key -v1 PBE-SHA1-3DES -passin pass:12345678 -passout pass:12345678

(3) Reproduction case

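import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

import javax.net.ssl.SSLException;
import java.io.File;

public class SslContextBuilderTest {
    public static void main(String[] args) {
        // Set before any netty SSL class is loaded, in case the value is read once at class initialization.
        System.setProperty("jdk.tls.ephemeralDHKeySize", "3072");
        File keyCertChainFile = new File("rsa.crt");
        File keyFile = new File("rsa_enc_pkcs8.key");
        SslContextBuilder sslContextBuilder = SslContextBuilder
                .forServer(keyCertChainFile, keyFile, "12345678")
                // Force the netty-tcnative (OpenSSL/BoringSSL) provider so the native code path is used.
                .sslProvider(SslProvider.OPENSSL);
        try {
            // With netty-tcnative-boringssl-static 2.0.54.Final, this call crashes the JVM with SIGSEGV.
            SslContext sslContext = sslContextBuilder.build();
            System.out.println("SslContext created: " + sslContext);
        } catch (SSLException e) {
            throw new RuntimeException(e);
        }
    }
}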

Summary

1. The JDK limits ephemeralDHKeySize to the range [1024, 8192] in multiples of 64, so 3072 is valid there. netty-tcnative, however, does not support 3072. It accepts only 512, 1024, 2048 and 4096.

2. With netty-tcnative-boringssl-static versions earlier than 2.0.57.Final, setting an unsupported ephemeralDHKeySize crashes the process with SIGSEGV.
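Both points can be turned into a startup check. The sketch below is hypothetical. The class name and the messages are illustrative, and it assumes the caller already knows the netty-tcnative version string, for example from the build. Non-numeric property values are simply skipped.

```java
import java.util.Set;

public class TcnativeDhGuard {
    // Key sizes accepted by netty-tcnative, per the analysis above.
    static final Set<Integer> SUPPORTED = Set.of(512, 1024, 2048, 4096);
    // First netty-tcnative-boringssl-static release containing the fix, per the analysis above.
    static final int[] FIXED_IN = {2, 0, 57};

    // Compares the numeric prefix of e.g. "2.0.54.Final" with 2.0.57; true if older.
    static boolean isBeforeFix(String version) {
        String[] parts = version.split("\\.");
        for (int i = 0; i < FIXED_IN.length; i++) {
            int part = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (part != FIXED_IN[i]) {
                return part < FIXED_IN[i];
            }
        }
        return false; // exactly 2.0.57.x
    }

    // Returns a warning for an unsupported key size, or null if there is nothing to report.
    static String check(String tcnativeVersion, String dhKeySizeProperty) {
        if (dhKeySizeProperty == null) {
            return null;
        }
        int size;
        try {
            size = Integer.parseInt(dhKeySizeProperty.trim());
        } catch (NumberFormatException e) {
            return null; // non-numeric values are not checked by this sketch
        }
        if (SUPPORTED.contains(size)) {
            return null;
        }
        if (isBeforeFix(tcnativeVersion)) {
            return "ephemeralDHKeySize=" + size + " is unsupported by netty-tcnative "
                    + tcnativeVersion + " and may crash the JVM with SIGSEGV";
        }
        return "ephemeralDHKeySize=" + size + " is unsupported by netty-tcnative "
                + tcnativeVersion + "; expect an exception instead of a crash";
    }

    public static void main(String[] args) {
        String warning = check("2.0.54.Final", System.getProperty("jdk.tls.ephemeralDHKeySize"));
        if (warning != null) {
            System.err.println("WARNING: " + warning);
        }
    }
}
```

A check like this fails loudly at startup instead of letting the first TLS context creation take down the JVM.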

References

1. https://github.com/netty/netty-tcnative/pull/759/commits/eecaaa8e4222de1af05f9ccda0324b7c50955c97
