Java Process Suspension

Fault Locating

The server stops responding to requests, but the Java process is still running. The CPU usage remains unchanged, and the garbage collector stops working. Figure 1 shows how to locate and rectify the fault.

Figure 1 Fault locating of the Java process suspension problem
  1. Run the top command to check whether the Java process exists and whether the CPU usage of the related processes is stable when the service stops responding.
  2. Run the jstat command to check the young GC count. If the count does not increase while service requests are arriving, the garbage collector has stopped working.
  3. Run the jstack command to print the stack information of the Java process and verify that most service threads are in the BLOCKED state and are frozen.
  4. View thread logs and analyze the cause.
  5. Modify the code. Run the code again to verify the modification.
  6. Incorporate the modification if the problem has been resolved.
  7. If the fault persists, add debugging information and locate the fault again.
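
Steps 2 and 3 can be partly automated. The following is a minimal sketch, not part of the original procedure: it assumes that jstat is run with the -gcutil option (whose output contains a YGC column) and that the jstack dump contains the usual java.lang.Thread.State lines. The function names young_gc_counts, young_gc_stalled, thread_state_counts, and collect are hypothetical helpers introduced for illustration.

```python
from __future__ import annotations

import re
import subprocess
from collections import Counter


def young_gc_counts(jstat_output: str) -> list[int]:
    """Return the YGC column of `jstat -gcutil` output, one value per sample."""
    rows = [line.split() for line in jstat_output.splitlines() if line.strip()]
    header, samples = rows[0], rows[1:]
    col = header.index("YGC")  # locate the column by name, not by position
    return [int(sample[col]) for sample in samples]


def young_gc_stalled(jstat_output: str) -> bool:
    """True if there are at least two samples and the young GC count never changed."""
    counts = young_gc_counts(jstat_output)
    return len(counts) >= 2 and len(set(counts)) == 1


def thread_state_counts(jstack_output: str) -> Counter:
    """Tally the `java.lang.Thread.State:` lines of a jstack thread dump."""
    states = re.findall(r"java\.lang\.Thread\.State: (\w+)", jstack_output)
    return Counter(states)


def collect(pid: int) -> tuple[str, str]:
    """Run jstat (5 samples, 1000 ms apart) and jstack against a live JVM."""
    jstat_out = subprocess.run(
        ["jstat", "-gcutil", str(pid), "1000", "5"],
        capture_output=True, text=True, check=True,
    ).stdout
    jstack_out = subprocess.run(
        ["jstack", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return jstat_out, jstack_out
```

With a real process ID, collect(pid) returns both outputs. A True result from young_gc_stalled while requests are arriving, together with a thread_state_counts result dominated by BLOCKED, matches the symptom described in steps 2 and 3. The five-sample window is an arbitrary choice; a lightly loaded service may need a longer one before an unchanged count is meaningful.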

Case: Suspension Caused by the GC Mechanism

Symptom

When customer services are tested on both x86 and Kunpeng servers, the Kunpeng node freezes intermittently and does not respond to heartbeat queries from the active node. As a result, the Kunpeng node is removed from the cluster.

Fault Locating

  1. Run the top command. The command output shows that the Java process exists and the CPU usage of the related processes is stable.

  2. Run the jstat command. The command output shows that the young GC count does not increase, which confirms that the garbage collector has stopped working.

  3. Run the jstack command to print the stack information of the Java process. The command output shows that most service threads are in the BLOCKED state.

  4. Run the pstack command to print the call stack information of the Presto process after the node is disconnected. The output shows that the related thread keeps running the SpinPause function, which is invoked by the GC mechanism. In normal cases, the GC completes and the thread leaves SpinPause. However, the Presto process on this node keeps running the SpinPause function, which confirms that the problem is caused by the JDK.

  5. Replace the JDK with BiSheng JDK. The node no longer freezes, and the process suspension problem is resolved.
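
The check in step 4 can be sketched as a small script that scans a saved native stack dump for threads whose innermost frame is in SpinPause. This is an illustrative sketch only. It assumes the gdb-style layout that pstack commonly prints (a line starting with "Thread " for each thread, followed by frames numbered #0, #1, and so on), which may differ on other systems. The function name threads_in_function is a hypothetical helper.

```python
import re


def threads_in_function(stack_dump: str, function: str = "SpinPause") -> list:
    """Return the header lines of threads whose innermost frame (#0) is in `function`.

    Assumes gdb-style output: each thread starts with a line beginning with
    "Thread ", followed by frame lines that begin with "#0", "#1", ...
    """
    pattern = re.compile(rf"\b{re.escape(function)}\b")
    hits = []
    current_thread = None
    for line in stack_dump.splitlines():
        if line.startswith("Thread "):
            current_thread = line.strip()
        elif line.startswith("#0") and current_thread is not None:
            # Only the innermost frame shows what the thread is executing now.
            if pattern.search(line):
                hits.append(current_thread)
    return hits
```

A single dump shows only where each thread is at one instant. To support the conclusion that a thread keeps running SpinPause, take several dumps some seconds apart and check whether the same thread appears in the result every time.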