The Global Cache Process Restarts and Is Suspended After All ZooKeeper Faults Are Rectified
Symptom
After all ZooKeeper faults are rectified, the Global Cache process restarts unexpectedly. The restart then hangs for 50 minutes and eventually fails.
Cause Analysis
After ZooKeeper faults are rectified, the CCM fails to obtain the distributed lock and proactively restarts. During the restart, the CCM is suspended and services are not recovered.
Solution
Manually scale in the faulty node (remove it from the cluster) and then scale it out again (add it back) to prevent services from being affected.
Run the following commands:
# Access mgrtool.
attach CCM
ccm whoami  # Check the CCM master.
# Access the CCM master mgrtool.
# Set a permanent fault. (Before removing a node from the cluster, shut down the Global Cache process of the node.)
ccm set permanentFault
# Restore the permanently faulty node to the cluster. (Ensure that its drive has no data. That is, the BDM is formatted.)
ccm start failback
# Start the scale-out.
ccm start scaleout
# Check the scale-out status.
ccm show scaleout status
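Because the order of the commands above matters and two of them have manual preconditions, it can help to keep the sequence as a checklist. The following is a minimal Python sketch that only prints that checklist. It does not invoke mgrtool, and the `Step`, `RUNBOOK`, and `render` names are hypothetical helpers introduced for illustration. The command strings and preconditions are copied from the procedure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    purpose: str             # what the step achieves, per the procedure above
    command: str = ""        # command text from the procedure; empty for a manual action
    precondition: str = ""   # condition the operator must confirm before the step


# Ordered exactly as in the procedure; nothing here is executed automatically.
RUNBOOK = [
    Step("Access mgrtool.", "attach CCM"),
    Step("Check the CCM master.", "ccm whoami"),
    Step("Access the CCM master mgrtool."),
    Step("Set a permanent fault.", "ccm set permanentFault",
         "The Global Cache process of the node is shut down."),
    Step("Restore the permanently faulty node to the cluster.", "ccm start failback",
         "The node's drive has no data (the BDM is formatted)."),
    Step("Start the scale-out.", "ccm start scaleout"),
    Step("Check the scale-out status.", "ccm show scaleout status"),
]


def render(runbook):
    """Return the checklist as a list of printable lines."""
    lines = []
    for number, step in enumerate(runbook, 1):
        if step.precondition:
            lines.append(f"   [confirm first] {step.precondition}")
        action = step.command if step.command else "(manual action)"
        lines.append(f"{number}. {action}  # {step.purpose}")
    return lines


if __name__ == "__main__":
    print("\n".join(render(RUNBOOK)))
```

Printing the list before starting keeps the two preconditions visible directly above the commands they guard.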