Introduction
This document details the deployment and activation of the VM lockup detection feature on Kunpeng servers using the openEuler OS.
A robust lockup detection mechanism is critical in virtualized environments: it keeps a VM that is stuck in an infinite loop from staying unresponsive indefinitely, and it preserves system manageability. By leveraging non-maskable interrupts (NMIs), the mechanism monitors interrupt responses in real time to detect lockups within VMs, so that a VM can recover from the unresponsive state a lockup causes.
Linux systems traditionally rely on watchdog mechanisms for lockup detection, which use timer interrupts to identify system hangs. However, timer interrupts may be blocked during certain execution phases (such as interrupt handling or atomic contexts), which limits the effectiveness of the watchdog. The NMI watchdog overcomes this limitation by using NMIs, which are still delivered in atomic contexts, and therefore detects system hangs more reliably.
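The difference between the two approaches can be illustrated with a small Python model. This is a conceptual sketch only, not kernel code: the class CpuModel, its fields, and the simulate function are hypothetical names introduced for illustration. It assumes the common hard lockup scheme in which a timer interrupt advances a counter and a periodic NMI checks whether that counter has moved since the previous NMI.

```python
from dataclasses import dataclass


@dataclass
class CpuModel:
    """Toy model of one CPU (hypothetical, for illustration only)."""
    irqs_enabled: bool = True
    timer_ticks: int = 0        # advanced by the (maskable) timer interrupt
    ticks_seen_by_nmi: int = 0  # snapshot taken at the previous NMI

    def timer_interrupt(self) -> None:
        # A maskable interrupt is not handled while interrupts are disabled.
        if self.irqs_enabled:
            self.timer_ticks += 1

    def nmi_check(self) -> bool:
        # The NMI runs regardless of irqs_enabled. If the tick counter has
        # not moved since the last NMI, the CPU is presumed to be locked up.
        locked_up = self.timer_ticks == self.ticks_seen_by_nmi
        self.ticks_seen_by_nmi = self.timer_ticks
        return locked_up


def simulate(irq_state_per_period, ticks_per_period=4):
    """Return one lockup verdict per NMI period.

    irq_state_per_period: iterable of booleans, True when interrupts are
    enabled during that period, False when the CPU is stuck with them off.
    """
    cpu = CpuModel()
    verdicts = []
    for enabled in irq_state_per_period:
        cpu.irqs_enabled = enabled
        for _ in range(ticks_per_period):
            cpu.timer_interrupt()
        verdicts.append(cpu.nmi_check())
    return verdicts


if __name__ == "__main__":
    # Two healthy periods, two periods with interrupts masked, then recovery.
    print(simulate([True, True, False, False, True]))
```

In this model the timer-based counter stops advancing as soon as interrupts are masked, so a check that itself depends on a maskable interrupt would never run. The NMI-driven check still runs and reports the stall.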

Principles
The NMI watchdog is a dedicated mechanism for identifying hard lockups in Linux systems. It monitors kernel responsiveness by triggering NMIs and verifying that they are processed.
openEuler offers two NMI watchdog implementations for AArch64 platforms:
- SDEI watchdog (default)
Utilizing the Software Delegated Exception Interface (SDEI) of AArch64, this solution registers callbacks in non-secure environments to handle system events. The SDEI watchdog operates as an NMI watchdog variant within openEuler.
- PMC (PMU) watchdog
This alternative uses pseudo-NMI technology, configuring performance monitoring interrupts (PMIs) to behave like NMIs. It is used when the SDEI watchdog is disabled and provides high-priority, NMI-like interrupts in VMs. Also known as the Performance Monitoring Unit (PMU) watchdog, it logs an error and initiates a system reset when a hard lockup occurs.
For AArch64 systems:
- openEuler defaults to the SDEI watchdog. If the SDEI watchdog cannot initialize in a virtualized environment, the system does not automatically fall back to the NMI watchdog based on the Performance Monitoring Counter (PMC) or PMU. You must disable the SDEI watchdog manually through kernel parameters.
- To enable the PMC/PMU watchdog, explicitly disable the SDEI watchdog by adding disable_sdei_nmi_watchdog to the kernel boot parameters. For full parameter details, see Activation.
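As a quick sanity check, the selection rule above can be expressed as a short Python script that inspects the kernel command line. This is a minimal sketch, not an official openEuler tool: the function names are hypothetical, and it assumes the parameter appears as a standalone token in /proc/cmdline. It reports only which watchdog the boot parameters request, not whether that watchdog actually initialized.

```python
from pathlib import Path

# Parameter name taken from this document.
SDEI_DISABLE_PARAM = "disable_sdei_nmi_watchdog"


def sdei_watchdog_disabled(cmdline: str) -> bool:
    """Return True if the boot parameters disable the SDEI watchdog.

    Assumption: the parameter is a bare, whitespace-separated token.
    """
    return SDEI_DISABLE_PARAM in cmdline.split()


def expected_watchdog(cmdline: str) -> str:
    """Map the boot parameters to the watchdog this document says is requested."""
    if sdei_watchdog_disabled(cmdline):
        return "PMU watchdog"
    return "SDEI watchdog"


def read_cmdline(path: str = "/proc/cmdline"):
    """Read the running kernel's command line, or None if it is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_cmdline()
    if cmdline is None:
        print("Kernel command line not available on this system.")
    else:
        print(f"Requested by boot parameters: {expected_watchdog(cmdline)}")
```

Run inside the VM after a reboot, the script shows whether the parameter actually reached the kernel command line, which helps rule out a bootloader configuration that was edited but not regenerated.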