CMF Introduction
This document explains the concept and architecture of the Cache Management Framework (CMF) and provides instructions on how to compile, install, and use it.
CMF is built for the Kunpeng hardware platform and consists of a kernel-mode driver and a command line tool, which communicate through ioctl. The command line tool parses the user's command line parameters and passes them to the kernel-mode driver through the user-mode API. The driver then validates the parameters. Once they pass validation, the hardware access module queries or sets the hardware registers.
CMF modifies hardware registers to control the allocation of system resources such as the L2 cache and L3 cache.
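The request flow described above (parse the parameters, pass them to the driver through ioctl, let the driver validate them) can be sketched from the user-mode side as follows. This is a minimal illustration, not the actual CMF API: the device path `/dev/cmf_example`, the `struct cmf_request` layout, the `CMF_IOC_SET_L2` command number, and all helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical request layout; the real CMF ABI is not shown in this document. */
struct cmf_request {
    uint32_t core_id;  /* target CPU core (assumed field) */
    uint32_t l2_ways;  /* number of L2 cache ways to allocate (assumed field) */
};

/* Hypothetical device node and ioctl command number. */
#define CMF_DEVICE_PATH "/dev/cmf_example"
#define CMF_IOC_MAGIC   'c'
#define CMF_IOC_SET_L2  _IOW(CMF_IOC_MAGIC, 1, struct cmf_request)

/* Parse a non-negative decimal string into a uint32_t.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_u32(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(s != NULL && out != NULL);
    if (!isdigit((unsigned char)s[0]))
        return -1;  /* rejects empty strings, signs, and leading spaces */
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* Step 1: the command line tool parses the user's parameters. */
static int cmf_parse_args(const char *core_str, const char *ways_str,
                          struct cmf_request *req)
{
    if (parse_u32(core_str, &req->core_id) != 0)
        return -1;
    return parse_u32(ways_str, &req->l2_ways);
}

/* Step 2: the parsed request is passed to the kernel-mode driver through
 * ioctl. The driver, not this code, decides whether the values are valid.
 * Returns 0 on success, -1 on failure with errno set. */
static int cmf_send(const char *dev_path, const struct cmf_request *req)
{
    int fd, rc, saved_errno;

    fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -1;  /* for example, the driver is not loaded */
    rc = ioctl(fd, CMF_IOC_SET_L2, req);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc < 0 ? -1 : 0;
}
```

A caller would typically run `cmf_parse_args(argv[1], argv[2], &req)` and, on success, `cmf_send(CMF_DEVICE_PATH, &req)`. The user-mode check here only rejects malformed numbers. Range and policy checks are left to the driver, which matches the division of work described above.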
Architecture

Application Scenarios
Ascend 800I A3 inference servers based on the new Kunpeng 920 processor model may experience slow operator delivery, leading to a high NPU idle rate and a long decoding latency. CMF modifies hardware registers to adjust the L2 cache allocation in the system. In this scenario, the adjustment reduces the decoding latency of the Qwen2 1.5B model by 7%.
In addition, CMF provides APIs that allow external applications to interact with the driver to view and modify the hardware resource allocation policies.
Constraints
- CMF is supported only on physical machines.
- The driver must be loaded by the root user.
- The L2 I-cache and D-cache must be configured with 2-way or greater associativity.
- Do not perform power state switching operations on cores occupied by the driver, because they can cause a reset. These operations include restarting, enabling Local Peripheral Interrupts (LPIs), and activating low-power features in non-high-performance mode.
- Hardware registers can be read and written through the command line tool only after the kernel-mode driver is successfully loaded.
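Two of the constraints above (root privileges for loading the driver, and a loaded driver before the command line tool is used) can be checked before any register access is attempted. The sketch below is one possible pre-flight check, assuming a Linux system that exposes `/proc/modules`. The module name passed in by the caller is a placeholder, because this document does not name the CMF kernel module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns true if `name` is a module name in /proc/modules-style text,
 * where every line starts with the module name followed by a space. */
static bool module_listed(const char *modules_text, const char *name)
{
    const char *line = modules_text;
    size_t len;

    assert(modules_text != NULL && name != NULL);
    len = strlen(name);
    while (len > 0 && *line != '\0') {
        const char *nl;

        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return true;
        nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return false;
}

/* Scans /proc/modules for `name`. Returns false if the file cannot be read. */
static bool module_loaded(const char *name)
{
    char buf[512];
    bool at_line_start = true;
    bool found = false;
    FILE *f = fopen("/proc/modules", "r");

    if (f == NULL)
        return false;
    while (!found && fgets(buf, sizeof buf, f) != NULL) {
        /* Only match at the start of a line, even if a long line
         * is delivered in several chunks. */
        if (at_line_start)
            found = module_listed(buf, name);
        at_line_start = strchr(buf, '\n') != NULL;
    }
    fclose(f);
    return found;
}

/* Loading the driver requires root; the effective UID of root is 0. */
static bool running_as_root(void)
{
    return geteuid() == 0;
}
```

A setup script or wrapper could call `running_as_root()` before loading the driver and `module_loaded("cmf_example")` (with the real module name substituted) before invoking the command line tool, and print a clear message when either check fails.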