Core Binding Optimization
Core binding is recommended when you observe performance issues such as excessive cross-NUMA memory access, costly thread context switches, or delayed task scheduling on the host. The following core binding methods can effectively improve the running efficiency of the vLLM framework; select a method based on the actual scenario.
You can run the lscpu command to view the CPU cores that belong to each NUMA node.
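As a quick check, the per-node core ranges can be filtered out of the lscpu output with grep. The sample output below is from a hypothetical 2-node, 64-core host; the node and core counts on your machine will differ.

```shell
# Show the CPU core range that belongs to each NUMA node.
# (Sample output for an assumed 2-node, 64-core host.)
lscpu | grep -E "NUMA node[0-9]+ CPU"
# NUMA node0 CPU(s):   0-31
# NUMA node1 CPU(s):   32-63
```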
Method 1: Fine-grained core binding (recommended)
This method binds each critical task to a designated CPU core within the NUMA node, minimizing cross-core switching overhead.
export CPU_AFFINITY_CONF=2
Method 2: Coarse-grained core binding
vLLM automatically binds all tasks to the CPU cores within the NUMA node associated with the NPU, which prevents cross-NUMA memory access. Custom core binding (Method 3) can be applied on top of this coarse-grained configuration.
export CPU_AFFINITY_CONF=1
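To confirm that the binding has taken effect, you can inspect a worker process's allowed-core list on Linux, for example via /proc (or equivalently with `taskset -cp <pid>`). The use of the current shell's own status file below is only for illustration; substitute the PID of a vLLM worker process.

```shell
# Print the list of CPU cores the current process may run on.
# For a vLLM worker, replace "self" with the worker's PID,
# e.g. /proc/12345/status (12345 is a placeholder PID).
grep Cpus_allowed_list /proc/self/status
```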
Method 3: Custom core binding
This method allows users to customize the core binding range for individual NPUs, binding the processes running on selected NPUs to specific CPU cores.
For example, if four NPUs (npu0, npu1, npu2, and npu3) are available, the reference command is as follows:
export CPU_AFFINITY_CONF=1,npu0:0-1
- <value1>:<value2>-<value3> indicates that processes on <value1> are bound to the CPU cores in the closed interval [<value2>, <value3>]. For example, npu0:0-1 indicates that processes running on NPU 0 are bound to CPU cores 0 and 1. NPUs without an explicit setting follow the same core binding policy as when CPU_AFFINITY_CONF is set to 1.
- This setting takes effect only when CPU_AFFINITY_CONF is set to 1.
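The value format can be sanity-checked with a small shell sketch. The `parse_affinity_conf` function below is a hypothetical helper written for this document, not part of vLLM; it only splits a CPU_AFFINITY_CONF value into its leading mode and the per-NPU core ranges.

```shell
# Hypothetical helper: print the mode and each per-NPU core range of a
# CPU_AFFINITY_CONF value such as "1,npu0:0-1,npu2:4-7".
parse_affinity_conf() {
  conf=$1
  mode=${conf%%,*}          # leading mode value (1 or 2)
  echo "mode=$mode"
  rest=${conf#"$mode"}
  rest=${rest#,}            # drop the mode and its trailing comma
  IFS=','                   # walk the remaining npuN:start-end entries
  for entry in $rest; do
    npu=${entry%%:*}
    range=${entry#*:}
    echo "$npu -> cores $range"
  done
  unset IFS
}

parse_affinity_conf "1,npu0:0-1,npu2:4-7"
# mode=1
# npu0 -> cores 0-1
# npu2 -> cores 4-7
```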