Other Optimizations
This section covers OpenMP, vLLM, MindIE Turbo, and BIOS optimizations.
OpenMP Optimization
Configuring OpenMP environment variables properly can substantially improve the performance of multi-threaded applications. Table 1 lists two key environment variables that, when set together, coordinate thread resource allocation and workload balancing.
| Environment Variable | Description | Recommended Value |
|---|---|---|
| OMP_PROC_BIND | Controls thread-core binding. Setting it to false enables dynamic load balancing through thread migration. | false |
| OMP_NUM_THREADS | Defines the maximum parallel threads. Setting it to 100 allows up to 100 concurrent threads. | 100 |

Example:

    export OMP_PROC_BIND=false
    export OMP_NUM_THREADS=100
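
To apply the Table 1 settings to a single command rather than to the whole shell session, the variables can be passed through `env`. The following is a minimal sketch of that approach; the `run_with_omp` helper name is hypothetical and is not part of OpenMP or any toolkit mentioned here.

```shell
# Hypothetical helper: run one command with the Table 1 OpenMP settings,
# without changing the environment of the calling shell.
run_with_omp() {
    env OMP_PROC_BIND=false OMP_NUM_THREADS=100 "$@"
}

# Usage: print the values that the child process actually sees.
run_with_omp sh -c 'echo "OMP_PROC_BIND=$OMP_PROC_BIND OMP_NUM_THREADS=$OMP_NUM_THREADS"'
```

Scoping the variables to one command keeps other OpenMP programs started from the same shell on their default settings.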
vLLM Model Inference Optimization
Combining the environment variables listed in Table 2 can substantially improve vLLM inference throughput and resource utilization.
| Environment Variable | Description | Recommended Value |
|---|---|---|
| VLLM_WORKER_MULTIPROC_METHOD | Mode in which the vLLM framework creates a process. The V1 inference mode requires the spawn method to create a subprocess. | spawn |
| VLLM_USE_V1 | Enables the V1 inference mode. When this is enabled, VLLM_WORKER_MULTIPROC_METHOD must also be set. | 1 |
| VLLM_OPTIMIZATION_LEVEL | Controls the optimization level of vLLM inference. Higher values enable more aggressive optimizations, potentially improving performance at the cost of increased memory usage. | 3 |

Example:

    export VLLM_WORKER_MULTIPROC_METHOD=spawn
    export VLLM_USE_V1=1
    export VLLM_OPTIMIZATION_LEVEL=3
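
Because the V1 inference mode depends on the spawn method, it can help to verify the two variables together before starting the inference service. The sketch below shows one way to do this; `check_vllm_env` is a hypothetical helper, not a vLLM command.

```shell
# Hypothetical pre-launch check (not part of vLLM): fail early when V1 mode
# is enabled without the spawn method that it requires.
check_vllm_env() {
    if [ "${VLLM_USE_V1:-0}" = "1" ] && [ "${VLLM_WORKER_MULTIPROC_METHOD:-}" != "spawn" ]; then
        echo "VLLM_USE_V1=1 requires VLLM_WORKER_MULTIPROC_METHOD=spawn" >&2
        return 1
    fi
    return 0
}

export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_USE_V1=1
export VLLM_OPTIMIZATION_LEVEL=3
check_vllm_env && echo "vLLM environment is consistent"
```

Running the check in a launch script surfaces a missing or mismatched setting before the service starts rather than at worker creation time.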
MindIE Turbo Optimization
Certain performance optimization features in MindIE Turbo have specific usage constraints. These features are controlled via environment variables to allow flexible configuration based on application requirements.
| Environment Variable | Description | Recommended Value |
|---|---|---|
| USING_SAMPLING_TENSOR_CACHE | Enables tensor caching for vLLM post-processing. This variable is not supported for chunked-prefill and beam search. | |
| USING_LCCL_COM | Activates the LCCL communication library. This variable is not supported for cross-node communication. | |
| USING_PP_MATMUL | Uses the optimized ping-pong matrix multiplication operator, particularly effective for long sequences. As different operators are used, the precision of vLLM Ascend may change after MindIE Turbo is added. If precision alignment with vLLM Ascend is required, disable this variable. | |

Example:

    export USING_SAMPLING_TENSOR_CACHE=1
    export USING_LCCL_COM=1
    export USING_PP_MATMUL=1
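
Since each variable has a usage constraint, a launch script can enable a feature only when its constraint does not apply. The following is a sketch under stated assumptions: the three deployment flags and the `enable_unless` helper are hypothetical names introduced for illustration, and the script assumes that leaving a variable unset keeps the corresponding feature disabled.

```shell
# Hypothetical deployment flags, not MindIE Turbo settings. Set one to 1 when
# the corresponding constraint applies to your deployment.
CHUNKED_PREFILL_OR_BEAM_SEARCH="${CHUNKED_PREFILL_OR_BEAM_SEARCH:-0}"
CROSS_NODE="${CROSS_NODE:-0}"
NEED_PRECISION_ALIGNMENT="${NEED_PRECISION_ALIGNMENT:-0}"

# enable_unless VAR FLAG: export VAR=1, or leave VAR unset when FLAG is 1.
# Assumption: an unset variable leaves the feature disabled.
enable_unless() {
    if [ "$2" = "1" ]; then
        unset "$1"
    else
        export "$1=1"
    fi
}

enable_unless USING_SAMPLING_TENSOR_CACHE "$CHUNKED_PREFILL_OR_BEAM_SEARCH"
enable_unless USING_LCCL_COM "$CROSS_NODE"
enable_unless USING_PP_MATMUL "$NEED_PRECISION_ALIGNMENT"
```

For example, running the script with `CROSS_NODE=1` in the environment leaves `USING_LCCL_COM` unset while still enabling the other two variables.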
BIOS Optimization
The options shown in the following figure may affect the performance of models and other programs. The recommended options are listed in Table 4. Adjust the options as required.
| Option | Description | Recommended Value |
|---|---|---|
| Power Policy | Balances system performance and power efficiency. | Performance |
| Support Smmu | Enables system memory management for improved virtualization security and performance. | Disabled |
| CPU Prefetching Configuration | Optimizes CPU data prefetching to accelerate processing. | Disabled |

