Other Optimizations

This section covers OpenMP, vLLM, MindIE Turbo, and BIOS optimizations.

OpenMP Optimization

Configuring the OpenMP environment variables correctly can improve the performance of multi-threaded applications. Table 1 lists two variables that, when set together, control how threads are bound to cores and how many threads run in parallel.

Table 1 OpenMP optimization

| Environment Variable | Description | Recommended Value |
|---|---|---|
| OMP_PROC_BIND | Controls whether threads are bound to cores. Setting it to false disables binding, so threads can migrate between cores for dynamic load balancing. | false |
| OMP_NUM_THREADS | Sets the number of threads used for parallel regions. Setting it to 100 allows up to 100 concurrent threads. | 100 |

Example:

export OMP_PROC_BIND=false
export OMP_NUM_THREADS=100
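
The value 100 is the recommendation from Table 1. Whether it suits a given server depends on how many logical CPUs are available. The following is a minimal sketch, not part of OpenMP, that applies the two settings and reports whether the requested thread count exceeds the logical CPU count. The helper name `omp_threads_status` is hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Apply the values recommended in Table 1.
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=100

# Hypothetical helper (not an OpenMP tool): compares a requested thread
# count ($1) with a logical CPU count ($2).
omp_threads_status() {
    threads=$1
    cpus=$2
    if [ "$threads" -gt "$cpus" ]; then
        # More threads than logical CPUs: threads will share CPUs.
        echo "oversubscribed"
    else
        echo "ok"
    fi
}

# Number of logical CPUs currently online.
cpus=$(getconf _NPROCESSORS_ONLN)
echo "OMP_PROC_BIND=$OMP_PROC_BIND OMP_NUM_THREADS=$OMP_NUM_THREADS logical_cpus=$cpus"
echo "thread count status: $(omp_threads_status "$OMP_NUM_THREADS" "$cpus")"
```

An "oversubscribed" result does not mean the setting is wrong. It only indicates that threads will share CPUs, which is worth confirming against measured performance.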

vLLM Model Inference Optimization

Setting the vLLM environment variables in Table 2 together can improve inference throughput and resource utilization.

Table 2 vLLM model inference optimization

| Environment Variable | Description | Recommended Value |
|---|---|---|
| VLLM_WORKER_MULTIPROC_METHOD | Method that the vLLM framework uses to create worker processes. The V1 inference mode requires the spawn method to create subprocesses. | spawn |
| VLLM_USE_V1 | Enables the V1 inference mode. When this is enabled, VLLM_WORKER_MULTIPROC_METHOD must also be set. | 1 |
| VLLM_OPTIMIZATION_LEVEL | Controls the optimization level of vLLM inference. Higher values enable more aggressive optimizations, which may improve performance at the cost of higher memory usage. | 3 |

Example:

export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_USE_V1=1
export VLLM_OPTIMIZATION_LEVEL=3
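
Table 2 states a dependency: the V1 inference mode requires the spawn method. The following is a minimal sketch of a pre-launch check for that dependency. The function `check_vllm_v1_env` is a hypothetical helper written for this example and is not provided by vLLM.

```shell
#!/bin/sh
# Hypothetical helper (not part of vLLM): fails if V1 mode is enabled
# without the spawn process-creation method described in Table 2.
check_vllm_v1_env() {
    if [ "${VLLM_USE_V1:-0}" = "1" ] &&
       [ "${VLLM_WORKER_MULTIPROC_METHOD:-}" != "spawn" ]; then
        echo "VLLM_USE_V1=1 requires VLLM_WORKER_MULTIPROC_METHOD=spawn" >&2
        return 1
    fi
    return 0
}

# Apply the settings, then verify them before starting the service.
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_USE_V1=1

if check_vllm_v1_env; then
    echo "vLLM V1 environment is consistent"
fi
```

Running such a check in the launch script surfaces a missing or mistyped variable before the inference service starts.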

MindIE Turbo Optimization

Certain performance optimization features in MindIE Turbo have specific usage constraints. These features are controlled via environment variables to allow flexible configuration based on application requirements.

Table 3 MindIE Turbo optimization

| Environment Variable | Description | Recommended Value |
|---|---|---|
| USING_SAMPLING_TENSOR_CACHE | Enables tensor caching for vLLM post-processing. Not supported for chunked-prefill or beam search. | Enable (1) for greedy, top-k, and top-p sampling to boost performance. Disable for chunked-prefill and beam search. |
| USING_LCCL_COM | Activates the LCCL communication library. Not supported for cross-node communication. | Enable (1) on single-node systems to boost performance. Disable (0) in multi-node scenarios. |
| USING_PP_MATMUL | Uses the optimized ping-pong matrix multiplication operator, which is particularly effective for long sequences. Because different operators are used, results may differ in precision from vLLM Ascend without MindIE Turbo. | Disable (0) when strict precision alignment with vLLM Ascend is required. Otherwise, enable (1) for better performance. |

Example:

export USING_SAMPLING_TENSOR_CACHE=1
export USING_LCCL_COM=1
export USING_PP_MATMUL=1
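
The example above enables all three features, which fits only a single-node deployment that uses greedy, top-k, or top-p sampling without chunked-prefill and without strict precision requirements. The following is a minimal sketch of how the constraints in Table 3 could be encoded in a launch script. The function `configure_mindie_turbo` and its arguments are hypothetical, and using 0 to disable USING_SAMPLING_TENSOR_CACHE is an assumption by analogy with the other two variables.

```shell
#!/bin/sh
# Hypothetical helper (not part of MindIE Turbo).
# $1: node count
# $2: sampling mode (greedy | topk | topp | beam)
# $3: chunked-prefill in use (0 | 1)
# $4: strict precision alignment with vLLM Ascend required (0 | 1)
configure_mindie_turbo() {
    nodes=$1
    sampling=$2
    chunked=$3
    strict=$4

    # Tensor cache: only for greedy, top-k, and top-p sampling,
    # and never together with chunked-prefill.
    case "$sampling" in
        greedy|topk|topp) cache=1 ;;
        *) cache=0 ;;
    esac
    if [ "$chunked" = "1" ]; then
        cache=0
    fi

    # LCCL: single-node systems only.
    if [ "$nodes" -eq 1 ]; then lccl=1; else lccl=0; fi

    # Ping-pong matmul: off when strict precision alignment is required.
    if [ "$strict" = "1" ]; then pp=0; else pp=1; fi

    # Assumption: 0 disables the tensor cache, as with the other variables.
    export USING_SAMPLING_TENSOR_CACHE=$cache
    export USING_LCCL_COM=$lccl
    export USING_PP_MATMUL=$pp
}

# Single node, greedy sampling, no chunked-prefill, no strict precision.
configure_mindie_turbo 1 greedy 0 0
echo "cache=$USING_SAMPLING_TENSOR_CACHE lccl=$USING_LCCL_COM pp_matmul=$USING_PP_MATMUL"
```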

BIOS Optimization

The BIOS options below may affect the performance of models and other programs. Table 4 lists the recommended values. Adjust the options as required.

Table 4 BIOS optimization

| Option | Description | Recommended Value |
|---|---|---|
| Power Policy | Balances system performance and power efficiency. | Performance |
| Support Smmu | Enables the System Memory Management Unit (SMMU), which translates device memory accesses for virtualization security and performance. | Disabled |
| CPU Prefetching Configuration | Controls CPU data prefetching, which is intended to accelerate processing. | Disabled |