
vLLM-Ascend and MindIE Turbo Deployment

The deployment process involves installing three components: vLLM (an LLM inference acceleration framework), vLLM-Ascend (a plugin that lets vLLM run seamlessly on Ascend NPUs), and MindIE Turbo (a high-performance inference engine).

  1. Clone the vLLM repository and install version v0.7.3.
    git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
    cd vllm
    pip install -r requirements-build.txt
    # VLLM_TARGET_DEVICE=empty skips device-specific builds; the NPU backend is supplied by vLLM-Ascend in the next step.
    VLLM_TARGET_DEVICE=empty pip install -v .
  2. Install vLLM-Ascend.
    git clone -b v0.7.3-dev https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    pip install -v .
  3. Contact Huawei engineers to obtain the MindIE Turbo software package.

  4. Decompress and install MindIE Turbo.
    cd /home/packages
    tar -xvzf Ascend-mindie-turbo_2.0.RC2_py311_linux_aarch64.tar.gz
    cd Ascend-mindie-turbo_2.0.RC2_py311_linux_aarch64
    pip install mindie_turbo-2.0rc2-cp311-cp311-linux_aarch64.whl
  5. Run the following command to check whether the installation was successful:
    pip show mindie_turbo

    If output like the following is displayed, the installation succeeded.

    Version: 2.0rc2
    Summary: MindIE Turbo: An LLM inference acceleration framework featuring extensive plugin collections optimized for Ascend devices.
    ...
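
After completing all three installation steps, the sketch below offers a quick sanity check that each component is importable from Python. The module names `vllm_ascend` and `mindie_turbo` are assumptions inferred from the repository and wheel filenames above; adjust them if your release uses different names.

```python
import importlib.util


def check_packages(packages):
    """Return a mapping of module name -> whether an importable module was found."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


# Module names assumed from the wheel/repo names above; verify against your release.
status = check_packages(["vllm", "vllm_ascend", "mindie_turbo"])
for pkg, found in status.items():
    print(f"{pkg}: {'found' if found else 'MISSING'}")
```

If any module reports MISSING, rerun the corresponding installation step before starting inference.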