vLLM-Ascend and MindIE Turbo Deployment
The deployment process involves installing vLLM (the LLM inference acceleration framework), vLLM-Ascend (the plugin that lets vLLM run seamlessly on Ascend NPUs), and MindIE Turbo (a high-performance inference engine).
- Clone the vLLM repository and install version v0.7.3. Setting `VLLM_TARGET_DEVICE=empty` installs vLLM without compiling device-specific kernels, since the Ascend backend is provided by vLLM-Ascend.
```shell
git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE=empty pip install -v .
```
- Install vLLM-Ascend.
```shell
git clone -b v0.7.3-dev https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -v .
```
- Contact Huawei engineers to obtain the MindIE Turbo software package.
- Decompress and install MindIE Turbo.
```shell
cd /home/packages
tar -xvzf Ascend-mindie-turbo_2.0.RC2_py311_linux_aarch64.tar.gz
cd Ascend-mindie-turbo_2.0.RC2_py311_linux_aarch64
pip install mindie_turbo-2.0rc2-cp311-cp311-linux_aarch64.whl
```
- Run the following command to check whether the installation is successful:
```shell
pip show mindie_turbo
```
If the following information is displayed, the installation is successful.
```text
Version: 2.0rc2
Summary: MindIE Turbo: An LLM inference acceleration framework featuring extensive plugin collections optimized for Ascend devices.
...
```
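As an alternative to inspecting `pip show` output by hand, all three components can be checked in one pass with the standard library. A minimal sketch, assuming the pip distribution names match those used in the install steps above (`vllm`, `vllm-ascend`, `mindie_turbo`):

```python
import importlib.metadata as md

def check_installed(dists):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for dist in dists:
        try:
            versions[dist] = md.version(dist)
        except md.PackageNotFoundError:
            versions[dist] = None
    return versions

# Distribution names assumed from the pip install steps above.
for name, version in check_installed(["vllm", "vllm-ascend", "mindie_turbo"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any entry prints `NOT INSTALLED`, repeat the corresponding installation step before proceeding.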