Environment Requirements
This section describes the hardware and software environment verified for the model being tuned.
Hardware Requirements
| Item | Description |
|---|---|
| Inference server | Atlas 800I A2 inference server |
| CPU | 4 × Kunpeng 920 7265/5250 |
| NPU | 8 × Ascend 910B4 |
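As a quick sanity check that the target server matches the hardware above, the following shell sketch can be run on the machine. It assumes `npu-smi`, the device-query tool shipped with the Ascend driver, is already installed; the expected counts in the comments mirror the table.

```shell
# Confirm the CPU model (expect Kunpeng 920 7265/5250).
lscpu | grep "Model name"

# List NPU devices (expect 8 x Ascend 910B4 once the driver is installed).
npu-smi info
```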
OS Requirements
| Item | Version | Description |
|---|---|---|
| Ubuntu | 24.04 or 22.04 | Verified Linux distribution. |
| openEuler | 22.03 LTS SP4 | Verified Linux distribution; some optimization options depend on version 22.03 LTS SP4. |
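The OS check above can be automated by parsing `/etc/os-release`. The helper and the `VERIFIED` mapping below are an illustrative sketch, not an official tool; they mirror the verified distributions in the table.

```python
# Verified distributions from the OS table: Ubuntu 22.04/24.04 and
# openEuler 22.03 (LTS SP4).
VERIFIED = {
    "ubuntu": {"22.04", "24.04"},
    "openeuler": {"22.03"},
}

def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info

def is_verified_os(os_release_text):
    """True if ID/VERSION_ID match a verified distribution."""
    info = parse_os_release(os_release_text)
    distro = info.get("ID", "").lower()
    version = info.get("VERSION_ID", "")
    return version in VERIFIED.get(distro, set())

# On the target machine:
# with open("/etc/os-release") as f:
#     print(is_verified_os(f.read()))
```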
Software Requirements
| Item | Version | Description | Download URL |
|---|---|---|---|
| Driver and firmware | 24.0 or later | NPU driver and firmware. | Contact Huawei technical support. |
| CANN | 8.1.RC1 | Compute Architecture for Neural Networks (CANN) is Ascend's heterogeneous computing architecture for AI applications, providing a powerful, highly adaptable, and customizable framework for AI acceleration. | Contact Huawei technical support. |
| Ascend Extension for PyTorch (torch_npu plugin) | 2.5.1.dev20250320 or later | WHL package of the torch_npu plugin, which enables the Ascend NPU to support the PyTorch framework. | - |
| PyTorch | 2.5.1 or later | WHL package of the PyTorch framework. | - |
| vLLM | 0.7.3 | Inference acceleration framework for large language models (LLMs). | - |
| vLLM-Ascend | 0.7.3 | Enables vLLM to run seamlessly on Ascend NPUs. | - |
| MindIE Turbo | 2.1.RC2 | Huawei's acceleration plugin library for LLM inference engines on Ascend hardware, featuring proprietary LLM optimization algorithms and framework-level enhancements. It offers modular and plugin interfaces to integrate and accelerate third-party inference engines. | Contact Huawei technical support. |
| Python | 3.10.x/3.11.x | Python is an interpreted, object-oriented programming language. | Install a specified version using conda. |
| Docker | 24.x.x or later | Docker is a platform-as-a-service (PaaS) product that uses OS-level virtualization to package software and its dependencies into containers. | - |
| GCC | 10.3.1 or later | GNU Compiler Collection (GCC) is a compiler suite developed by the GNU Project. It includes compilers for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (such as libstdc++ and libgcj). | Install it using yum or apt. |
| msModelSlim | - | msModelSlim is an Ascend-affinity model compression tool that accelerates training and inference through functions such as low-rank model factorization, sparse training, post-training quantization (PTQ), and quantization-aware training (QAT). Ascend AI model developers can call its Python APIs to tune model performance and export models in different formats for execution on Ascend AI Processors. | - |
| DeepSeek-R1-Distill-Llama-70B | - | Model weights of the LLM to run. | - |
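The version minimums in the software table can be checked programmatically. The sketch below is illustrative: the `MINIMUMS` mapping mirrors the table, and version suffixes such as `.dev20250320` are compared on their numeric prefix only, matching the table's "2.5.1.dev20250320 or later" wording rather than strict PEP 440 ordering.

```python
import re

# Minimums taken from the software table above (illustrative subset).
MINIMUMS = {
    "torch": "2.5.1",        # PyTorch
    "torch_npu": "2.5.1",    # dev builds such as 2.5.1.dev20250320 qualify
    "docker": "24.0.0",      # "24.x.x or later"
    "gcc": "10.3.1",
}

def numeric_tuple(version):
    """Leading dotted-number components of a version, padded to three places."""
    nums = [int(p) for p in re.findall(r"\d+", version.split("dev")[0])[:3]]
    return tuple(nums + [0] * (3 - len(nums)))

def meets_minimum(name, installed):
    """True if `installed` is at or above the table's minimum for `name`."""
    return numeric_tuple(installed) >= numeric_tuple(MINIMUMS[name])

# Example on the target machine:
# import torch; print(meets_minimum("torch", torch.__version__))
```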
Installing the Ascend NPU firmware/driver, CANN, and related software packages requires downloading various resources online, such as Python source code, build tools, and dependencies. These downloads cannot be performed offline, so make sure the environment can access the Internet.
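For the open-source components in the software table, the online steps can be sketched as below. This is a sketch, not an official procedure: the exact package index and wheel sources may differ per site, and the driver/firmware, CANN, and MindIE Turbo packages come from Huawei technical support rather than pip.

```shell
# Verify Internet access first (the downloads cannot be done offline).
curl -sI https://pypi.org > /dev/null || echo "No Internet access"

# Pinned versions from the software table.
pip install torch==2.5.1
pip install torch_npu==2.5.1.dev20250320   # assumed wheel name; may instead be a local WHL file
pip install vllm==0.7.3
pip install vllm-ascend==0.7.3
```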