
Environment Requirements

This section describes the hardware and software environment verified for this tuning scenario.

Hardware Requirements

Table 1 Hardware requirements

| Item | Description |
| --- | --- |
| Inference server | Atlas 800I A2 inference server |
| CPU | 4 × Kunpeng 920 7265/5250 |
| NPU | 8 × Ascend 910B4 |

OS Requirements

Table 2 OS requirements

| Item | Version | Description |
| --- | --- | --- |
| Ubuntu | 24.04 or 22.04 | Verified Linux distribution. |
| openEuler | 22.03 LTS SP4 | Verified Linux distribution; some optimization options depend on version 22.03 LTS SP4. |

Software Requirements

Table 3 Software requirements

| Item | Version | Description | Download URL |
| --- | --- | --- | --- |
| Driver and firmware | 24.0 or later | NPU driver and firmware. | Contact Huawei technical support. |
| CANN | 8.1.RC1 | Compute Architecture for Neural Networks (CANN) is Ascend's heterogeneous computing architecture designed for AI applications, delivering a powerful, highly adaptable, and customizable framework for AI acceleration. | Contact Huawei technical support. |
| Ascend Extension for PyTorch (torch_npu plugin) | 2.5.1.dev20250320 or later | WHL package of the torch_npu plugin, which enables the Ascend NPU to support the PyTorch framework. | Link |
| PyTorch framework | 2.5.1 or later | WHL package of the PyTorch framework. | Link |
| vLLM | 0.7.3 | Inference acceleration framework for large language models (LLMs). | Link |
| vLLM-Ascend | 0.7.3 | Enables seamless running of vLLM on Ascend NPUs. | Link |
| MindIE Turbo | 2.1.RC2 | MindIE Turbo is Huawei's acceleration plugin library for LLM inference engines on Ascend hardware, featuring proprietary LLM optimization algorithms and framework-level enhancements. It provides modular and plugin interfaces to integrate with and accelerate third-party inference engines. | Contact Huawei technical support. |
| Python | 3.10.x or 3.11.x | Python is an interpreted, object-oriented programming language. | Install the specified version using conda. |
| Docker | 24.x.x or later | Docker is a set of platform-as-a-service (PaaS) products that use OS-level virtualization to package software and its dependencies into containers. | Link |
| GCC | 10.3.1 or later | GNU Compiler Collection (GCC) is a compiler system developed by the GNU Project. It includes compilers for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (such as libstdc++ and libgcj). | Install it using yum or apt. |
| msModelSlim | - | msModelSlim is an Ascend-affinity model compression tool that targets acceleration through compression on Ascend hardware. It supports training and inference acceleration with functions such as low-rank model factorization, sparse training, post-training quantization (PTQ), and quantization-aware training (QAT). Ascend AI model developers can call Python APIs to tune model performance and export models in different formats for execution on Ascend AI Processors. | Link |
| DeepSeek-R1-Distill-Llama-70B | - | Model files of the LLM to run. | Link |
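After installation, it can be useful to confirm that the Python packages on the host satisfy the minimum versions in Table 3. The following is a minimal sketch; the distribution names (`torch`, `torch_npu`, `vllm`, `vllm-ascend`) are assumptions based on the common wheel names, so adjust them if your packages are named differently.

```python
# Sketch: check installed package versions against the minimums in Table 3.
# The distribution names below are assumptions; adjust to your actual wheels.
from importlib import metadata


def version_tuple(v: str) -> tuple:
    """Convert '2.5.1' or '2.5.1.dev20250320' into a comparable tuple."""
    parts = []
    for part in v.split("."):
        digits = "".join(ch for ch in part if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is >= the required version."""
    return version_tuple(installed) >= version_tuple(required)


REQUIRED = {  # minimum versions taken from Table 3
    "torch": "2.5.1",
    "torch_npu": "2.5.1",
    "vllm": "0.7.3",
    "vllm-ascend": "0.7.3",
}

if __name__ == "__main__":
    for dist, minimum in REQUIRED.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            print(f"{dist}: NOT INSTALLED (need >= {minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{dist}: {installed} ({status}, need >= {minimum})")
```

Note that exact-version entries in the table (for example, vLLM 0.7.3) should be matched exactly in practice; the sketch only checks lower bounds.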

Installing the Ascend NPU firmware and driver, CANN, and the related software packages requires downloading resources online, such as Python source code, build tools, and dependencies. Because these downloads cannot be performed offline, ensure that the environment can access the Internet.
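Before starting the installation, a quick reachability check can save time. The sketch below tests whether a TCP connection to a given host succeeds; the host names used in the example are illustrative, so substitute the mirrors your site actually uses.

```python
# Sketch: verify Internet access before installation. The hosts listed in
# the __main__ block are illustrative examples, not required endpoints.
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host in ("pypi.org", "download.docker.com"):
        print(f"{host}: {'reachable' if can_reach(host) else 'UNREACHABLE'}")
```

If a host is unreachable, check the proxy settings and firewall rules of the server before retrying the installation.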