Tuning Procedure
Obtain the MindIE image and the model weights, then quantize the model.
Creating an Environment
- Upload the image package to the server and load it as a Docker image.
docker load -i mindie_2.0.RC1-B033-300I-Duo-py3.11-openeuler24.03-lts-aarch64.tar.gz
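The image package is architecture-specific (the filename above ends in aarch64). Before loading, it can help to confirm that the tarball matches the host architecture; the following is a sketch for a POSIX shell, and the check_arch helper is a name invented here, not part of MindIE:

```shell
# Hypothetical sanity check (not part of MindIE): does the image tarball
# name mention the host CPU architecture reported by uname -m?
check_arch() {
  tarball=$1
  arch=$2
  case "$tarball" in
    *"$arch"*) return 0 ;;  # arch string found in the filename
    *)         return 1 ;;  # mismatch: the loaded image would not run here
  esac
}

TARBALL=mindie_2.0.RC1-B033-300I-Duo-py3.11-openeuler24.03-lts-aarch64.tar.gz
check_arch "$TARBALL" "$(uname -m)" ||
  echo "warning: $TARBALL does not match host arch $(uname -m)" >&2
```

If the check passes, run the docker load command above as shown.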

- Create a container from the Docker image. The value of docker_name can be customized; docker_image must be the image loaded in the previous step. Table 1 describes the command options.
docker run -itd --name docker_name --network host --shm-size 512g --privileged --device /dev/davinci0 --device /dev/davinci1 --device /dev/davinci2 --device /dev/davinci3 --device /dev/davinci4 --device /dev/davinci5 --device /dev/davinci6 --device /dev/davinci7 --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc --volume /usr/local/dcmi:/usr/local/dcmi --volume /usr/local/bin/npu-smi:/usr/local/bin/npu-smi --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver --volume /usr/local/Ascend/firmware:/usr/local/Ascend/firmware --volume /etc/ascend_install.info:/etc/ascend_install.info --volume /home:/home docker_image /bin/bash
Table 1 Command options
--name: Assigns a custom container name for easy container startup and stop.
--network host: Shares the network namespace of the host, exposing the host IP address and ports (no -p mapping needed).
-itd: Runs the container in detached mode while keeping an interactive terminal attached.
--shm-size 512g: Allocates 512 GB of shared memory to the container.
--privileged: Enables privileged mode, granting full access to host devices and kernel capabilities.
--device: Maps host NPU devices into the container.
--volume: Mounts host directories into the container for easy access to the host development environment.
docker_image: Name of the Docker image to deploy (replace with the actual image name).
/bin/bash: Startup command (launches an interactive Bash shell in this case).
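The docker run command above repeats a --device flag for each of the eight davinci cards. As a sketch, the flag list can be generated in a loop instead of typed by hand; device_args and NUM_NPUS are names invented here (assuming an 8-card host — adjust NUM_NPUS to match yours):

```shell
# Hypothetical helper: emit one --device flag per NPU card, plus the three
# management devices that the docker run command above always maps.
NUM_NPUS=8

device_args() {
  i=0
  while [ "$i" -lt "$NUM_NPUS" ]; do
    printf -- '--device /dev/davinci%s ' "$i"
    i=$((i + 1))
  done
  printf -- '--device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc'
}
```

The output can replace the eleven --device options, e.g. docker run -itd --name docker_name ... $(device_args) ... (the substitution is left unquoted on purpose so the shell splits it into separate arguments).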
Downloading and Quantizing the Model
- Enter the Docker container (replace docker_name with your custom container name).
docker exec -it docker_name bash
- Download the DeepSeek-R1-Distill-Llama-70B model (download all files in the link).
- Perform W8A8 quantization on the model.
- Prepare for quantization.
vim /usr/local/Ascend/atb-models/examples/convert/convert_utils.py
On line 73 of convert_utils.py, set is_exist_ok=True.
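If vim is unavailable in the container, or the edit needs to be scripted, a sed one-liner can make the same change. This sketch assumes line 73 currently reads is_exist_ok=False; inspect the file first (sed -n '73p' <file>) and adjust if the flag or line number differs in your MindIE version. The flip_exist_ok helper is a name invented here:

```shell
# Hypothetical sed alternative to the vim edit. Assumes line 73 of the target
# file contains "is_exist_ok=False"; the substitution is restricted to line 73
# so other occurrences in the file are left untouched.
flip_exist_ok() {
  f=$1
  sed -i '73s/is_exist_ok=False/is_exist_ok=True/' "$f"
}

# Inside the container:
# flip_exist_ok /usr/local/Ascend/atb-models/examples/convert/convert_utils.py
```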
- Quantize the model.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
cd /usr/local/Ascend/atb-models
bash examples/models/llama/generate_quant_weight.sh -src {floating-point weight path} -dst {W8A8 quantized weight path} -type llama2_13b_w8a8s -trust_remote_code
- Go to {W8A8 quantized weight path} and replace the configuration file. For details about the content of the configuration file after replacement, see config.json.
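The quantization commands above can be wrapped so that a mistyped weight path fails fast instead of partway through the run. A sketch for a POSIX shell; quantize_w8a8 is a name invented here, and the -type value is taken verbatim from the command above:

```shell
# Hypothetical wrapper around generate_quant_weight.sh with fail-fast checks.
# Run after sourcing set_env.sh, from /usr/local/Ascend/atb-models.
quantize_w8a8() {
  src=$1   # floating-point weight path (must already exist)
  dst=$2   # W8A8 quantized weight path (created if missing)
  [ -d "$src" ] || { echo "source weight path not found: $src" >&2; return 1; }
  mkdir -p "$dst" || return 1
  bash examples/models/llama/generate_quant_weight.sh \
    -src "$src" -dst "$dst" -type llama2_13b_w8a8s -trust_remote_code
}

# Usage, with the guide's placeholders:
# quantize_w8a8 "{floating-point weight path}" "{W8A8 quantized weight path}"
```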