Tuning Procedure

Obtain the MindIE image, download the model, and then quantize the model.

Downloading the Image

Obtain the MindIE image by referring to the MindIE Software Installation Guide.

Creating an Environment

  1. Upload the image package to the server and load it as a Docker image.
    docker load -i mindie_2.0.RC1-B033-300I-Duo-py3.11-openeuler24.03-lts-aarch64.tar.gz

  2. Create a container based on the Docker image. (The value of docker_name can be customized; the value of docker_image must be the name of the image loaded in step 1.) Table 1 describes the command options.
    docker run -itd --name docker_name --network host --shm-size 512g --privileged --device /dev/davinci0 --device /dev/davinci1 --device /dev/davinci2 --device /dev/davinci3 --device /dev/davinci4 --device /dev/davinci5 --device /dev/davinci6 --device /dev/davinci7 --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc --volume /usr/local/dcmi:/usr/local/dcmi --volume /usr/local/bin/npu-smi:/usr/local/bin/npu-smi --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver --volume /usr/local/Ascend/firmware:/usr/local/Ascend/firmware --volume /etc/ascend_install.info:/etc/ascend_install.info --volume /home:/home docker_image /bin/bash
    Table 1 Command options

    --name: Assigns a custom container name for easy container startup and stop.
    --network host: Shares the network namespace of the host machine, exposing the host IP address and ports (no -p mapping needed).
    -itd: Runs the container in detached mode while maintaining interactive terminal access.
    --shm-size 512g: Allocates 512 GB of shared memory to the container.
    --privileged: Enables privileged mode, granting full access to host devices and kernel capabilities.
    --device: Maps host NPU devices into the container.
    --volume: Mounts host directories into the container for easy access to the host development environment.
    docker_image: Name of the Docker image to deploy (replace with the actual image name).
    /bin/bash: Default startup command (launches an interactive Bash terminal in this case).
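The eight --device /dev/davinciN flags in the docker run command above follow a fixed pattern. As a convenience, they can be generated instead of typed by hand; the sketch below assumes the standard /dev/davinciN device naming on Ascend hosts and NPUs 0 through 7:

```shell
# Build the --device flags for NPUs 0-7 programmatically
# (assumes the standard /dev/davinciN naming used on Ascend hosts)
DEVICE_FLAGS=""
for i in $(seq 0 7); do
  DEVICE_FLAGS="$DEVICE_FLAGS --device /dev/davinci$i"
done
# Append the management devices the container also needs
DEVICE_FLAGS="$DEVICE_FLAGS --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc"
echo "$DEVICE_FLAGS"
```

The resulting string can then be spliced into the docker run command in place of the explicit --device list, e.g. docker run -itd --name docker_name $DEVICE_FLAGS ... docker_image /bin/bash.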

Downloading and Quantizing the Model

  1. Enter the Docker container (replace docker_name with your custom container name).
    docker exec -it docker_name bash
  2. Download the DeepSeek-R1-Distill-Llama-70B model (download all files in the link).
  3. Perform W8A8 quantization on the model.
    1. Prepare for quantization.
      vim /usr/local/Ascend/atb-models/examples/convert/convert_utils.py

      Change line 73 in convert_utils.py to is_exist_ok=True.

    2. Quantize the model.
      source /usr/local/Ascend/ascend-toolkit/set_env.sh
      cd /usr/local/Ascend/atb-models
      bash examples/models/llama/generate_quant_weight.sh -src {floating-point weight path} -dst {W8A8 quantized weight path} -type llama2_13b_w8a8s -trust_remote_code
    3. Go to {W8A8 quantized weight path} and replace the configuration file. For details about the content of the configuration file after replacement, see config.json.
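Before starting the quantization in step 3, it is worth confirming that the download in step 2 is complete. The helper below is a minimal sketch, not part of the toolkit; the file list is an assumption based on typical Hugging Face Llama checkpoints, so extend it to cover all files listed at the download link:

```shell
# Hedged helper: check that key files exist in a downloaded model directory
# (file names are assumptions based on typical Hugging Face Llama checkpoints)
check_model_dir() {
  dir="$1"
  for f in config.json tokenizer_config.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  echo "key files present"
}
```

Usage: run check_model_dir with the path where the model was saved, e.g. check_model_dir /home/DeepSeek-R1-Distill-Llama-70B, and re-download any files it reports as missing.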
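The quantization run in step 3 is long, so failing fast on bad paths saves time. The wrapper below is an illustrative sketch (the helper name and messages are ours, not part of ATB models): it checks that the floating-point weight directory exists and creates the output directory before generate_quant_weight.sh is invoked:

```shell
# Illustrative pre-flight check before running generate_quant_weight.sh
# (helper name and messages are ours, not part of ATB models)
quant_preflight() {
  src="$1"; dst="$2"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: quant_preflight <float-weight-dir> <w8a8-output-dir>" >&2
    return 1
  fi
  if [ ! -d "$src" ]; then
    echo "floating-point weight path not found: $src" >&2
    return 1
  fi
  mkdir -p "$dst"
  echo "preflight ok"
}
```

Call it with the same paths you pass as -src and -dst, e.g. quant_preflight {floating-point weight path} {W8A8 quantized weight path}, and run the quantization script only if it prints "preflight ok".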