Inference Test with MX C500 Passthrough on cVM

Environment Requirements

Table 1 and Table 2 list the environment requirements.

**Table 1** Hardware requirements
Item	Description
CPU	New Kunpeng 920 processor model
GPU	MetaX C500

**Table 2** OS requirements
Item	Version
Host OS	openEuler 24.03 LTS SP1, openEuler 24.03 LTS SP2, openEuler 24.03 LTS SP3

Installing the Inference Environment

Obtain the VM image. (Currently, only the openEuler 24.03 LTS SP1 image is supported.)

wget https://repo.openeuler.org/openEuler-24.03-LTS-SP1/virtual_machine_img/aarch64/openEuler-24.03-LTS-SP1-aarch64.qcow2.xz
xz -d openEuler-24.03-LTS-SP1-aarch64.qcow2.xz
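As an optional sanity check after decompression, the following sketch derives the expected image file name from the download URL and, if qemu-img happens to be installed, prints the image metadata. The helper name qcow2_name_from_url is hypothetical, and the use of qemu-img is an assumption; the procedure itself does not require it.

```shell
#!/bin/sh
# Hypothetical helper: strip the directory part and the .xz suffix from a URL.
qcow2_name_from_url() {
    name=$(basename "$1")
    printf '%s\n' "${name%.xz}"
}

URL=https://repo.openeuler.org/openEuler-24.03-LTS-SP1/virtual_machine_img/aarch64/openEuler-24.03-LTS-SP1-aarch64.qcow2.xz
IMG=$(qcow2_name_from_url "$URL")

if [ -f "$IMG" ] && command -v qemu-img >/dev/null 2>&1; then
    # qemu-img info reports the format and virtual size of the image.
    qemu-img info "$IMG"
else
    echo "Skipping check: $IMG or qemu-img not found" >&2
fi
```

If the check is skipped, confirm that the xz -d step completed and that the .qcow2 file is in the current directory.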

Install the MetaX GPU driver on the cVM.
1. Log in to the MetaX developer image resource center (account registration required). The following steps use metax-driver-mxc500-2.32.0.6-rpm-aarch64.run as an example.
2. In the Deployment Mode area, select On-premises. In the Resource Configuration Information area on the right, select MX C500 series in the Hardware drop-down list, select Linux/AArch64/Kylin V10 SP2 in the System and Compatibility drop-down list, and select 2.32.0.x (April 2025) in the Solution Version drop-down list. For details, see Figure 1.
3. In the Installation Steps area, obtain the corresponding installation command, and then run the command to complete the installation.
Figure 1 Installing the MetaX GPU driver on the cVM
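After the driver is installed, a quick check on the cVM can confirm that the device nodes used later by docker run are present. The sketch below assumes that the driver exposes /dev/mxcd and /dev/dri, which are the two paths passed with --device in this guide; the helper name missing_paths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each path that does not exist.
# Returns 1 if at least one path is missing, 0 otherwise.
missing_paths() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            printf '%s\n' "$p"
            rc=1
        fi
    done
    return "$rc"
}

# /dev/mxcd and /dev/dri are the device nodes passed to docker run in this guide.
if missing=$(missing_paths /dev/mxcd /dev/dri); then
    echo "MetaX device nodes are present"
else
    echo "Missing device nodes: $missing" >&2
fi
```

A missing node usually means the driver installation did not complete or the GPU was not passed through to the cVM.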
Install the Docker inference environment on the cVM.
1. Log in to the MetaX developer image resource center (account registration required). The following steps use vllm-metax:0.11.0-maca.ai3.3.0.11-torch2.6-py312-kylin2309a-arm64 as an example.
2. In the left area, select AI, and then choose the inference framework (vLLM [0.10.2 or later]). Select arm64 in the Architecture area, select kylin in the OS area, and select 3.12 in the Python Version area. For details, see Figure 2.
3. Click Copy docker pull, and then run the copied docker pull command on the cVM to pull the Docker image.
Figure 2 Installing the Docker inference environment on the cVM

Using the vLLM Docker Image for Inference Testing

Start a container from the Docker image.

docker run -it --mount type=bind,source=/home,target=/workspace/mnt,readonly=true --device=/dev/mxcd --device=/dev/dri --group-add video e6ca53da420b /bin/bash

Run a throughput benchmark inside the container.

vllm bench throughput --dataset /workspace/mnt/ShareGPT_V3_unfiltered_cleaned_split.json --model /workspace/mnt/Llama-3-8B-Instruct-HF/ -tp 1
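Before starting a long benchmark run, it can help to confirm that the dataset and model are visible inside the container. The following is a minimal sketch, not part of the original procedure: the helper name check_inputs is hypothetical, and the config.json test assumes that the model directory is in Hugging Face format.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the benchmark inputs under a mount root.
check_inputs() {
    root=$1
    dataset=$root/ShareGPT_V3_unfiltered_cleaned_split.json
    model=$root/Llama-3-8B-Instruct-HF
    [ -f "$dataset" ] || { echo "dataset not found: $dataset" >&2; return 1; }
    [ -d "$model" ] || { echo "model directory not found: $model" >&2; return 1; }
    # Assumption: a Hugging Face format model directory contains config.json.
    [ -f "$model/config.json" ] || { echo "no config.json in $model" >&2; return 1; }
}

# Inside the container, the inputs are expected under /workspace/mnt.
if check_inputs /workspace/mnt; then
    echo "inputs look complete"
else
    echo "fix the inputs before running vllm bench throughput" >&2
fi
```

The same function can be run on the VM with /home as the argument, because /home is the directory mounted at /workspace/mnt.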

In the docker run command, the --mount type=bind,source=/home,target=/workspace/mnt,readonly=true option bind-mounts the /home directory of the VM, read-only, at /workspace/mnt in the container. The /home directory on the VM must contain the model and the dataset, which you need to download yourself. In this example, the dataset file is /home/ShareGPT_V3_unfiltered_cleaned_split.json and the model directory is /home/Llama-3-8B-Instruct-HF.
e6ca53da420b is the ID of the Docker image, which you can obtain by running the docker images command.
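Rather than copying the image ID by hand, the ID can be looked up from the image name. The sketch below is one possible approach: pick_image_id is a hypothetical helper, and the substring vllm-metax assumes that the pulled image name contains it, as in the example image named earlier.

```shell
#!/bin/sh
# Hypothetical helper: read "ID REPOSITORY:TAG" lines on stdin and print the
# ID of the first line whose image name contains the given substring.
pick_image_id() {
    awk -v pat="$1" 'index($2, pat) { print $1; exit }'
}

if command -v docker >/dev/null 2>&1; then
    # --format with a Go template prints one "ID REPOSITORY:TAG" line per image.
    IMAGE_ID=$(docker images --format '{{.ID}} {{.Repository}}:{{.Tag}}' | pick_image_id vllm-metax)
    echo "image ID: ${IMAGE_ID:-not found}"
fi
```

The resulting "$IMAGE_ID" can then be used in place of e6ca53da420b in the docker run command.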
