Tuning Result
Create a test script to test the model tuning result.
- Enter the Docker container that you created and create the test script run_performance_4rank.sh.
- For details about the run_performance_4rank.sh script content, see run_performance_4rank.sh.
- Replace the model path in the last line of the test script with the actual path, and then run the test script.

Table 1 describes the parameters in the last line of the script.
Table 1 Parameters in the last line of the script

| Parameter | Description |
|---|---|
| pa_fp16 | Uses pa_fp16 mixed precision. |
| performance | Uses the performance test mode. |
| $ALL_IN_OUT_SETS | Sets the input and output token sizes. |
| $BS_GROUP | Sets the batch size. |
| $P_MAX_BS | Sets the prefill batch size. |
| llama | Uses the LLaMA model. |
| /xx/xx/models/DeepSeek-R1-Distill-Llama-70B-W8A8SC-full | Specifies the weight path. |
| 4 | Sets the number of ranks to 4. |
| [1,4,-1,-1,-1,-1] | Uses 1dp4tp. |

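
The relationship between the last two parameters can be sanity-checked before the script is run. The following is a minimal sketch, not part of the test script: it assumes that the first two entries of the list are the data-parallel (dp) and tensor-parallel (tp) sizes, as the description "1dp4tp" suggests, and that dp × tp should equal the number of ranks. The function name parse_parallel_config is hypothetical, and the meaning of the remaining -1 entries is not covered here.

```python
import json


def parse_parallel_config(spec: str, world_size: int) -> dict:
    """Parse a list such as "[1,4,-1,-1,-1,-1]" and check it against the rank count."""
    values = json.loads(spec)
    # Assumption: entry 0 is the dp size and entry 1 is the tp size.
    dp, tp = values[0], values[1]
    if dp * tp != world_size:
        raise ValueError(f"dp ({dp}) x tp ({tp}) does not match {world_size} ranks")
    return {"dp": dp, "tp": tp, "label": f"{dp}dp{tp}tp"}


config = parse_parallel_config("[1,4,-1,-1,-1,-1]", world_size=4)
print(config["label"])  # prints 1dp4tp
```

With the values from Table 1, the check passes because 1 × 4 equals the 4 ranks passed to the script.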
The test result is as follows (decode_token_time indicates the time taken to generate one token):

Decode TPS (tokens per second) is the number of tokens generated per second in the decoding phase. It is calculated as follows: Decode TPS = (1 / decode_token_time) × 1000 × 8.
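
The formula can be applied with a short helper. The following is a minimal sketch under two assumptions that this section does not state explicitly: decode_token_time is reported in milliseconds (which is why the formula multiplies by 1000), and the factor 8 is the batch size used in the test. The function name decode_tps and the 50 ms input are illustrative only and are not measured results.

```python
def decode_tps(decode_token_time_ms: float, batch_size: int = 8) -> float:
    """Return Decode TPS = (1 / decode_token_time) x 1000 x batch_size.

    Assumptions: the time is in milliseconds, and the factor 8 in the
    formula is the batch size.
    """
    if decode_token_time_ms <= 0:
        raise ValueError("decode_token_time must be positive")
    # Algebraically the same as (1 / t) * 1000 * batch_size.
    return 1000.0 * batch_size / decode_token_time_ms


# Illustrative input only: 50 ms per token at batch size 8.
print(decode_tps(50.0))  # prints 160.0
```

If the test is run with a different batch size, pass it as batch_size so that the factor matches the actual configuration.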