Tuning Result

Create a test script to test the model tuning result.

  1. Enter the created Docker container and create the test script run_performance_4rank.sh.
  2. For the content of the script, see run_performance_4rank.sh.
  3. Replace the model path in the last line of the test script with the actual path, then run the script.

    Table 1 describes the parameters in the last line of the script.

    Table 1 Parameters in the last line of the script

    | Parameter | Description |
    | --- | --- |
    | pa_fp16 | Uses pa_fp16 mixed precision. |
    | performance | Uses the performance test mode. |
    | $ALL_IN_OUT_SETS | Sets the input and output token sizes. |
    | $BS_GROUP | Sets the batch size. |
    | $P_MAX_BS | Sets the prefill batch size. |
    | llama | Uses the LLaMA model. |
    | /xx/xx/models/DeepSeek-R1-Distill-Llama-70B-W8A8SC-full | Specifies the weight path. |
    | 4 | Sets the number of ranks to 4. |
    | [1,4,-1,-1,-1,-1] | Uses 1dp4tp. |
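
The path replacement in step 3 can also be scripted. The following is a minimal Python sketch, assuming the last non-empty line of run_performance_4rank.sh contains the placeholder weight path listed in Table 1. The function name replace_model_path and the target path in the usage comment are hypothetical and not part of the original script.

```python
# Placeholder weight path as it appears in Table 1.
PLACEHOLDER = "/xx/xx/models/DeepSeek-R1-Distill-Llama-70B-W8A8SC-full"


def replace_model_path(script_text: str, actual_path: str,
                       placeholder: str = PLACEHOLDER) -> str:
    """Replace the placeholder path on the last non-empty line only."""
    lines = script_text.splitlines()
    # Walk backwards to find the last line that is not blank.
    for idx in range(len(lines) - 1, -1, -1):
        if lines[idx].strip():
            if placeholder not in lines[idx]:
                raise ValueError("placeholder path not found on the last line")
            lines[idx] = lines[idx].replace(placeholder, actual_path)
            break
    else:
        raise ValueError("script has no non-empty line")
    return "\n".join(lines) + "\n"


# Usage (hypothetical target path; adjust to the real weight directory):
#   from pathlib import Path
#   script = Path("run_performance_4rank.sh")
#   script.write_text(replace_model_path(script.read_text(), "/data/models/my-weights"))
```

Restricting the change to the last line leaves any other occurrence of the path in the script untouched, and the function raises an error instead of silently doing nothing when the placeholder is missing.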

    The test result is as follows (decoder_token_time indicates the time taken to generate a token):

    Decode TPS (tokens per second) is the number of tokens generated per second in the decoding phase. It is calculated as: Decode TPS = 1 / decode_token_time × 1000 × 8.
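
The formula can be checked with a few lines of Python. This is a minimal sketch under two assumptions: decode_token_time is measured in milliseconds (which the × 1000 factor suggests), and the factor 8 is taken as-is from the formula. The source does not say what the 8 represents, so it is exposed as a parameter. The function name decode_tps and the 40 ms input are illustrative only, not measured results.

```python
def decode_tps(decode_token_time_ms: float, factor: int = 8) -> float:
    """Decode TPS = 1 / decode_token_time * 1000 * factor."""
    if decode_token_time_ms <= 0:
        raise ValueError("decode_token_time_ms must be positive")
    # 1000 converts a per-token time in milliseconds (assumed unit) into
    # tokens per second; the default factor 8 is copied from the formula.
    return 1000.0 * factor / decode_token_time_ms


# Illustrative input, not a measured value: 40 ms per decoded token.
print(decode_tps(40.0))  # → 200.0
```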