AI Inference Parameters

TensorFlow Serving

For details about the test, see Inference Performance Benchmark Testing for Search and Recommendation Ranking Models. Bind the TensorFlow Serving instance to NUMA node 0 and the Perf Analyzer tool to NUMA node 1 to evaluate the performance of search and recommendation models in the Model Zoo.
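
The NUMA binding described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual launch script: it assumes numactl is used for the binding and that both CPUs and memory are pinned to the node. The model name, model path, port, and Perf Analyzer arguments are hypothetical placeholders.

```python
import shlex


def numa_bind(node: int, command: list[str]) -> list[str]:
    """Prefix a command so its CPUs and memory both come from one NUMA node."""
    if node < 0:
        raise ValueError("NUMA node must be non-negative")
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


# Hypothetical command lines; adjust the model name, path, and port to your setup.
server_cmd = numa_bind(0, [
    "tensorflow_model_server",
    "--port=8500",
    "--model_name=wide_deep",
    "--model_base_path=/models/wide_deep",
])
client_cmd = numa_bind(1, [
    "perf_analyzer",
    "--service-kind", "tfserving",
    "-i", "grpc",
    "-m", "wide_deep",
    "-u", "localhost:8500",
])

if __name__ == "__main__":
    # Print the commands instead of launching them, so the sketch is safe to run.
    print(shlex.join(server_cmd))
    print(shlex.join(client_cmd))
```

Keeping the server and the load generator on separate nodes prevents the client from competing with the inference threads for cores and memory bandwidth.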

  • Service parameters

    Set tensorflow_intra_op_parallelism to the number of CPU cores available to TensorFlow Serving. Matching the thread count to the core count makes full use of the cores without oversubscribing them, which prevents resource contention and improves overall inference performance.

    Table 1 TensorFlow Serving service parameters

    The first three columns describe the recommended parameter; the remaining columns give the tuning analysis.

    | Parameter Name | Parameter Description | Value Range | Test Case | Importance Ratio (%) | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
    |---|---|---|---|---|---|---|
    | tensorflow_intra_op_parallelism | Number of threads that perform independent operations concurrently | [0, 128] | Wide and Deep, DSSM, DFFM | 88.7 | 32 (+16.2%) | 122 (-8.9%) |
    | max_batch_size (sub-parameter of batching_parameters_file) | Maximum number of requests that can be accepted at a time (batch size) | [32, 1024] | Wide and Deep | - | 190 (+7.0%) | 681 (-55.5%) |
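
The service parameters above could be assembled as in the sketch below. It assumes the standard text-protobuf layout of a TensorFlow Serving batching parameters file and that batching is switched on with --enable_batching. The helper names and file locations are hypothetical, and the value 190 is the optimum measured for the Wide and Deep test case, not a universal setting.

```python
import os
import tempfile
from pathlib import Path


def available_cores() -> int:
    """Count the CPU cores this process may run on (respects numactl/taskset)."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # platforms without sched_getaffinity
        return os.cpu_count() or 1


def write_batching_config(path: Path, max_batch_size: int) -> str:
    """Write a text-protobuf batching file that sets only max_batch_size."""
    if not 32 <= max_batch_size <= 1024:  # range explored in Table 1
        raise ValueError("max_batch_size outside the tested range [32, 1024]")
    text = f"max_batch_size {{ value: {max_batch_size} }}\n"
    path.write_text(text)
    return text


def serving_flags(cores: int, batching_file: Path) -> list[str]:
    """Build the tensorflow_model_server flags for the two service parameters."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return [
        f"--tensorflow_intra_op_parallelism={cores}",
        "--enable_batching=true",
        f"--batching_parameters_file={batching_file}",
    ]


if __name__ == "__main__":
    # Write the config to a temporary directory so the sketch has no side effects.
    config = Path(tempfile.mkdtemp()) / "batching.config"
    write_batching_config(config, 190)
    print(" ".join(serving_flags(available_cores(), config)))
```

Launching this under the same NUMA binding as the server makes available_cores() report only the cores of that node, which matches the recommendation to size the thread pool to the cores actually available.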

  • System parameters

    In search and recommendation scenarios, adjust the kernel scheduling subsystem parameters listed in Table 2 and set transparent_hugepage_mode to always. These settings can optimize memory management and increase overall system throughput.

    Table 2 TensorFlow Serving system parameters

    The first two columns describe the recommended parameter; the remaining columns give the tuning analysis.

    | Parameter Name | Value Range | Test Case | Importance Ratio (%) | Default Value | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
    |---|---|---|---|---|---|---|
    | kernel.sched_cluster | {0, 1} | Wide and Deep | 42.5 | 0 | 1 (+2.4%) | 0 (-4.7%) |
    | kernel.sched_migration_cost_ns | [100000, 5000000] | Wide and Deep | 20.7 | 500000 | 1319951 (+2.4%) | 168578 (-4.7%) |
    | kernel.sched_nr_migrate | [1, 128] | DFFM | 10.2 | 32 | 128 (+1.2%) | 75 (-0.6%) |
    | kernel.sched_child_runs_first | {0, 1} | DFFM | 9.3 | 0 | 0 (+1.2%) | 1 (-0.6%) |
    | transparent_hugepage_mode | {madvise, never, always} | DSSM | 42.7 | never | always (+1.5%) | madvise (-6.7%) |
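
The system parameters above could be applied with a sketch like the one below. It makes two assumptions that the text does not state: that each kernel.* key maps to a file under /proc/sys, and that transparent_hugepage_mode corresponds to /sys/kernel/mm/transparent_hugepage/enabled. Some of these scheduler knobs exist only on certain kernels, so missing files are skipped rather than treated as errors. The values are the per-test-case optima from Table 2 and may not suit other workloads.

```python
from pathlib import Path

# Optimal values reported in Table 2; each comes from a specific test case.
SCHED_SETTINGS = {
    "kernel.sched_cluster": "1",
    "kernel.sched_migration_cost_ns": "1319951",
    "kernel.sched_nr_migrate": "128",
    "kernel.sched_child_runs_first": "0",
}

# Assumed location of the transparent huge page mode switch.
THP_FILE = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def sysctl_path(key: str, proc_sys: Path = Path("/proc/sys")) -> Path:
    """Translate a dotted sysctl key into its file under proc_sys."""
    return proc_sys.joinpath(*key.split("."))


def plan(proc_sys: Path = Path("/proc/sys"), thp_file: Path = THP_FILE):
    """Return (file, value) pairs for every setting in Table 2."""
    targets = [(sysctl_path(k, proc_sys), v) for k, v in SCHED_SETTINGS.items()]
    targets.append((thp_file, "always"))
    return targets


def apply(targets, dry_run: bool = True) -> list[Path]:
    """Write each value (root required); return the files that do not exist."""
    skipped = []
    for path, value in targets:
        if not path.exists():
            skipped.append(path)  # knob not provided by this kernel
            continue
        if dry_run:
            print(f"would write {value!r} to {path}")
        else:
            path.write_text(value)
    return skipped


if __name__ == "__main__":
    # Dry run by default: report what would change without touching the system.
    for missing in apply(plan()):
        print(f"skipped (not present): {missing}")
```

Values written this way do not persist across reboots, so record the previous settings before a non-dry run if the baseline needs to be restored.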