AI Inference Parameters
TensorFlow Serving
For details about the test, see Inference Performance Benchmark Testing for Search and Recommendation Ranking Models. Bind the TensorFlow Serving instance to NUMA node 0 and the Perf Analyzer tool to NUMA node 1 to evaluate the performance of search and recommendation models in the Model Zoo.
- Service parameters
It is recommended to set tensorflow_intra_op_parallelism to the number of CPU cores available to the TensorFlow Serving instance. This makes full use of multi-core resources while avoiding resource contention, improving overall inference performance.
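As a minimal sketch of the setup described above, the following launch command binds a TensorFlow Serving instance to NUMA node 0 and sets tensorflow_intra_op_parallelism. The model name, model path, port, and the assumption that 32 cores are available on node 0 are placeholders for illustration, not part of the benchmark definition.

```shell
# Bind TensorFlow Serving (CPUs and memory) to NUMA node 0,
# as in the benchmark setup; Perf Analyzer runs on node 1.
numactl --cpunodebind=0 --membind=0 \
  tensorflow_model_server \
    --port=8500 \
    --model_name=wide_deep \
    --model_base_path=/models/wide_deep \
    --tensorflow_intra_op_parallelism=32   # set to the CPU cores available to the instance
```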
Table 1 TensorFlow Serving service parameters: recommended parameter tuning analysis

| Parameter Name | Parameter Description | Value Range | Test Case | Importance Ratio (%) | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
| --- | --- | --- | --- | --- | --- | --- |
| tensorflow_intra_op_parallelism | Number of threads that run independent operations concurrently | [0, 128] | Wide and Deep, DSSM, DFFM | 88.7 | 32 (+16.2%) | 122 (-8.9%) |
| max_batch_size (sub-parameter of batching_parameters_file) | Maximum number of requests accepted in a single batch (batch size) | [32, 1024] | Wide and Deep | - | 190 (+7.0%) | 681 (-55.5%) |
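max_batch_size is set in the file passed to tensorflow_model_server via --batching_parameters_file (batching must also be enabled with --enable_batching). A minimal sketch of such a file, using the near-optimal value from the table above; the timeout and thread-count values are illustrative assumptions, not benchmark results:

```
max_batch_size { value: 190 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 32 }
```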
- System parameters
In search and recommendation scenarios, adjust the kernel scheduling subsystem parameters (see Table 2) and set transparent_hugepage_mode to always. This optimizes memory management and increases overall system throughput.
Table 2 TensorFlow Serving system parameters: recommended parameter tuning analysis

| Parameter Name | Value Range | Test Case | Importance Ratio (%) | Default Value | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
| --- | --- | --- | --- | --- | --- | --- |
| kernel.sched_cluster | {0, 1} | Wide and Deep | 42.5 | 0 | 1 (+2.4%) | 0 (-4.7%) |
| kernel.sched_migration_cost_ns | [100000, 5000000] | Wide and Deep | 20.7 | 500000 | 1319951 (+2.4%) | 168578 (-4.7%) |
| kernel.sched_nr_migrate | [1, 128] | DFFM | 10.2 | 32 | 128 (+1.2%) | 75 (-0.6%) |
| kernel.sched_child_runs_first | {0, 1} | DFFM | 9.3 | 0 | 0 (+1.2%) | 1 (-0.6%) |
| transparent_hugepage_mode | {madvise, never, always} | DSSM | 42.7 | never | always (+1.5%) | madvise (-6.7%) |
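The system settings above can be applied with sysctl and the transparent huge page sysfs interface. A minimal sketch using the optimal values from Table 2; note that these commands require root, take effect immediately but do not persist across reboots (add sysctl entries to /etc/sysctl.conf to persist them), and kernel.sched_cluster is only present on kernels built with cluster scheduling support.

```shell
# Kernel scheduling subsystem parameters (optimal values from Table 2).
sysctl -w kernel.sched_migration_cost_ns=1319951
sysctl -w kernel.sched_nr_migrate=128
sysctl -w kernel.sched_child_runs_first=0
# Only on kernels that expose cluster scheduling:
sysctl -w kernel.sched_cluster=1

# Enable transparent huge pages system-wide (transparent_hugepage_mode=always).
echo always > /sys/kernel/mm/transparent_hugepage/enabled
```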