AI Inference Parameters
TensorFlow Serving
For details about the test, see Inference Performance Benchmark Testing for Search and Recommendation Ranking Models. Bind the TensorFlow Serving instance to NUMA node 0 and the Perf Analyzer tool to NUMA node 1 to evaluate the performance of search and recommendation models in the Model Zoo.
- Service parameters
It is recommended to set tensorflow_intra_op_parallelism to the number of CPU cores available to the TensorFlow Serving instance. This makes full use of multi-core resources while avoiding resource contention, improving overall inference performance.
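As a minimal sketch of the setup described above, the following launch command binds a TensorFlow Serving instance to NUMA node 0 and sets tensorflow_intra_op_parallelism. The model name, model path, port, and the assumption that 32 cores are available on node 0 are placeholders for illustration, not part of the benchmark definition.

```shell
# Bind TensorFlow Serving (CPUs and memory) to NUMA node 0,
# as in the benchmark setup; Perf Analyzer runs on node 1.
numactl --cpunodebind=0 --membind=0 \
  tensorflow_model_server \
    --port=8500 \
    --model_name=wide_deep \
    --model_base_path=/models/wide_deep \
    --tensorflow_intra_op_parallelism=32   # set to the CPU cores available to the instance
```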
Table 1 TensorFlow Serving service parameters: recommended parameter tuning analysis

| Parameter Name | Parameter Description | Value Range | Test Case | Importance Ratio (%) | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
| --- | --- | --- | --- | --- | --- | --- |
| tensorflow_intra_op_parallelism | Number of threads that run independent operations concurrently | [0, 128] | Wide and Deep, DSSM, DFFM | 88.7 | 32 (+16.2%) | 122 (-8.9%) |
| max_batch_size (sub-parameter of batching_parameters_file) | Maximum number of requests accepted in a single batch (batch size) | [32, 1024] | Wide and Deep | - | 190 (+7.0%) | 681 (-55.5%) |
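max_batch_size is set in the file passed to tensorflow_model_server via --batching_parameters_file (batching must also be enabled with --enable_batching). A minimal sketch of such a file, using the near-optimal value from the table above; the timeout and thread-count values are illustrative assumptions, not benchmark results:

```
max_batch_size { value: 190 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 32 }
```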
- System parameters
In search and recommendation scenarios, adjust the kernel scheduling subsystem parameters (see Table 2) and set transparent_hugepage_mode to always. This optimizes memory management and increases overall system throughput.
Table 2 TensorFlow Serving system parameters: recommended parameter tuning analysis

| Parameter Name | Value Range | Test Case | Importance Ratio (%) | Default Value | Optimal Value (vs. Baseline) | Worst Value (vs. Baseline) |
| --- | --- | --- | --- | --- | --- | --- |
| kernel.sched_cluster | {0, 1} | Wide and Deep | 42.5 | 0 | 1 (+2.4%) | 0 (-4.7%) |
| kernel.sched_migration_cost_ns | [100000, 5000000] | Wide and Deep | 20.7 | 500000 | 1319951 (+2.4%) | 168578 (-4.7%) |
| kernel.sched_nr_migrate | [1, 128] | DFFM | 10.2 | 32 | 128 (+1.2%) | 75 (-0.6%) |
| kernel.sched_child_runs_first | {0, 1} | DFFM | 9.3 | 0 | 0 (+1.2%) | 1 (-0.6%) |
| transparent_hugepage_mode | {madvise, never, always} | DSSM | 42.7 | never | always (+1.5%) | madvise (-6.7%) |
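The system settings above can be applied with sysctl and the transparent huge page sysfs interface. A minimal sketch using the optimal values from Table 2; note that these commands require root, take effect immediately but do not persist across reboots (add sysctl entries to /etc/sysctl.conf to persist them), and kernel.sched_cluster is only present on kernels built with cluster scheduling support.

```shell
# Kernel scheduling subsystem parameters (optimal values from Table 2).
sysctl -w kernel.sched_migration_cost_ns=1319951
sysctl -w kernel.sched_nr_migrate=128
sysctl -w kernel.sched_child_runs_first=0
# Only on kernels that expose cluster scheduling:
sysctl -w kernel.sched_cluster=1

# Enable transparent huge pages system-wide (transparent_hugepage_mode=always).
echo always > /sys/kernel/mm/transparent_hugepage/enabled
```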