
OmniAdvisor Configuration File

common_config.cfg

Table 1 Configuration items of common_config.cfg

| Module | Parameter | Default Value | Description |
| --- | --- | --- | --- |
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_3000 | Name of the tested database. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64 | Path of the decompressed log parsing module. |
| database | db_name | test | Name of the MySQL database. If a database with this name does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| sampling | sampling_epochs | 40 | Number of parameter sampling rounds. |
| recommend | recommend_identifier | application_name | Identifier used to look up the optimal task parameters in the database after sampling tuning is complete and a historical task is run again. Set it to the task name (application_name) or the query hash value (query_hash). |
| spark | log_start_time | - | Start time of the Spark run logs. The date can be viewed on the Hadoop UI. |
| spark | log_end_time | - | End time of the Spark run logs. |
| spark | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| spark | history_application_name | q12 | Name of the task to sample and tune when enable_sampling_all_sql is set to false. For example, q12 restricts sampling tuning to that task. |
| spark | total_memory_threshold | 1200 | Maximum memory usage of Spark during sampling, in GB. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters. These are normally excluded from parameter sampling. |
| hive | log_start_time | - | Start time of the Tez run logs. The date can be viewed on the Hadoop UI. |
| hive | log_end_time | - | End time of the Tez run logs. |
| hive | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| hive | history_application_name | query12 | Name of the task to sample and tune when enable_sampling_all_sql is set to false. For example, query12 restricts sampling tuning to that task. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true --hiveconf hive.exec.compress.intermediate=true | Default Hive parameters. These are normally excluded from parameter sampling. |
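The items in Table 1 can be assembled into a file along these lines. This is a sketch only: the section names are assumed to follow the Module column, and the log time values are placeholder examples (the real file shipped with your OmniAdvisor release is the authority on layout).

```ini
; Sketch of a common_config.cfg built from the defaults in Table 1.
; Section names are assumed to match the Module column; verify against
; the file shipped with your OmniAdvisor release.
[workload]
workload_name = tpcds_bin_partitioned_decimal_orc_3000
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64

[database]
db_name = test
db_host = localhost
db_port = 3306

[sampling]
sampling_epochs = 40

[recommend]
recommend_identifier = application_name

[spark]
; Example timestamps only; set these to the range shown on the Hadoop UI.
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
enable_sampling_all_sql = true
history_application_name = q12
total_memory_threshold = 1200
spark_default_config = --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300
```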

omniAdvisorLogAnalyzer.properties

Table 2 Configuration items of omniAdvisorLogAnalyzer.properties

| Module | Item | Default Value | Description |
| --- | --- | --- | --- |
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing processes, that is, the number of analysis tasks that run in parallel. |
| kerberos | kerberos.principal | - | User for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path of the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | Driver of the database that stores the analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode: log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server; used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory that stores the Spark logs; used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark log file to analyze, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
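Putting the items in Table 2 together gives a file of this shape. The Kerberos principal, keytab path, and JDBC URL below are placeholder examples, not shipped defaults; the remaining values are the defaults from the table.

```properties
# Sketch of an omniAdvisorLogAnalyzer.properties based on Table 2.
log.analyzer.thread.count=3

# Placeholder examples; required only in secure (Kerberos) mode.
kerberos.principal=omni@EXAMPLE.COM
kerberos.keytab.file=/etc/security/keytabs/omni.keytab

datasource.db.driver=com.mysql.cj.jdbc.Driver
# Placeholder URL pointing at the MySQL database from common_config.cfg.
datasource.db.url=jdbc:mysql://localhost:3306/test

spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.timeout.seconds=30
spark.rest.url=http://localhost:18080

tez.enable=false
tez.workload=default
tez.timline.url=http://localhost:8188
tez.timeline.timeout.ms=6000
```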

Database Tables

Table 3 Fields in the yarn_app_result, best_config, and sampling_config tables

| Table | Field | Description |
| --- | --- | --- |
| yarn_app_result | application_id | Application ID of the task executed on Yarn. |
| yarn_app_result | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| yarn_app_result | application_workload | Name of the database used to perform the task, generally specified by --database. |
| yarn_app_result | start_time | Start time of the task. |
| yarn_app_result | finish_time | End time of the task. |
| yarn_app_result | job_type | Task type: Spark or Tez. |
| yarn_app_result | duration_time | Time taken to execute the task, in milliseconds. |
| yarn_app_result | parameters | Parameters used to execute the task. |
| yarn_app_result | execution_status | Task execution status. 0: failed; 1: succeeded. |
| yarn_app_result | query | Query statement used to execute the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| best_config | application_workload | Name of the database used to perform the task, generally specified by --database. |
| best_config | duration_time | Time taken to execute the task, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | job_type | Task type: Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement that executes the task. |
| sampling_config | application_id | Application ID of the task executed on Yarn. |
| sampling_config | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| sampling_config | application_workload | Name of the database used to perform the task, generally specified by --database. |
| sampling_config | duration_time | Time taken to execute the task, in milliseconds. |
| sampling_config | parameters | Parameters used to execute the task. |
| sampling_config | job_type | Task type: Spark or Tez. |
| sampling_config | execution_status | Task execution status. 0: failed; 1: succeeded. |
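The query_hash field stores a SHA-256 digest of the SQL text, which is what recommend_identifier = query_hash matches against. A minimal sketch of that computation, assuming the raw statement is hashed as-is (OmniAdvisor may normalize whitespace or case before hashing; check the log analyzer behavior for your release):

```python
import hashlib

def query_hash(sql: str) -> str:
    """SHA-256 of the SQL text, hex-encoded (64 characters),
    in the style of best_config.query_hash.

    Assumption: the raw statement is hashed as-is, with no
    normalization applied first.
    """
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()

# Hypothetical statement for illustration only.
h = query_hash("select * from store_sales where ss_quantity > 10")
print(h)
```

With such a digest in hand, the optimal parameters could then be looked up with a query along the lines of `SELECT parameters FROM best_config WHERE query_hash = '<digest>'` (illustrative only).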