
OmniAdvisor Configuration File

common_config.cfg

Table 1 Configuration items of common_config.cfg

| Module | Parameter | Default Value | Description |
| --- | --- | --- | --- |
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_2 | Name of the database under test. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64 | Path where the decompressed log parsing module is stored. |
| workload | identification_type | job_hash | Unique ID of a task. If the value is application_name, the hash value of the task name is used to look up the optimal parameters of the task in the database. If the value is job_hash (the default), the hash value of the task query (Spark or Hive SQL task) or application JAR package (Spark application task) is used instead. |
| database | db_name | test | Name of the MySQL database. If the database does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| spark | log_start_time | - | Start time of the Spark run logs. You can view the date on the Hadoop UI. |
| spark | log_end_time | - | End time of the Spark run logs. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters. These defaults are generally excluded from parameter sampling. |
| hive | log_start_time | - | Start time of the Tez run logs. You can view the date on the Hadoop UI. |
| hive | log_end_time | - | End time of the Tez run logs. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true | Default Hive parameters. These defaults are generally excluded from parameter sampling. |
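Putting the items in Table 1 together, a filled-in common_config.cfg might look like the sketch below. This is an illustration only: the section layout follows the Module column of the table, and the log_start_time/log_end_time values are placeholder timestamps whose exact format you should check against the file shipped with your release.

```ini
# Hypothetical common_config.cfg sketch based on Table 1.
# Timestamps below are placeholders; confirm the expected format
# for your OmniAdvisor version before use.
[workload]
workload_name = tpcds_bin_partitioned_decimal_orc_2
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64
identification_type = job_hash

[database]
db_name = test
db_host = localhost
db_port = 3306

[spark]
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
spark_default_config = --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300

[hive]
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
hive_default_config = --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true
```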

omniAdvisorLogAnalyzer.properties

Table 2 Configuration items of omniAdvisorLogAnalyzer.properties

| Module | Item | Default Value | Description |
| --- | --- | --- | --- |
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing threads, that is, the number of analysis tasks run in parallel. |
| kerberos | kerberos.principal | - | User used for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path of the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | JDBC driver of the database that stores the log analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the log analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode, which can be log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server; used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory that stores Spark logs; used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark log file to analyze, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
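A filled-in omniAdvisorLogAnalyzer.properties based on Table 2 could look like the sketch below. The Kerberos principal, keytab path, and JDBC URL are placeholder values for illustration; substitute the ones for your cluster and verify the key set against the file in your release.

```properties
# Hypothetical omniAdvisorLogAnalyzer.properties sketch based on Table 2.
# kerberos.* and datasource.db.url values are placeholders.
log.analyzer.thread.count=3

kerberos.principal=omniadvisor@EXAMPLE.COM
kerberos.keytab.file=/etc/security/keytabs/omniadvisor.keytab

datasource.db.driver=com.mysql.cj.jdbc.Driver
datasource.db.url=jdbc:mysql://localhost:3306/test

# Enable the Spark Fetcher in rest mode against the history server.
spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.timeout.seconds=30
spark.rest.url=http://localhost:18080

tez.enable=false
```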

Database Tables

Table 3 Fields in the history_config and best_config tables

| Table | Field | Description |
| --- | --- | --- |
| history_config | application_id | Application ID of the task executed on Yarn. |
| history_config | application_name | Application name of the task executed on Yarn. Generally, it serves as the identifier shared by runs of the same task. |
| history_config | application_workload | Name of the database used to perform the SQL task. Generally, it is specified by --database. |
| history_config | start_time | Start time of the task. |
| history_config | finish_time | End time of the task. |
| history_config | duration_time | Time taken to execute the task, in milliseconds. |
| history_config | job_type | Task type, which can be Spark or Tez. |
| history_config | submit_method | Method of submitting a Spark task. spark-sql indicates an SQL task submitted by spark-sql; spark-submit indicates an application task submitted by spark-submit. |
| history_config | deploy_mode | Deployment mode of a Spark task. client indicates the Yarn client mode; cluster indicates the Yarn cluster mode. |
| history_config | submit_cmd | Command used to submit a Spark task. |
| history_config | parameters | Parameters used to execute the task. |
| history_config | execution_status | Task execution status. 0: failed; 1: succeeded. |
| history_config | query | Query statement used to execute an SQL task. |
| history_config | identification | Unique ID of the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn. Generally, it serves as the identifier shared by runs of the same task. |
| best_config | application_workload | Name of the database used to perform the task. Generally, it is specified by --database. |
| best_config | duration_time | Time taken to execute the task, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | submit_cmd | Command used to submit a Spark task. |
| best_config | job_type | Task type, which can be Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement that executes the task. |
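The query_hash field is described as a SHA-256 digest of the task's SQL statement. The sketch below shows what such a value looks like; it is a plain SHA-256 over the UTF-8 bytes of the statement, and OmniAdvisor may normalize the SQL (whitespace, case, literals) differently before hashing, so treat this only as an illustration of the field's shape.

```python
import hashlib


def sha256_of_query(sql: str) -> str:
    """Return the hex SHA-256 digest of an SQL statement.

    Illustrates the kind of value stored in best_config.query_hash.
    OmniAdvisor may apply its own normalization before hashing.
    """
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


# The digest is a deterministic 64-character hex string.
digest = sha256_of_query("SELECT count(*) FROM store_sales")
print(digest)
```

Because the digest is deterministic, two submissions of the byte-identical statement map to the same row, which is what lets the optimal parameters be looked up by hash.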