
OmniAdvisor Configuration File

common_config.cfg

Table 1 Configuration items of common_config.cfg

| Module | Parameter | Default Value | Description |
| --- | --- | --- | --- |
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_3000 | Name of the tested database. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64 | Path of the decompressed log parsing module. |
| database | db_name | test | Name of the MySQL database. If a database with this name does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| sampling | sampling_epochs | 40 | Number of parameter sampling rounds. |
| recommend | recommend_identifier | application_name | Identifier used to look up the optimal task parameters in the database after sampling tuning is complete and a historical task is run again. Set it to the task name (application_name) or the query hash value (query_hash). |
| spark | log_start_time | - | Start time of the Spark run logs. The date can be viewed on the Hadoop UI. |
| spark | log_end_time | - | End time of the Spark run logs. |
| spark | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| spark | history_application_name | q12 | Name of the task to sample and tune when enable_sampling_all_sql is set to false. For example, q12 restricts sampling tuning to that task. |
| spark | total_memory_threshold | 1200 | Maximum memory usage of Spark during sampling, in GB. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters. These are normally excluded from parameter sampling. |
| hive | log_start_time | - | Start time of the Tez run logs. The date can be viewed on the Hadoop UI. |
| hive | log_end_time | - | End time of the Tez run logs. |
| hive | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| hive | history_application_name | query12 | Name of the task to sample and tune when enable_sampling_all_sql is set to false. For example, query12 restricts sampling tuning to that task. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true --hiveconf hive.exec.compress.intermediate=true | Default Hive parameters. These are normally excluded from parameter sampling. |
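The items in Table 1 can be assembled into a file along these lines. This is a sketch only: the section names are assumed to follow the Module column, and the log time values are placeholder examples (the real file shipped with your OmniAdvisor release is the authority on layout).

```ini
; Sketch of a common_config.cfg built from the defaults in Table 1.
; Section names are assumed to match the Module column; verify against
; the file shipped with your OmniAdvisor release.
[workload]
workload_name = tpcds_bin_partitioned_decimal_orc_3000
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64

[database]
db_name = test
db_host = localhost
db_port = 3306

[sampling]
sampling_epochs = 40

[recommend]
recommend_identifier = application_name

[spark]
; Example timestamps only; set these to the range shown on the Hadoop UI.
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
enable_sampling_all_sql = true
history_application_name = q12
total_memory_threshold = 1200
spark_default_config = --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300
```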

omniAdvisorLogAnalyzer.properties

Table 2 Configuration items of omniAdvisorLogAnalyzer.properties

| Module | Item | Default Value | Description |
| --- | --- | --- | --- |
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing processes, that is, the number of analysis tasks that run in parallel. |
| kerberos | kerberos.principal | - | User for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path of the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | Driver of the database that stores the analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode: log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server; used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory that stores the Spark logs; used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark log file to analyze, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
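Putting the items in Table 2 together gives a file of this shape. The Kerberos principal, keytab path, and JDBC URL below are placeholder examples, not shipped defaults; the remaining values are the defaults from the table.

```properties
# Sketch of an omniAdvisorLogAnalyzer.properties based on Table 2.
log.analyzer.thread.count=3

# Placeholder examples; required only in secure (Kerberos) mode.
kerberos.principal=omni@EXAMPLE.COM
kerberos.keytab.file=/etc/security/keytabs/omni.keytab

datasource.db.driver=com.mysql.cj.jdbc.Driver
# Placeholder URL pointing at the MySQL database from common_config.cfg.
datasource.db.url=jdbc:mysql://localhost:3306/test

spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.timeout.seconds=30
spark.rest.url=http://localhost:18080

tez.enable=false
tez.workload=default
tez.timline.url=http://localhost:8188
tez.timeline.timeout.ms=6000
```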

Database Tables

Table 3 Fields in the yarn_app_result, best_config, and sampling_config tables

| Table | Field | Description |
| --- | --- | --- |
| yarn_app_result | application_id | Application ID of the task executed on Yarn. |
| yarn_app_result | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| yarn_app_result | application_workload | Name of the database used to perform the task, generally specified by --database. |
| yarn_app_result | start_time | Start time of the task. |
| yarn_app_result | finish_time | End time of the task. |
| yarn_app_result | job_type | Task type: Spark or Tez. |
| yarn_app_result | duration_time | Time taken to execute the task, in milliseconds. |
| yarn_app_result | parameters | Parameters used to execute the task. |
| yarn_app_result | execution_status | Task execution status. 0: failed; 1: succeeded. |
| yarn_app_result | query | Query statement used to execute the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| best_config | application_workload | Name of the database used to perform the task, generally specified by --database. |
| best_config | duration_time | Time taken to execute the task, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | job_type | Task type: Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement that executes the task. |
| sampling_config | application_id | Application ID of the task executed on Yarn. |
| sampling_config | application_name | Application name of the task executed on Yarn, typically used as the identifier for repeated runs of the same task. |
| sampling_config | application_workload | Name of the database used to perform the task, generally specified by --database. |
| sampling_config | duration_time | Time taken to execute the task, in milliseconds. |
| sampling_config | parameters | Parameters used to execute the task. |
| sampling_config | job_type | Task type: Spark or Tez. |
| sampling_config | execution_status | Task execution status. 0: failed; 1: succeeded. |
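The query_hash field stores a SHA-256 digest of the SQL text, which is what recommend_identifier = query_hash matches against. A minimal sketch of that computation, assuming the raw statement is hashed as-is (OmniAdvisor may normalize whitespace or case before hashing; check the log analyzer behavior for your release):

```python
import hashlib

def query_hash(sql: str) -> str:
    """SHA-256 of the SQL text, hex-encoded (64 characters),
    in the style of best_config.query_hash.

    Assumption: the raw statement is hashed as-is, with no
    normalization applied first.
    """
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()

# Hypothetical statement for illustration only.
h = query_hash("select * from store_sales where ss_quantity > 10")
print(h)
```

With such a digest in hand, the optimal parameters could then be looked up with a query along the lines of `SELECT parameters FROM best_config WHERE query_hash = '<digest>'` (illustrative only).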