
OmniAdvisor Configuration File

common_config.cfg

Table 1 Configuration items of common_config.cfg

| Module | Parameter | Default Value | Description |
| --- | --- | --- | --- |
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_2 | Name of the database under test. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64 | Path where the decompressed log parsing module is stored. |
| workload | identification_type | job_hash | Unique ID of a task. If the value is application_name, the hash value of the task name is used to look up the optimal parameters of the task in the database. If the value is job_hash (the default), the hash value of the task query (Spark or Hive SQL task) or application JAR package (Spark application task) is used instead. |
| database | db_name | test | Name of the MySQL database. If the database does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| spark | log_start_time | - | Start time of the Spark run logs. You can view the date on the Hadoop UI. |
| spark | log_end_time | - | End time of the Spark run logs. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters. These defaults are generally excluded from parameter sampling. |
| hive | log_start_time | - | Start time of the Tez run logs. You can view the date on the Hadoop UI. |
| hive | log_end_time | - | End time of the Tez run logs. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true | Default Hive parameters. These defaults are generally excluded from parameter sampling. |
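Putting the items in Table 1 together, a filled-in common_config.cfg might look like the sketch below. This is an illustration only: the section layout follows the Module column of the table, and the log_start_time/log_end_time values are placeholder timestamps whose exact format you should check against the file shipped with your release.

```ini
# Hypothetical common_config.cfg sketch based on Table 1.
# Timestamps below are placeholders; confirm the expected format
# for your OmniAdvisor version before use.
[workload]
workload_name = tpcds_bin_partitioned_decimal_orc_2
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64
identification_type = job_hash

[database]
db_name = test
db_host = localhost
db_port = 3306

[spark]
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
spark_default_config = --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300

[hive]
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
hive_default_config = --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true
```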

omniAdvisorLogAnalyzer.properties

Table 2 Configuration items of omniAdvisorLogAnalyzer.properties

| Module | Item | Default Value | Description |
| --- | --- | --- | --- |
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing threads, that is, the number of analysis tasks run in parallel. |
| kerberos | kerberos.principal | - | User used for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path of the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | JDBC driver of the database that stores the log analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the log analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode, which can be log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server; used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory that stores Spark logs; used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark log file to analyze, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
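A filled-in omniAdvisorLogAnalyzer.properties based on Table 2 could look like the sketch below. The Kerberos principal, keytab path, and JDBC URL are placeholder values for illustration; substitute the ones for your cluster and verify the key set against the file in your release.

```properties
# Hypothetical omniAdvisorLogAnalyzer.properties sketch based on Table 2.
# kerberos.* and datasource.db.url values are placeholders.
log.analyzer.thread.count=3

kerberos.principal=omniadvisor@EXAMPLE.COM
kerberos.keytab.file=/etc/security/keytabs/omniadvisor.keytab

datasource.db.driver=com.mysql.cj.jdbc.Driver
datasource.db.url=jdbc:mysql://localhost:3306/test

# Enable the Spark Fetcher in rest mode against the history server.
spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.timeout.seconds=30
spark.rest.url=http://localhost:18080

tez.enable=false
```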

Database Tables

Table 3 Fields in the history_config and best_config tables

| Table | Field | Description |
| --- | --- | --- |
| history_config | application_id | Application ID of the task executed on Yarn. |
| history_config | application_name | Application name of the task executed on Yarn. Generally, it serves as the identifier shared by runs of the same task. |
| history_config | application_workload | Name of the database used to perform the SQL task. Generally, it is specified by --database. |
| history_config | start_time | Start time of the task. |
| history_config | finish_time | End time of the task. |
| history_config | duration_time | Time taken to execute the task, in milliseconds. |
| history_config | job_type | Task type, which can be Spark or Tez. |
| history_config | submit_method | Method of submitting a Spark task. spark-sql indicates an SQL task submitted by spark-sql; spark-submit indicates an application task submitted by spark-submit. |
| history_config | deploy_mode | Deployment mode of a Spark task. client indicates the Yarn client mode; cluster indicates the Yarn cluster mode. |
| history_config | submit_cmd | Command used to submit a Spark task. |
| history_config | parameters | Parameters used to execute the task. |
| history_config | execution_status | Task execution status. 0: failed; 1: succeeded. |
| history_config | query | Query statement used to execute an SQL task. |
| history_config | identification | Unique ID of the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn. Generally, it serves as the identifier shared by runs of the same task. |
| best_config | application_workload | Name of the database used to perform the task. Generally, it is specified by --database. |
| best_config | duration_time | Time taken to execute the task, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | submit_cmd | Command used to submit a Spark task. |
| best_config | job_type | Task type, which can be Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement that executes the task. |
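The query_hash field is described as a SHA-256 digest of the task's SQL statement. The sketch below shows what such a value looks like; it is a plain SHA-256 over the UTF-8 bytes of the statement, and OmniAdvisor may normalize the SQL (whitespace, case, literals) differently before hashing, so treat this only as an illustration of the field's shape.

```python
import hashlib


def sha256_of_query(sql: str) -> str:
    """Return the hex SHA-256 digest of an SQL statement.

    Illustrates the kind of value stored in best_config.query_hash.
    OmniAdvisor may apply its own normalization before hashing.
    """
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


# The digest is a deterministic 64-character hex string.
digest = sha256_of_query("SELECT count(*) FROM store_sales")
print(digest)
```

Because the digest is deterministic, two submissions of the byte-identical statement map to the same row, which is what lets the optimal parameters be looked up by hash.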