# OmniAdvisor Configuration File

## common_config.cfg
| Module | Parameter | Default Value | Description |
|---|---|---|---|
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_2 | Name of the tested database. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64 | Path for storing the decompressed log parsing module. |
| workload | identification_type | job_hash | Unique ID of a task. If the value is application_name, the hash of the task name is used to look up the optimal parameters for the task in the database. If the value is job_hash (the default), the hash of the task query (Spark or Hive SQL task) or of the application JAR package (Spark application task) is used instead. |
| database | db_name | test | Name of the MySQL database. If the database does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| spark | log_start_time | - | Start time of Spark run logs. You can view the date on the Hadoop UI. |
| spark | log_end_time | - | End time of Spark run logs. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters. Default parameters are generally not involved in parameter sampling. |
| hive | log_start_time | - | Start time of Tez run logs. You can view the date on the Hadoop UI. |
| hive | log_end_time | - | End time of Tez run logs. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true | Default Hive parameters. Default parameters are generally not involved in parameter sampling. |
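Putting the parameters above together, a common_config.cfg might look like the fragment below. This is a sketch, not the shipped file: the INI-style section names (`[workload]`, `[database]`, `[spark]`, `[hive]`) mirror the Module column but are an assumption, and the log time values are hypothetical placeholders (the actual timestamp format is not specified in this table).

```ini
[workload]
workload_name = tpcds_bin_partitioned_decimal_orc_2
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64
identification_type = job_hash

[database]
db_name = test
db_host = localhost
db_port = 3306

[spark]
; hypothetical timestamps for illustration only
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
spark_default_config = --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300
```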
## omniAdvisorLogAnalyzer.properties
| Module | Item | Default Value | Description |
|---|---|---|---|
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing processes, that is, the number of analysis tasks run in parallel. |
| kerberos | kerberos.principal | - | User used for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path of the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | Driver of the database that stores the analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode, which can be log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server. Used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory storing Spark logs. Used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark analysis log file, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
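As a sketch, an omniAdvisorLogAnalyzer.properties configured for rest-mode Spark fetching might look like the following. All concrete values (the principal, keytab path, and JDBC URL) are hypothetical examples, not defaults; the JDBC URL shape assumes the standard MySQL Connector/J format.

```properties
log.analyzer.thread.count=3

# hypothetical Kerberos identity for secure mode
kerberos.principal=omniadvisor@EXAMPLE.COM
kerberos.keytab.file=/etc/security/keytabs/omniadvisor.keytab

datasource.db.driver=com.mysql.cj.jdbc.Driver
# hypothetical URL pointing at the MySQL database from common_config.cfg
datasource.db.url=jdbc:mysql://localhost:3306/test

spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.rest.url=http://localhost:18080
spark.timeout.seconds=30

tez.enable=false
```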
## Database Tables
| Table | Field | Description |
|---|---|---|
| history_config | application_id | Application ID of the task executed on Yarn. |
| history_config | application_name | Application name of the task executed on Yarn. Generally, it is the identifier shared by runs of the same task. |
| history_config | application_workload | Name of the database on which the SQL task runs. Generally specified by --database. |
| history_config | start_time | Start time of the task. |
| history_config | finish_time | End time of the task. |
| history_config | duration_time | Execution duration of the task, in milliseconds. |
| history_config | job_type | Task type, which can be Spark or Tez. |
| history_config | submit_method | Method used to submit a Spark task: spark-sql indicates an SQL task submitted through spark-sql; spark-submit indicates an application task submitted through spark-submit. |
| history_config | deploy_mode | Deployment mode of a Spark task: client indicates Yarn client mode; cluster indicates Yarn cluster mode. |
| history_config | submit_cmd | Command used to submit a Spark task. |
| history_config | parameters | Parameters used to execute the task. |
| history_config | execution_status | Task execution status: 0 = failed, 1 = succeeded. |
| history_config | query | Query statement of an SQL task. |
| history_config | identification | Unique ID of the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn. Generally, it is the identifier shared by runs of the same task. |
| best_config | application_workload | Name of the database on which the task runs. Generally specified by --database. |
| best_config | duration_time | Execution duration of the task, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | submit_cmd | Command used to submit a Spark task. |
| best_config | job_type | Task type, which can be Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement that the task executes. |
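The query_hash field is described as the SHA-256 of the executed SQL statement. A minimal sketch of that computation is shown below; note it hashes the raw UTF-8 bytes of the statement as-is, whereas OmniAdvisor may normalize whitespace or case first (not specified here), so the exact digests may differ.

```python
import hashlib


def query_hash(sql: str) -> str:
    """Return the hex SHA-256 digest of an SQL statement.

    Sketch only: hashes the raw UTF-8 bytes; any normalization
    OmniAdvisor applies before hashing is an open assumption.
    """
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


# The same statement always maps to the same 64-character hex digest,
# which is what lets best_config look up parameters by query.
print(query_hash("SELECT count(*) FROM store_sales"))
```

Because the digest is deterministic, two runs of the same SQL text share one query_hash row, while any textual change produces a different hash.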