# OmniAdvisor Configuration File

## common_config.cfg
| Module | Parameter | Default Value | Description |
|---|---|---|---|
| workload | workload_name | tpcds_bin_partitioned_decimal_orc_3000 | Name of the database under test. |
| workload | log_analyzer_path | /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64 | Path to the decompressed log parsing module. |
| database | db_name | test | Name of the MySQL database. If the database does not exist, it is created automatically. |
| database | db_host | localhost | Host name for connecting to the MySQL database. |
| database | db_port | 3306 | Port for connecting to the MySQL database. |
| sampling | sampling_epochs | 40 | Number of parameter sampling rounds. |
| recommend | recommend_identifier | application_name | After sampling tuning is complete and a historical task is run again, the task name (application_name) or the query hash (query_hash) is used to look up the optimal task parameters in the database. |
| spark | log_start_time | - | Start time of the Spark run logs. The date can be viewed on the Hadoop UI. |
| spark | log_end_time | - | End time of the Spark run logs. |
| spark | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| spark | history_application_name | q12 | Name of the task to be sampled and tuned; q12, for example, means sampling tuning is performed for that task. Takes effect only when enable_sampling_all_sql is false. |
| spark | total_memory_threshold | 1200 | Maximum memory Spark may use during sampling, in GB. |
| spark | spark_default_config | --conf spark.sql.orc.impl=native --conf spark.locality.wait=0 --conf spark.sql.broadcastTimeout=300 | Default Spark parameters; defaults are generally excluded from parameter sampling. |
| hive | log_start_time | - | Start time of the Tez run logs. The date can be viewed on the Hadoop UI. |
| hive | log_end_time | - | End time of the Tez run logs. |
| hive | enable_sampling_all_sql | true | Whether to sample all SQL statements that have been run (application_name is obtained from the database). If set to true, the history_application_name item is ignored. |
| hive | history_application_name | query12 | Name of the task to be sampled and tuned; query12, for example, means sampling tuning is performed for that task. Takes effect only when enable_sampling_all_sql is false. |
| hive | hive_default_config | --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true --hiveconf hive.exec.compress.intermediate=true | Default Hive parameters; defaults are generally excluded from parameter sampling. |
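To make the table concrete, a minimal sketch of a common_config.cfg follows. It assumes an INI-style layout with one section per module and uses the default values from the table; the section names, key order, and timestamp format are assumptions, so verify against the file shipped with OmniAdvisor.

```ini
; Hypothetical common_config.cfg sketch; section layout and the
; timestamp format for log_start_time/log_end_time are assumptions.
[database]
db_name = test
db_host = localhost
db_port = 3306

[sampling]
sampling_epochs = 40

[spark]
log_start_time = 2024-01-01 00:00:00
log_end_time = 2024-01-02 00:00:00
enable_sampling_all_sql = false
history_application_name = q12
total_memory_threshold = 1200
```

Here enable_sampling_all_sql is set to false so that history_application_name takes effect and only the q12 task is sampled.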
## omniAdvisorLogAnalyzer.properties
| Module | Item | Default Value | Description |
|---|---|---|---|
| log analyzer | log.analyzer.thread.count | 3 | Number of concurrent log parsing processes, that is, the number of concurrent analysis tasks. |
| kerberos | kerberos.principal | - | User used for Kerberos authentication in secure mode. |
| kerberos | kerberos.keytab.file | - | Path to the keytab file used for Kerberos authentication in secure mode. |
| datasource | datasource.db.driver | com.mysql.cj.jdbc.Driver | Driver of the database that stores the analysis results. |
| datasource | datasource.db.url | - | URL of the database that stores the analysis results. |
| spark fetcher | spark.enable | false | Whether to enable the Spark Fetcher. |
| spark fetcher | spark.workload | default | Database of the Spark task. |
| spark fetcher | spark.eventLogs.mode | - | Spark Fetcher mode, which can be log or rest. |
| spark fetcher | spark.timeout.seconds | 30 | Timeout of a Spark Fetcher analysis task, in seconds. |
| spark fetcher | spark.rest.url | http://localhost:18080 | URL of the Spark history server; used only in rest mode. |
| spark fetcher | spark.log.directory | - | Directory for storing Spark logs; used only in log mode. |
| spark fetcher | spark.log.maxSize.mb | 500 | Maximum size of a Spark log file to analyze, in MB. Log files larger than this are ignored. Used only in log mode. |
| tez fetcher | tez.enable | false | Whether to enable the Tez Fetcher. |
| tez fetcher | tez.workload | default | Database of the Tez task. |
| tez fetcher | tez.timline.url | http://localhost:8188 | URL of the timeline server. |
| tez fetcher | tez.timeline.timeout.ms | 6000 | Timeout for accessing the timeline server, in milliseconds. |
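A minimal sketch of an omniAdvisorLogAnalyzer.properties using the keys from the table may help; the JDBC URL is a placeholder built from the common_config.cfg defaults (localhost:3306, database test), and enabling only the Spark Fetcher in rest mode is an illustrative choice, not a recommendation.

```properties
# Hypothetical omniAdvisorLogAnalyzer.properties sketch; the JDBC URL
# and the choice of rest mode are assumptions for illustration.
log.analyzer.thread.count=3
datasource.db.driver=com.mysql.cj.jdbc.Driver
datasource.db.url=jdbc:mysql://localhost:3306/test
spark.enable=true
spark.workload=default
spark.eventLogs.mode=rest
spark.rest.url=http://localhost:18080
tez.enable=false
```

In log mode, spark.rest.url would be replaced by spark.log.directory (and optionally spark.log.maxSize.mb); in secure mode, the kerberos.principal and kerberos.keytab.file items would be added.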
## Database Tables
| Table | Field | Description |
|---|---|---|
| yarn_app_result | application_id | Application ID of the task executed on Yarn. |
| yarn_app_result | application_name | Application name of the task executed on Yarn; generally the identifier shared by runs of the same task. |
| yarn_app_result | application_workload | Name of the database used for the task; generally specified by --database. |
| yarn_app_result | start_time | Start time of the task. |
| yarn_app_result | finish_time | End time of the task. |
| yarn_app_result | job_type | Task type, which can be Spark or Tez. |
| yarn_app_result | duration_time | Task execution time, in milliseconds. |
| yarn_app_result | parameters | Parameters used to execute the task. |
| yarn_app_result | execution_status | Task execution status: 0 = failed, 1 = succeeded. |
| yarn_app_result | query | Query statement executed by the task. |
| best_config | application_id | Application ID of the task executed on Yarn. |
| best_config | application_name | Application name of the task executed on Yarn; generally the identifier shared by runs of the same task. |
| best_config | application_workload | Name of the database used for the task; generally specified by --database. |
| best_config | duration_time | Task execution time, in milliseconds. |
| best_config | parameters | Parameters used to execute the task. |
| best_config | job_type | Task type, which can be Spark or Tez. |
| best_config | query_hash | SHA-256 hash of the SQL statement executed by the task. |
| sampling_config | application_id | Application ID of the task executed on Yarn. |
| sampling_config | application_name | Application name of the task executed on Yarn; generally the identifier shared by runs of the same task. |
| sampling_config | application_workload | Name of the database used for the task; generally specified by --database. |
| sampling_config | duration_time | Task execution time, in milliseconds. |
| sampling_config | parameters | Parameters used to execute the task. |
| sampling_config | job_type | Task type, which can be Spark or Tez. |
| sampling_config | execution_status | Task execution status: 0 = failed, 1 = succeeded. |
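As an illustration of how these tables can be inspected, the query below finds the fastest successful sampling run of a given task. It uses only the column names documented above; the full schema (types, keys, indexes) is not documented here, so treat this as a sketch rather than a supported interface.

```sql
-- Hypothetical query against yarn_app_result; only the documented
-- column names are used, everything else about the schema is assumed.
SELECT application_id, parameters, duration_time
FROM yarn_app_result
WHERE application_name = 'q12'
  AND execution_status = 1     -- 1 = succeeded
ORDER BY duration_time ASC
LIMIT 1;
```

A similar lookup against best_config by application_name or query_hash is what the recommend_identifier setting in common_config.cfg selects between.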