Configuration Reference for Tuning
Before tuning Spark or Hive tasks, perform the following steps to confirm or modify the OmniAdvisor configurations.
Log Parsing Module Parameters
The configuration file of the log parsing module specifies the Spark history server address and port, Yarn timeline server address and port, Spark and Tez parsing options, and number of log parsing task threads. Before parameter tuning, check whether the configuration items are correct.
- On the management node, open the configuration file $OMNIADVISOR_HOME/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64/conf/omniAdvisorLogAnalyzer.properties.
vi $OMNIADVISOR_HOME/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64/conf/omniAdvisorLogAnalyzer.properties
- Press i to enter the insert mode and view the configuration information. For details about the omniAdvisorLogAnalyzer.properties file, see omniAdvisorLogAnalyzer.properties.
# Number of concurrent log parsing threads.
log.analyzer.thread.count=3
# Database driver. Currently, only MySQL is supported.
datasource.db.driver=com.mysql.cj.jdbc.Driver
# Indicates whether to enable Spark log parsing.
spark.enable=true
# Timeout duration of Spark log analysis, in seconds. If this timeout interval is exceeded, the task analysis fails.
spark.timeout.seconds=30
# Indicates whether to enable Tez log parsing.
tez.enable=true
# URL of the timeline service.
tez.timeline.url=http://server1:8188
# Timeout duration of timeline server connections, in milliseconds.
tez.timeline.timeout.ms=6000
# User used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
kerberos.principal=principle
# Keytab file path used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
kerberos.keytab.file=/usr/principle.keytab
- Check whether Spark log parsing is enabled in the omniAdvisorLogAnalyzer.properties configuration file, and enable it if you need to tune Spark. Then confirm the Spark log parsing configuration. Spark log parsing supports two log collection modes: rest and log. In rest mode, the Spark history server REST API is invoked to obtain the log files to be parsed. In log mode, the Spark task log files are analyzed directly.
- The default log collection mode is rest. Check the mode configuration in the omniAdvisorLogAnalyzer.properties configuration file.
# Indicates whether to enable Spark log parsing.
spark.enable=true
# Spark log parsing mode, which can be rest or log.
spark.eventLogs.mode=rest
# URL of the Spark history server.
spark.rest.url=http://server1:18080
spark.rest.url indicates the IP address and port of the Spark history server. Ensure that the configuration is correct. For details about how to verify the configuration, see 3.
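Before probing the history server, it can help to read spark.rest.url back out of the properties file and split it into host and port. The sketch below writes a stand-in file with the documented default URL; point CONF_FILE at the real omniAdvisorLogAnalyzer.properties on your node, and note that the curl endpoint shown in the comment is the standard Spark history server REST root, not something OmniAdvisor-specific.

```shell
# Sketch only: the file and URL below are stand-ins with the documented
# default value. Point CONF_FILE at the real omniAdvisorLogAnalyzer.properties.
CONF_FILE=$(mktemp)
printf 'spark.rest.url=http://server1:18080\n' > "$CONF_FILE"

REST_URL=$(grep '^spark.rest.url=' "$CONF_FILE" | cut -d= -f2-)
HOST=$(printf '%s' "$REST_URL" | sed -E 's#https?://([^:/]+).*#\1#')
PORT=$(printf '%s' "$REST_URL" | sed -E 's#.*:([0-9]+)/?$#\1#')
echo "history server host=$HOST port=$PORT"
# On the live cluster, confirm the REST API responds, e.g.:
#   curl -s --max-time 5 "$REST_URL/api/v1/applications" | head
```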
- Optional: To collect Spark logs in log mode, you need to modify the omniAdvisorLogAnalyzer.properties configuration file.
# Indicates whether to enable Spark log parsing.
spark.enable=true
# Spark log parsing mode, which can be rest or log.
spark.eventLogs.mode=log
# Address and directory of Spark event logs in HDFS. Ensure that the configuration is correct and the Spark event logs are accessible.
spark.log.directory=hdfs://server1:9000/spark2-history
# Maximum size of a Spark log file, in MB. Log files larger than this size are not parsed.
spark.log.maxSize.mb=500
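The size filter in log mode can be sketched locally: files over spark.log.maxSize.mb are skipped, everything else is parsed. The directory below is a temporary stand-in; on the cluster you would list the HDFS path instead (for example, `hdfs dfs -ls hdfs://server1:9000/spark2-history`).

```shell
# Sketch of the size filter applied in log mode, using a local stand-in
# directory and sparse files instead of real HDFS event logs.
LOG_DIR=$(mktemp -d)
MAX_MB=500
truncate -s 1M   "$LOG_DIR/app-small"     # at most 500 MB: parsed
truncate -s 501M "$LOG_DIR/app-too-big"   # over 500 MB: skipped

PARSABLE=$(find "$LOG_DIR" -type f -size "-$((MAX_MB + 1))M" | wc -l)
SKIPPED=$(find "$LOG_DIR" -type f -size "+${MAX_MB}M" | wc -l)
echo "parsable=$PARSABLE skipped=$SKIPPED"
```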
- Check whether Tez log parsing is enabled in the omniAdvisorLogAnalyzer.properties configuration file. Enable it if you need to tune Hive.
# Indicates whether to enable Tez log parsing.
tez.enable=true
# URL of the timeline service.
tez.timeline.url=http://server1:8188
# Timeout duration of timeline server connections, in milliseconds.
tez.timeline.timeout.ms=6000
tez.timeline.url indicates the URL of the Yarn timeline service. Change it to the actual address and port to ensure that the timeline service is accessible.
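A quick reachability check can be derived from the two tez.* properties. The sketch below uses the documented default values, not values read from a live cluster; the `/ws/v1/timeline` path in the comment is the standard YARN Timeline Server v1 REST root.

```shell
# Sketch: build the timeline REST endpoint and a curl timeout from the
# documented defaults. Adjust both values to your environment.
TIMELINE_URL="http://server1:8188"
TIMEOUT_MS=6000
TIMEOUT_S=$((TIMEOUT_MS / 1000))
ENDPOINT="$TIMELINE_URL/ws/v1/timeline"   # standard YARN timeline v1 REST root
echo "checking $ENDPOINT with timeout ${TIMEOUT_S}s"
# On the live cluster:
#   curl -s --max-time "$TIMEOUT_S" "$ENDPOINT"
# A JSON response (even an empty entity list) means the service is reachable.
```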
- Optional: In secure mode, add the Kerberos configuration to the omniAdvisorLogAnalyzer.properties configuration file.
# User used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
kerberos.principal=principle
# Keytab file path used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
kerberos.keytab.file=/usr/principle.keytab
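Before enabling secure mode, it is worth confirming that the path set in kerberos.keytab.file exists and is readable by the user running the log analyzer. The keytab below is a temporary stand-in for the documented /usr/principle.keytab; the `kinit` command in the comment is the standard MIT Kerberos way to validate keytab credentials.

```shell
# Sketch of a keytab pre-check. KEYTAB is a temporary stand-in file;
# substitute your real keytab path.
KEYTAB=$(mktemp)
if [ -r "$KEYTAB" ]; then
    RESULT=ok
else
    RESULT=missing
fi
echo "keytab check: $RESULT"
# On the live cluster, verify the credentials actually work:
#   kinit -kt /path/to/principle.keytab principle && klist
```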
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Copy the Hadoop configuration files hdfs-site.xml and core-site.xml to the $OMNIADVISOR_HOME/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64/conf directory.
cp ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml $OMNIADVISOR_HOME/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64/conf
cp ${HADOOP_HOME}/etc/hadoop/core-site.xml $OMNIADVISOR_HOME/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64/conf
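After copying, a simple check confirms both files landed in the conf directory. The directories below are temporary stand-ins for `${HADOOP_HOME}/etc/hadoop` and the OmniAdvisor conf directory.

```shell
# Sketch: copy both Hadoop files and confirm they are present at the
# destination. Both directories are local stand-ins.
HADOOP_CONF=$(mktemp -d)
ANALYZER_CONF=$(mktemp -d)
printf '<configuration/>\n' > "$HADOOP_CONF/hdfs-site.xml"
printf '<configuration/>\n' > "$HADOOP_CONF/core-site.xml"

cp "$HADOOP_CONF/hdfs-site.xml" "$HADOOP_CONF/core-site.xml" "$ANALYZER_CONF/"

MISSING=0
for f in hdfs-site.xml core-site.xml; do
    [ -f "$ANALYZER_CONF/$f" ] || MISSING=$((MISSING + 1))
done
echo "missing files: $MISSING"
```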
Basic Configuration for Parameter Tuning
In the $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg configuration file, check the database configuration and ID calculation method.
- Open $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg. For details about the common_config.cfg file, see common_config.cfg.
vi $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg
- Press i to enter the insert mode and modify the MySQL configuration.
[database]
# MySQL database information: database name, host, and port.
db_name = test_advisor
db_host = localhost
db_port = 3306
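The three database settings can be read back out of common_config.cfg before starting tuning. The file below is a stand-in populated with the documented defaults; point CFG at the real common_config.cfg on your node. The `mysql` command in the comment is the standard client invocation, shown only as a suggestion.

```shell
# Sketch: parse the [database] settings from a stand-in common_config.cfg.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
[database]
db_name = test_advisor
db_host = localhost
db_port = 3306
EOF

# Return the value of a "key = value" line.
get_cfg() { awk -F' *= *' -v k="$1" '$1 == k {print $2}' "$CFG"; }
DB_NAME=$(get_cfg db_name)
DB_HOST=$(get_cfg db_host)
DB_PORT=$(get_cfg db_port)
echo "mysql target: $DB_HOST:$DB_PORT/$DB_NAME"
# On the live node, confirm the database is reachable, e.g.:
#   mysql -h "$DB_HOST" -P "$DB_PORT" -e 'SELECT 1'
```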
- In the workload module, check the workload_name (name of the tested database), the log parsing module path, and identification_type (job_hash by default).
[workload]
# Name of the tested database.
workload_name = tpcds_bin_partitioned_decimal_orc_100
# Path of the decompressed log parsing module. The log analyzer JAR package must exist in this path.
log_analyzer_path = /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.1.0-aarch64
# Unique ID of the task, used to look up the optimal parameters for the task in the database.
# It can be the hash value of the task name (application_name) or the hash value of the task
# itself (the hash of the query statement or JAR package).
# options: [application_name, job_hash]
identification_type = job_hash
- If you choose application_name, the hash value of application_name is used as the task identifier.
- If you choose job_hash, the hash value of the task's SQL statement (for SQL tasks) or JAR package (for application tasks) is used as the task identifier.
- During log parsing, different identification values are calculated based on identification_type and stored in the database. During subsequent parameter recommendation, optimal parameters are matched based on the identification values.
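The job_hash idea can be sketched with any stable hash: identical SQL statements map to the same identifier across runs, so the database lookup finds the same optimal parameters again. The exact hash algorithm OmniAdvisor uses is internal; md5sum below only illustrates the principle.

```shell
# Sketch of the job_hash idea: same statement, same hash; different
# statement, different hash. md5sum stands in for the internal algorithm.
SQL1='SELECT count(*) FROM store_sales'
SQL2='SELECT count(*) FROM store_sales'
SQL3='SELECT count(*) FROM web_sales'
H1=$(printf '%s' "$SQL1" | md5sum | awk '{print $1}')
H2=$(printf '%s' "$SQL2" | md5sum | awk '{print $1}')
H3=$(printf '%s' "$SQL3" | md5sum | awk '{print $1}')
echo "same statement, same hash: $H1 = $H2"
echo "different statement, different hash: $H3"
```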
- Press Esc, type :wq!, and press Enter to save the file and exit.