Tuning a Hive Task
Use the OmniAdvisor feature to obtain the optimal running parameters of Hive tasks and optimize task performance.
- Modify the $HIVE_HOME/conf/hive-site.xml configuration file to ensure that Tez is specified for hive.execution.engine.
- Open the configuration file.
vi $HIVE_HOME/conf/hive-site.xml
- Press i to enter the insert mode and ensure that Tez is specified for hive.execution.engine.
1 2 3 4 5 6 7 8 9 10 11 12
<property> <name>hive.execution.engine</name> <value>tez</value> </property> <property> <name>hive.cli.tez.session.async</name> <value>false</value> <description> Whether to start Tez session in background when running CLI with Tez, allowing CLI to be available earlier. </description> </property>
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Open the configuration file.
- Initialize the database.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4 5
[database] # MySQL database information, such as the user name and port number. db_name = test_hive db_host = localhost db_port = 3306
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the initialization command:
1python main.pyc hive init_environment
After the command is run, tables yarn_app_result, best_config, and sampling_config are created in the test_hive database.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
- Configure the log parsing module.
- To parse the Tez task logs of Hive, you need to enable the timeline server of Yarn. Open the /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf/omniAdvisorLogAnalyzer.properties configuration file.
1vi /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf/omniAdvisorLogAnalyzer.properties - Press i to enter the insert mode and add or configure the following configuration. For details about the omniAdvisorLogAnalyzer.properties file, see omniAdvisorLogAnalyzer.properties.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# Number of concurrent log parsing processes. log.analyzer.thread.count=3 # Database driver. Currently, only MySQL is supported. datasource.db.driver=com.mysql.cj.jdbc.Driver # Database URL. datasource.db.url=jdbc:mysql://server1:3306/database # Indicates whether to enable Tez log parsing. tez.enable=true # Working database of Hive Tez. tez.workload=default # URL of the timeline service. tez.timeline.url=http://server1:8188 # Timeout duration of timeline server connections, in milliseconds. tez.timeline.timeout.ms=6000 # User used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode. kerberos.principal=principle # Keytab file path used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode. kerberos.keytab.file=/usr/principle.keytab
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Copy the Hadoop configuration files hdfs-site.xml and core-site.xml to the /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf directory.
cp ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf cp ${HADOOP_HOME}/etc/hadoop/core-site.xml /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf
- To parse the Tez task logs of Hive, you need to enable the timeline server of Yarn. Open the /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf/omniAdvisorLogAnalyzer.properties configuration file.
- Invoke the log parsing module to write the parsed data into the database.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the start time and end time of the log. For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4 5 6 7
[hive] # Start time of Tez run logs. You can view the date on the Hadoop UI. log_start_time = 2023-09-14 19:12:45 # End time of Tez run logs. log_end_time = 2023-09-14 19:19:45 # Indicates whether to sample all SQL statements that have been run (obtain application_name from the database). If the value is true, the history_application_name configuration item is invalid. enable_sampling_all_sql = true
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the collection command.
1python main.pyc hive fetch_history_data
After the data is parsed successfully, the data result is written to the yarn_app_result and best_config tables, whereas the sampling_config table is still empty.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
- Sample the parameters of historical tasks and tune the parameters.
- Sample the parameters of each historical task for 40 times.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4 5
[sampling] # Number of parameter sampling rounds. sampling_epochs = 40 [hive] enable_sampling_all_sql = true
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/hive/hive_config.yml configuration file. Modify the parameter value range or add tuning parameters based on your requirements and the comments in the configuration file.
- Run the collection command.
1python main.pyc hive parameter_sampling
After the command is run, you can view the sampled data in the sampling_config table of the database.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
- Sample the parameters of a specified historical task for 40 times.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4 5 6
[sampling] # Number of parameter sampling rounds. sampling_epochs = 40 [hive] enable_sampling_all_sql = false history_application_name=q12
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the collection command.
1python main.pyc hive parameter_sampling
After the command is run, you can view the sampled data in the sampling_config table of the database.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
- Sample the parameters of each historical task for 40 times.
- Modify the execution script.
- Copy the native Hive script and rename it hive-recommend.
rm -rf $HIVE_HOME/bin/hive-recommend cp $HIVE_HOME/bin/hive $HIVE_HOME/bin/hive-recommend
- Open the hive-recommend script.
1vi hive-recommend - Input /$TORUN and press Enter to start the search.
- Press i to enter the insert mode and modify the content in the $TORUN line.
1$TORUN "$@"Modify the content as follows:1python /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/main.pyc hive parameter_recommend "$@"
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Copy the native Hive script and rename it hive-recommend.
- Recommend executing the optimal parameters in the sampling to execute the task.
- Recommend parameters based on the task name application_name.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4
[recommend] # After task sampling tuning is complete, you need to run a historical task again. You can use the task name (application_name) or query hash value (query_hash) to search the database for the optimal task parameters. # options: [application_name, query_hash] recommend_identifier = application_name
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the command. Using q12 as an example, the following command recommends the optimal parameters in the database based on the task name.
$HIVE_HOME/bin/hive-recommend --hiveconf hive.tez.container.size=1024 --hiveconf tez.am.resource.memory.mb=1024 --hiveconf hive.session.id=q12 --database tpcds_bin_partitioned_decimal_orc_3000 -f /home/test_hive/q12.sql
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
- Recommend parameters based on query_hash. Using q12 as an example, the following command recommends the optimal parameters in the database based on query_hash.
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg file.
1vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg - Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
1 2 3 4
[recommend] # After task sampling tuning is complete, you need to run a historical task again. You can use the task name (application_name) or query hash value (query_hash) to search the database for the optimal task parameters. # options: [application_name, query_hash] recommend_identifier = query_hash
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the command.
$HIVE_HOME/bin/hive-recommend --hiveconf hive.tez.container.size=1024 --hiveconf tez.am.resource.memory.mb=1024 --hiveconf hive.session.id=q12_bak --database tpcds_bin_partitioned_decimal_orc_3000 -f /home/test_hive/q12.sql
- Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg file.
- Recommend parameters based on the task name application_name.
Parent topic: OmniAdvisor