Rate This Document
Findability
Accuracy
Completeness
Readability

Tuning a Hive Task

Use the OmniAdvisor feature to obtain the optimal running parameters of Hive tasks and optimize task performance.

  1. Modify the $HIVE_HOME/conf/hive-site.xml configuration file to ensure that Tez is specified for hive.execution.engine.
    1. Open the configuration file.
      vi $HIVE_HOME/conf/hive-site.xml
    2. Press i to enter the insert mode and ensure that Tez is specified for hive.execution.engine.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      <property>
          <name>hive.execution.engine</name>
          <value>tez</value>
      </property>
      <property>
          <name>hive.cli.tez.session.async</name>
          <value>false</value>
          <description>
            Whether to start Tez
            session in background when running CLI with Tez, allowing CLI to be available earlier.
          </description>
       </property>
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
  2. Initialize the database.
    1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
      1
      vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
      
    2. Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
      1
      2
      3
      4
      5
      [database]
      # MySQL database information, such as the user name and port number.
      db_name = test_hive
      db_host = localhost
      db_port = 3306
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Run the initialization command:
      1
      python main.pyc hive init_environment
      

      After the command is run, tables yarn_app_result, best_config, and sampling_config are created in the test_hive database.

  3. Configure the log parsing module.
    1. To parse the Tez task logs of Hive, you need to enable the timeline server of Yarn. Open the /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf/omniAdvisorLogAnalyzer.properties configuration file.
      1
      vi /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf/omniAdvisorLogAnalyzer.properties
      
    2. Press i to enter the insert mode and add or configure the following configuration. For details about the omniAdvisorLogAnalyzer.properties file, see omniAdvisorLogAnalyzer.properties.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      # Number of concurrent log parsing processes.
      log.analyzer.thread.count=3
      # Database driver. Currently, only MySQL is supported.
      datasource.db.driver=com.mysql.cj.jdbc.Driver
      # Database URL.
      datasource.db.url=jdbc:mysql://server1:3306/database
      
      # Indicates whether to enable Tez log parsing.
      tez.enable=true
      # Working database of Hive Tez.
      tez.workload=default
      # URL of the timeline service.
      tez.timeline.url=http://server1:8188
      # Timeout duration of timeline server connections, in milliseconds.
      tez.timeline.timeout.ms=6000
      # User used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
      kerberos.principal=principle
      # Keytab file path used for Kerberos authentication in secure mode. Skip this parameter in non-secure mode.
      kerberos.keytab.file=/usr/principle.keytab
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Copy the Hadoop configuration files hdfs-site.xml and core-site.xml to the /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf directory.
      cp ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf
      cp ${HADOOP_HOME}/etc/hadoop/core-site.xml /opt/OmniAdvisor/boostkit-omniadvisor-log-analyzer-1.0.0-aarch64/conf
  4. Invoke the log parsing module to write the parsed data into the database.
    1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
      1
      vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
      
    2. Press i to enter the insert mode and modify the start time and end time of the log. For details about the common_config.cfg file, see common_config.cfg.
      1
      2
      3
      4
      5
      6
      7
      [hive]
      # Start time of Tez run logs. You can view the date on the Hadoop UI.
      log_start_time = 2023-09-14 19:12:45
      # End time of Tez run logs.
      log_end_time = 2023-09-14 19:19:45
      # Indicates whether to sample all SQL statements that have been run (obtain application_name from the database). If the value is true, the history_application_name configuration item is invalid.
      enable_sampling_all_sql = true
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Run the collection command.
      1
      python main.pyc hive fetch_history_data
      

      After the data is parsed successfully, the data result is written to the yarn_app_result and best_config tables, whereas the sampling_config table is still empty.

  5. Sample the parameters of historical tasks and tune the parameters.
    • Sample the parameters of each historical task for 40 times.
      1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
        1
        vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
        
      2. Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
        1
        2
        3
        4
        5
        [sampling]
        # Number of parameter sampling rounds.
        sampling_epochs = 40
        [hive]
        enable_sampling_all_sql = true
        
      3. Press Esc, type :wq!, and press Enter to save the file and exit.
      4. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/hive/hive_config.yml configuration file. Modify the parameter value range or add tuning parameters based on your requirements and the comments in the configuration file.
      5. Run the collection command.
        1
        python main.pyc hive parameter_sampling
        

        After the command is run, you can view the sampled data in the sampling_config table of the database.

    • Sample the parameters of a specified historical task for 40 times.
      1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
        1
        vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
        
      2. Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
        1
        2
        3
        4
        5
        6
        [sampling]
        # Number of parameter sampling rounds.
        sampling_epochs = 40
        [hive]
        enable_sampling_all_sql = false
        history_application_name=q12
        
      3. Press Esc, type :wq!, and press Enter to save the file and exit.
      4. Run the collection command.
        1
        python main.pyc hive parameter_sampling
        

        After the command is run, you can view the sampled data in the sampling_config table of the database.

  6. Modify the execution script.
    1. Copy the native Hive script and rename it hive-recommend.
      rm -rf $HIVE_HOME/bin/hive-recommend
      cp $HIVE_HOME/bin/hive $HIVE_HOME/bin/hive-recommend
    2. Open the hive-recommend script.
      1
      vi hive-recommend
      
    3. Input /$TORUN and press Enter to start the search.
    4. Press i to enter the insert mode and modify the content in the $TORUN line.
      1
      $TORUN "$@"
      
      Modify the content as follows:
      1
      python /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/main.pyc hive parameter_recommend "$@"
      
    5. Press Esc, type :wq!, and press Enter to save the file and exit.
  7. Recommend executing the optimal parameters in the sampling to execute the task.
    • Recommend parameters based on the task name application_name.
      1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg configuration file.
        1
        vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
        
      2. Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
        1
        2
        3
        4
        [recommend]
        # After task sampling tuning is complete, you need to run a historical task again. You can use the task name (application_name) or query hash value (query_hash) to search the database for the optimal task parameters.
        # options: [application_name, query_hash]
        recommend_identifier = application_name
        
      3. Press Esc, type :wq!, and press Enter to save the file and exit.
      4. Run the command. Using q12 as an example, the following command recommends the optimal parameters in the database based on the task name.
        $HIVE_HOME/bin/hive-recommend --hiveconf hive.tez.container.size=1024 --hiveconf tez.am.resource.memory.mb=1024 --hiveconf hive.session.id=q12  --database tpcds_bin_partitioned_decimal_orc_3000  -f  /home/test_hive/q12.sql
    • Recommend parameters based on query_hash. Using q12 as an example, the following command recommends the optimal parameters in the database based on query_hash.
      1. Open the /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg file.
        1
        vi /opt/OmniAdvisor/BoostKit-omniadvisor_1.0.0/config/common_config.cfg
        
      2. Press i to enter the insert mode and modify the file as follows: For details about the common_config.cfg file, see common_config.cfg.
        1
        2
        3
        4
        [recommend]
        # After task sampling tuning is complete, you need to run a historical task again. You can use the task name (application_name) or query hash value (query_hash) to search the database for the optimal task parameters.
        # options: [application_name, query_hash]
        recommend_identifier = query_hash
        
      3. Press Esc, type :wq!, and press Enter to save the file and exit.
      4. Run the command.
        $HIVE_HOME/bin/hive-recommend --hiveconf hive.tez.container.size=1024 --hiveconf tez.am.resource.memory.mb=1024  --hiveconf hive.session.id=q12_bak  --database tpcds_bin_partitioned_decimal_orc_3000 -f /home/test_hive/q12.sql