
Task Killed and Two Logs Parsed After the Hive Multi-SQL Statements Parameter Sampling Times Out

Problem Description

Table 1 Basic information

Item                  | Information
--------------------- | -----------------------------------------------------------------------------------------------
Source of the Problem | Online maintenance
Product               | Kunpeng BoostKit
Sub-item              | Big data
Service Scenario      | Debugging and running
Component             | Other
Output Time           | 2024/4/22
Author                | Wu Ruiheng
Team                  | Kunpeng BoostKit
Review Result         | Review passed
Review Date           | 2024/5/10
Release Date          | 2024/6/30
Keywords              | Task killed and two logs parsed after the Hive multi-SQL statements parameter sampling times out
Document              | OmniAdvisor Feature Guide

Symptom

During Hive multi-SQL statement parameter sampling, if a task runs for more than twice the execution time of the current optimal parameters recorded in the database, the task killing logic is triggered. Because of the Tez mechanism, a new task may be started after the directed acyclic graph (DAG) is killed. In this case, the old sampling task may contain DAGs in which only some of the SQL statements were executed successfully. During log parsing, two tasks may therefore be parsed, one of which is a task in which only some SQL statements succeeded.

When identification_type is set to application_name, the partially successful task is treated as providing the optimal parameters for the tuning task. As a result, slow parameters are recommended.

On the Timeline Server WebUI, you can see both the killed task and the task automatically restarted by Hive.
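The timeout rule described above (kill a sampling task once it runs longer than twice the current optimal execution time) can be sketched as follows. This is an illustrative sketch, not OmniAdvisor source code; the function name and parameter names are assumptions.

```python
# Illustrative sketch (not OmniAdvisor source code) of the sampling
# timeout rule described above: a sampling task is killed once its
# running time exceeds twice the execution time of the current
# optimal parameters recorded in the database.
def should_kill_sampling_task(running_time_s: float,
                              best_duration_s: float) -> bool:
    """Return True when the sampling task has exceeded the 2x threshold."""
    return running_time_s > 2 * best_duration_s

# Example: best known duration is 60 s.
print(should_kill_sampling_task(130.0, 60.0))  # True  -> kill and restart
print(should_kill_sampling_task(100.0, 60.0))  # False -> keep sampling
```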

Key Process and Cause Analysis

None

Conclusion and Solution

If identification_type in the $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg file is set to the default value job_hash, the killed task simply fails to be matched, which does not affect parameter tuning. In this case, continue sampling and recommendation.

If identification_type is set to application_name, you need to clear abnormal data in the database. For details, see the following operations.

  1. On the Yarn Timeline WebUI or in the parameter tuning log, find the ID of the killed task (in the log, search for the yarn app -kill command). Assume that the ID is killed_app_id and that the application_name of the task is app_name.
  2. Access the MySQL database and select the corresponding database.
    # Enter the username and password to log in to the MySQL database.
    mysql -uuser -ppassword
    
    # Select the database to be used.
    use hive_test;
  3. Delete the parsing content of the killed task from history_config.
    DELETE FROM history_config WHERE application_id='killed_app_id';
  4. Select the optimal parameters from history_config and update them to best_config.
    -- MySQL does not support the UPDATE ... FROM syntax; use a
    -- multi-table UPDATE with a derived table instead.
    UPDATE best_config
    JOIN (
        SELECT parameters, duration_time
        FROM history_config
        WHERE application_name = 'app_name'
        ORDER BY duration_time ASC
        LIMIT 1
    ) AS subquery
    SET
        best_config.parameters = subquery.parameters,
        best_config.duration_time = subquery.duration_time
    WHERE best_config.application_name = 'app_name';
  5. Continue the parameter sampling or recommendation.
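Steps 3 and 4 can be sketched end to end against a local SQLite database. This is a minimal sketch under stated assumptions: the table layouts below model only the columns used by the statements above, the sample rows and IDs are placeholders, and SQLite's scalar-subquery UPDATE stands in for the MySQL multi-table UPDATE.

```python
import sqlite3

# Minimal sketch of steps 3-4 in an in-memory SQLite database.
# Table layouts are assumptions: only the columns used above are modeled.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE history_config (application_id TEXT, "
            "application_name TEXT, parameters TEXT, duration_time REAL)")
cur.execute("CREATE TABLE best_config (application_name TEXT, "
            "parameters TEXT, duration_time REAL)")

# Placeholder rows: one killed (incomplete) run, two complete samples,
# and a best_config entry polluted by the incomplete run.
cur.executemany(
    "INSERT INTO history_config VALUES (?, ?, ?, ?)",
    [("killed_app_id", "app_name", "partial_params", 30.0),  # incomplete
     ("app_2", "app_name", "params_a", 80.0),
     ("app_3", "app_name", "params_b", 60.0)])
cur.execute("INSERT INTO best_config VALUES "
            "('app_name', 'partial_params', 30.0)")

# Step 3: delete the parsing content of the killed task.
cur.execute("DELETE FROM history_config "
            "WHERE application_id = 'killed_app_id'")

# Step 4: copy the fastest remaining sample into best_config.
# (SQLite lacks MySQL's multi-table UPDATE, so scalar subqueries are used.)
cur.execute("""
    UPDATE best_config
    SET parameters = (SELECT parameters FROM history_config
                      WHERE application_name = 'app_name'
                      ORDER BY duration_time ASC LIMIT 1),
        duration_time = (SELECT duration_time FROM history_config
                         WHERE application_name = 'app_name'
                         ORDER BY duration_time ASC LIMIT 1)
    WHERE application_name = 'app_name'
""")
conn.commit()
print(cur.execute("SELECT parameters, duration_time "
                  "FROM best_config").fetchone())
# ('params_b', 60.0)
```

After the update, best_config holds the fastest fully successful sample instead of the incomplete run, which is the state required before resuming sampling or recommendation.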