Constraints
- The end-to-end parameter tuning function is available only for Spark and Hive on Tez because only task logs of Spark and Hive on Tez can be parsed.
- OmniAdvisor can tune Spark SQL tasks, Spark application tasks, and Hive on Tez SQL tasks. Spark and Hive tasks can be submitted in many ways, and only some of them are supported. For the Spark engine, OmniAdvisor supports SQL statements submitted through spark-sql and JAR applications submitted through spark-submit; for the Hive engine, OmniAdvisor supports the SQL scenario (but not the Beeline submission mode). For details about the supported task submission scenarios, see Table 1.
- To parse Spark task logs, the history server must be enabled. To parse Hive on Tez task logs, the timeline server must be enabled.
- For Spark and Hive SQL tasks, SQL information in task logs must not be encrypted. Otherwise, the SQL information cannot be parsed and subsequent sampling and tuning are impossible.
- You can configure Spark and Hive on Tez parameter lists in spark_config.xml and hive_config.xml respectively. However, JVM and OS parameters are not supported in the lists.
- For Hive tuning, OmniAdvisor supports only the parsing, sampling, and tuning of complex query SQL statements that trigger directed acyclic graphs (DAGs). For Hive on Tez, common non-simple operations, such as JOIN operations involving multiple tables, GROUP BY, SORT BY, DISTINCT, and UNION, or complex window functions, trigger one or more DAGs. In contrast, a single-table query or a statement that does not involve heavy computation may require only one MapReduce operation and may not be expressed as a complex DAG structure. Examples of such statements are SELECT * FROM table WHERE column = 'value'; and CREATE TABLE new_table AS SELECT * FROM existing_table LIMIT 1000;.
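The distinction in the last constraint can be illustrated with a rough pre-check. The following is a minimal sketch that only scans the statement text for the operations named above; it is not how OmniAdvisor or Hive decides whether a DAG is generated, and the function name `likely_triggers_dag` and the keyword list are assumptions made for illustration.

```python
import re

# Operations that the constraint lists as typically triggering one or more DAGs.
# Assumption: a plain keyword scan is enough for a rough pre-check. It does not
# parse SQL, so a keyword inside a string literal or comment would also match.
DAG_KEYWORD_PATTERNS = [
    r"\bJOIN\b",
    r"\bGROUP\s+BY\b",
    r"\bSORT\s+BY\b",
    r"\bDISTINCT\b",
    r"\bUNION\b",
    r"\bOVER\s*\(",  # window function call, e.g. rank() OVER (...)
]
_DAG_PATTERN = re.compile("|".join(DAG_KEYWORD_PATTERNS), re.IGNORECASE)


def likely_triggers_dag(sql: str) -> bool:
    """Return True if the statement text contains one of the listed operations."""
    return _DAG_PATTERN.search(sql) is not None


if __name__ == "__main__":
    statements = [
        "SELECT a.id, b.name FROM a JOIN b ON a.id = b.id",
        "SELECT * FROM table WHERE column = 'value';",
        "CREATE TABLE new_table AS SELECT * FROM existing_table LIMIT 1000;",
    ]
    for stmt in statements:
        print(likely_triggers_dag(stmt), stmt)
```

A check like this can help decide which statements are worth submitting for tuning, but the actual execution plan (for example, from EXPLAIN) remains the authoritative source.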
Table 1 Supported task submission scenarios

| Engine | Task Submission Scenario | Description |
|---|---|---|
| Spark | Single SQL statement in spark-sql client mode | - |
| Spark | Multiple SQL statements in spark-sql client mode | More than one SQL statement is executed in a single session. |
| Spark | spark-submit application-jar client mode | Only application tasks packaged into JAR files can be parsed. |
| Spark | spark-submit application-jar cluster mode | Only application tasks packaged into JAR files can be parsed. |
| Hive | Single SQL statement | - |
| Hive | Multiple SQL statements | More than one SQL statement is executed in a single session. |
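
The Spark rows of the table can be encoded as a small lookup for use in a submission wrapper. This is a minimal sketch, not part of OmniAdvisor: the names `SPARK_SCENARIOS` and `spark_scenario_supported` are hypothetical, and treating every combination that the table does not list as unsupported is an assumption.

```python
# (submission tool, deploy mode) pairs transcribed from the Spark rows of the table.
# Assumption: any pair not listed here is treated as unsupported.
SPARK_SCENARIOS = {
    ("spark-sql", "client"),
    ("spark-submit", "client"),
    ("spark-submit", "cluster"),
}


def spark_scenario_supported(tool: str, deploy_mode: str) -> bool:
    """Return True if the tool and deploy mode match a Spark row of the table."""
    return (tool.strip().lower(), deploy_mode.strip().lower()) in SPARK_SCENARIOS


if __name__ == "__main__":
    print(spark_scenario_supported("spark-submit", "cluster"))
    print(spark_scenario_supported("spark-sql", "cluster"))
```

For spark-submit, the table additionally requires that the application be packaged into a JAR file, which this lookup does not verify.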