Constraints

  • End-to-end parameter tuning is available only for Spark and Hive on Tez, because OmniAdvisor can parse task logs only from these two engines.
  • OmniAdvisor can tune Spark SQL tasks, Spark application tasks, and Hive on Tez SQL tasks. Spark and Hive tasks can be submitted in many ways, but only some of them are supported. For Spark, OmniAdvisor supports tasks submitted through spark-sql and JAR applications submitted through spark-submit. For Hive, OmniAdvisor supports SQL tasks, except those submitted through Beeline. For details about the supported task submission scenarios, see Table 1.
  • To parse Spark task logs, the Spark history server must be enabled. To parse Hive on Tez task logs, the timeline server must be enabled.
  • For Spark and Hive SQL tasks, the SQL text in task logs must not be encrypted. Otherwise, OmniAdvisor cannot parse it, and subsequent sampling and tuning are impossible.
  • The lists of Spark and Hive on Tez parameters to be tuned are configured in spark_config.xml and hive_config.xml, respectively. JVM and OS parameters cannot be included in these lists.
  • For Hive, OmniAdvisor can parse, sample, and tune only complex SQL queries that trigger directed acyclic graphs (DAGs). In Hive on Tez, non-trivial operations such as multi-table JOINs, GROUP BY, SORT BY, DISTINCT, UNION, and complex window functions trigger one or more DAGs. A single-table query, or a statement that involves little computation, may require only one MapReduce operation and therefore may not be expressed as a complex DAG. Examples are SELECT * FROM table WHERE column = 'value'; and CREATE TABLE new_table AS SELECT * FROM existing_table LIMIT 1000;
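The DAG constraint above can be approximated before submitting a Hive statement with a rough keyword check. The following is a minimal sketch, assuming that the presence of the operations named above (JOIN, GROUP BY, SORT BY, DISTINCT, UNION, or a window function written with OVER) is a usable signal. It is not how OmniAdvisor decides; the function name likely_triggers_dag is hypothetical, and the keyword list is only the examples given in the constraint, not an exhaustive rule.

```python
import re

# Hypothetical heuristic: keywords for the operations that the constraint above
# says typically trigger one or more Tez DAGs. Not OmniAdvisor's actual logic.
_DAG_PATTERNS = [
    r"\bJOIN\b",
    r"\bGROUP\s+BY\b",
    r"\bSORT\s+BY\b",
    r"\bDISTINCT\b",
    r"\bUNION\b",
    r"\bOVER\s*\(",  # window functions, e.g. ROW_NUMBER() OVER (...)
]


def likely_triggers_dag(sql: str) -> bool:
    """Return True if the statement contains an operation that usually triggers a DAG."""
    # Case-insensitive keyword scan; string literals and comments are not
    # excluded, so this can produce false positives.
    return any(re.search(p, sql, flags=re.IGNORECASE) for p in _DAG_PATTERNS)


if __name__ == "__main__":
    statements = [
        "SELECT * FROM table WHERE column = 'value';",
        "CREATE TABLE new_table AS SELECT * FROM existing_table LIMIT 1000;",
        "SELECT a.id, COUNT(*) FROM a JOIN b ON a.id = b.id GROUP BY a.id;",
    ]
    for s in statements:
        print(likely_triggers_dag(s), s)
```

A real check would inspect the execution plan (for example, the output of EXPLAIN) rather than the SQL text, because a keyword scan cannot see how Hive actually compiles the query.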
Table 1 Task submission scenarios supported by tuning

Engine  Task Submission Scenario                          Description
------  ------------------------------------------------  -------------------------------------------------------------
Spark   Single SQL statement in spark-sql client mode     -
Spark   Multiple SQL statements in spark-sql client mode  More than one SQL statement is executed in a single session.
Spark   spark-submit application-jar client mode          Only application tasks packaged into JAR files can be parsed.
Spark   spark-submit application-jar cluster mode         Only application tasks packaged into JAR files can be parsed.
Hive    Single SQL statement                              -
Hive    Multiple SQL statements                           More than one SQL statement is executed in a single session.
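Table 1 can also be encoded as data, which is convenient when a job scheduler needs to decide whether to route a task to OmniAdvisor. The following is a minimal sketch under stated assumptions: the function is_supported and the labels "spark-sql", "spark-submit", "jar", "sql", and "beeline" are hypothetical names chosen for this example, and the Hive rule is read from the constraints above (SQL tasks only, Beeline excluded), so any Hive tool other than Beeline is assumed to be allowed.

```python
from typing import Optional

# Hypothetical encoding of the Spark rows of Table 1 as
# (submission tool, deploy mode, artifact type).
_SPARK_SUPPORTED = {
    ("spark-sql", "client", "sql"),
    ("spark-submit", "client", "jar"),
    ("spark-submit", "cluster", "jar"),
}


def is_supported(engine: str, tool: str, deploy_mode: Optional[str], artifact: str) -> bool:
    """Return True if Table 1 lists this submission scenario as tunable.

    Single and multiple SQL statements per session are both supported,
    so the number of statements is not a parameter.
    """
    engine = engine.lower()
    if engine == "spark":
        return (tool, deploy_mode, artifact) in _SPARK_SUPPORTED
    if engine == "hive":
        # Assumption: any SQL submission path except Beeline is supported.
        return artifact == "sql" and tool != "beeline"
    # Only Spark and Hive on Tez task logs can be parsed.
    return False


if __name__ == "__main__":
    print(is_supported("spark", "spark-submit", "cluster", "jar"))
    print(is_supported("hive", "beeline", None, "sql"))
```

Keeping the Spark rows as a set of tuples mirrors the table row by row, so adding a newly supported scenario is a one-line change.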