OmniAdvisor

  • OmniAdvisor 1.0 supports only the AI tuning algorithm and applies to Spark and Hive components.
  • In addition to the AI tuning algorithm, OmniAdvisor 2.0 adds the migration tuning algorithm, expert rule algorithm, and operator acceleration algorithm to form a multi-dimensional tuning system. Compared with OmniAdvisor 1.0, OmniAdvisor 2.0 delivers higher performance and tunes tasks transparently, without user involvement. It applies to Spark.
  • Use OmniAdvisor 2.0 if it suits your application scenario. For details about the components supported by each version, see Feature Description.

OmniAdvisor 2.0

Spark has more than 200 configuration parameters. These parameters affect each other and have wide value ranges, which makes manual tuning complex and time-consuming. Traditional tuning methods often suffer from incomplete parameter coverage, low efficiency, and limited effectiveness. To address this challenge, OmniAdvisor 2.0 combines AI-assisted, expert rule-based, and transfer generalization tuning algorithms with operator acceleration technologies to provide automated parameter tuning and recommendations.

OmniAdvisor 2.0 consists of three parts: an interception component, a background tuning service, and a historical database. The interception component acts as the client, the background tuning service acts as the server, and the historical database stores the data generated by the software. Figure 1 shows the OmniAdvisor 2.0 architecture.

Figure 1 OmniAdvisor 2.0

The interception component works as follows:

  1. Intercepts the tenant's task request and parses it to obtain task request information.
  2. Queries the historical database to obtain the optimal task configuration.
  3. Submits the task load and optimal configuration to the Spark cluster for execution and obtains the execution result.
  4. Records the execution result into the historical database.
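
The four interception steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the OmniAdvisor implementation: the HistoryDB class, the "<tenant>:<sql>" request format, and the submit callback (a stand-in for submission to the Spark cluster) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass
class HistoryDB:
    """In-memory stand-in for the historical database (hypothetical schema)."""
    best_config: Dict[str, Config] = field(default_factory=dict)
    runs: List[dict] = field(default_factory=list)


def parse_request(raw: str) -> dict:
    """Step 1: parse an intercepted request into task information.

    Assumes a made-up request format "<tenant>:<sql text>".
    """
    tenant, _, sql = raw.partition(":")
    # Normalize whitespace and case so the same SQL maps to the same task ID.
    return {"tenant": tenant.strip(), "task_id": " ".join(sql.split()).lower()}


def intercept(raw: str, db: HistoryDB,
              submit: Callable[[str, Config], dict]) -> dict:
    task = parse_request(raw)                              # step 1
    # Step 2: look up the optimal configuration; an empty dict means
    # "no tuned configuration yet, run with cluster defaults".
    config = db.best_config.get(task["task_id"], {})
    result = submit(task["task_id"], config)               # step 3
    db.runs.append({**task, "config": config, **result})   # step 4
    return result
```

In this sketch the tenant never sees the configuration lookup: the caller passes only the raw request, and the tuned parameters are attached before submission.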

The background tuning service works as follows:

  1. Waits for the administrator to specify the task load and submit a tuning request.
  2. Queries the historical database to obtain related historical data.
  3. Tunes parameters to obtain the optimal configuration and writes it to the historical database.
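
The tuning service loop can be sketched in the same spirit. The sketch below deliberately swaps in a trivial strategy, picking the fastest successful historical run, in place of the AI, expert rule, and migration algorithms, whose internals are not described here. The record fields (task_id, status, seconds, config) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Config = Dict[str, str]


def tune(task_id: str, runs: List[dict]) -> Optional[Config]:
    """Steps 2-3 (simplified): select the best configuration from history.

    Placeholder strategy: return the configuration of the fastest
    successful run. A real tuner would search the parameter space.
    """
    candidates = [r for r in runs
                  if r["task_id"] == task_id and r["status"] == "SUCCEEDED"]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["seconds"])["config"]


def handle_tuning_request(task_id: str, runs: List[dict],
                          best_config: Dict[str, Config]) -> Optional[Config]:
    """Step 1 is the caller (the administrator) invoking this function."""
    config = tune(task_id, runs)
    if config is not None:
        # Write the result back so later task submissions can reuse it.
        best_config[task_id] = config
    return config
```

Writing the result back to the shared store is what connects the two sides: the tuning service produces configurations offline, and the interception component consumes them at submission time.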

Performance Improvement with OmniAdvisor 2.0

On the TPC-DS 3 TB dataset, OmniAdvisor 2.0 improves performance by approximately 20% over parameters tuned by experts. Figure 2 shows the test results for some typical SQL statements.

Figure 2 Performance Improvement with OmniAdvisor 2.0

OmniAdvisor 1.0

The Spark engine has more than 200 configuration parameters, many of which have a wide value range and depend on each other. This makes Spark parameter tuning complex and challenging. Traditionally, Spark parameters are tuned manually, which has three disadvantages: not all adjustable parameters are identified, tuning is inefficient, and the resulting performance is poor. With so many adjustable parameters, it is difficult to determine the optimal configuration by hand. The Hive engine has similar problems. To address these challenges, OmniAdvisor 1.0 uses AI to automatically recommend parameter configurations, improving both tuning efficiency and performance.

Currently, OmniAdvisor 1.0 supports optimization for Spark SQL, Spark Application, and Hive SQL.

OmniAdvisor consists of two modules:

  • Log parsing module. OmniAdvisor obtains the running data of historical Spark tasks from the Spark History Server, or the running data of historical Tez tasks from the Timeline Server. It parses the log information after task execution and saves the parsed SQL parameter information, SQL execution status, and execution time to the MySQL database.
  • Parameter tuning module. OmniAdvisor obtains historical parameters from the database for model training and samples them to obtain optimal parameter configurations, which are then used to execute tasks. OmniAdvisor parses the execution results and updates the information in the database. When you need to re-execute a task, you can search the database for the optimal parameter configurations of historical tasks and use them to execute the task.

Figure 3 OmniAdvisor 1.0
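
The store-then-look-up flow of the two modules can be illustrated with a small sketch. It makes several assumptions: it uses Python's built-in sqlite3 in place of MySQL so that it runs without a server, the task_history table and its columns are a made-up schema, and the model training and sampling steps are omitted. It only shows how parsed records might be saved and how the best historical configuration might be retrieved when a task is re-executed.

```python
import json
import sqlite3
from typing import Dict, Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_history ("
        "task TEXT, params TEXT, status TEXT, duration_s REAL)"
    )
    return conn


def save_parsed_run(conn: sqlite3.Connection, task: str,
                    params: Dict[str, str], status: str,
                    duration_s: float) -> None:
    """Log parsing side: store parameters, status, and execution time."""
    conn.execute(
        "INSERT INTO task_history VALUES (?, ?, ?, ?)",
        (task, json.dumps(params, sort_keys=True), status, duration_s),
    )
    conn.commit()


def best_params(conn: sqlite3.Connection,
                task: str) -> Optional[Dict[str, str]]:
    """Re-execution side: fetch the fastest successful configuration."""
    row = conn.execute(
        "SELECT params FROM task_history "
        "WHERE task = ? AND status = 'SUCCEEDED' "
        "ORDER BY duration_s ASC LIMIT 1",
        (task,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Filtering on execution status matters here: a failed run can finish quickly, so ranking by execution time alone would recommend a configuration that does not work.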

Performance Improvement with OmniAdvisor 1.0

Spark tasks executed using parameter configurations recommended by OmniAdvisor 1.0 deliver about 15% higher performance than tasks executed using the parameter configurations tuned by experts.

Figure 4 Performance Improvement with OmniAdvisor 1.0