OmniMV
OmniMV is the materialized view feature.
OmniMV uses AI algorithms to recommend the optimal materialized view from historical SQL queries, automatically matches SQL statements with a materialized view in Spark or ClickHouse, and replaces part of the SQL statements in an execution plan with the matched materialized view. This feature greatly reduces repeated calculations and increases the query efficiency. You can submit an SQL task to a Spark or ClickHouse cluster. The cluster management node distributes the task to multiple compute nodes as subtasks for execution.
Figure 1 shows the OmniMV architecture.
OmniMV consists of three modules: candidate view generation module, cost evaluation module, and SQL write module.
- Candidate view generation module: generates candidate views.
- Cost evaluation module: Trains a cost evaluation model to predict the cost of candidate views.
- SQL write module: modifies the physical execution plan of SQL statements.
Use OmniMV in the following way:
- Collect historical query logs, such as Yarn logs.
- Parse historical logs to obtain SQL information, including the SQL text, SQL execution plan, and SQL running time.
- Generate candidate views based on the information obtained in 2, and then use the greedy selection policy to select top N candidate views.
- Create candidate views and use the SQL write module to modify the physical execution plan of SQL statements. Then train a cost evaluation model based on the obtained training data.
- Use the cost evaluation model to predict the cost of each candidate view.
- OmniMV recommends the final view sets based on the used greedy selection policy.
