Using OmniMV to Recommend Views
In this example, the TPC-DS data table is used as the test table.
- Modify the configuration file.
Modify the config/omnimv_config_spark.cfg file. Example:
database = tpcds_bin_partitioned_decimal_orc_8000
logparser_jar_path = /opt/spark/boostkit-omnimv-logparser-spark-3.1.1-1.1.0-aarch64.jar
cache_plugin_jar_path = /opt/spark/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar
hdfs_input_path = hdfs://server1:9000/spark2-history
sqls_path = hdfs://server1:9000/omnimv/tpcds
Table 1 OmniMV configuration items

| Item | Description | Reference Value |
| --- | --- | --- |
| database | Name of the database used for the test | - |
| logparser_jar_path | Path to the log parser JAR package | /opt/spark/boostkit-omnimv-logparser-spark-3.1.1-1.1.0-aarch64.jar |
| cache_plugin_jar_path | Path to the OmniMV plugin JAR package | /opt/spark/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar |
| hdfs_input_path | HDFS path of the Spark task run logs | hdfs://server1:9000/spark2-history |
| sqls_path | HDFS path of the test SQL statements | hdfs://server1:9000/omnimv/tpcds |
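Before running OmniMV, it can help to sanity-check the configuration values. The sketch below is not part of OmniMV; it assumes the file contains only bare `key = value` lines as in the example above (no `[section]` header), so a placeholder section is prepended for Python's `ConfigParser`:

```python
from configparser import ConfigParser

# Example contents of config/omnimv_config_spark.cfg (values from the doc).
raw = """\
database = tpcds_bin_partitioned_decimal_orc_8000
logparser_jar_path = /opt/spark/boostkit-omnimv-logparser-spark-3.1.1-1.1.0-aarch64.jar
cache_plugin_jar_path = /opt/spark/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar
hdfs_input_path = hdfs://server1:9000/spark2-history
sqls_path = hdfs://server1:9000/omnimv/tpcds
"""

# "omnimv" is an arbitrary placeholder section name, added only so that
# ConfigParser can parse a file that has no section header of its own.
parser = ConfigParser()
parser.read_string("[omnimv]\n" + raw)
cfg = dict(parser["omnimv"])

# Both HDFS settings should be full HDFS URIs, as in the reference values.
assert cfg["hdfs_input_path"].startswith("hdfs://"), "expected an HDFS URI"
assert cfg["sqls_path"].startswith("hdfs://"), "expected an HDFS URI"
print(cfg["database"])
```

In a real check you would read the file with `parser.read_string("[omnimv]\n" + open(path).read())` instead of the inline string.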
- Obtain test SQL statements (for example, TPC-DS).
- Download the Spark source code and decompress it. Obtain the TPC-DS test SQL statements from spark-master/sql/core/src/test/resources/tpcds/.
- Upload the test SQL statements to HDFS (the path is the value of sqls_path in the configuration file).
hdfs dfs -mkdir -p hdfs://server1:9000/omnimv/tpcds
hdfs dfs -put /Spark_source_code_directory/sql/core/src/test/resources/tpcds/* hdfs://server1:9000/omnimv/tpcds/
- Initialize the database. This step obtains the table creation information of the target tables, which is required to create the recommended views.
python main.pyc spark init_database
If the message "init_database succeed!" is displayed, the execution is successful.
- Run the test SQL statements. (The TPC-DS test SQL statements are used as an example.)
python main.pyc spark run_original_sqls
If the message "run_original_sqls succeed!" is displayed, the execution is successful.
- Parse run logs.
- Access the Spark history server at http://ip:18080 and view the start time and end time of the native SQL task.
- Modify the config/omnimv_config_spark.cfg file.
# Time when the test SQL statements started to run
q_log_start_time = 2022-09-15 11:58
# Time when the test SQL statements finished
q_log_end_time = 2022-09-15 17:15
- Run the script for parsing the logs.
python main.pyc spark parse_query_logs
If the message "parse_query_logs succeed!" is displayed, the execution is successful.
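The log-parsing window is bounded by the two timestamps set above; a reversed window would leave the parser with no logs to read. A small sketch (not part of OmniMV) that validates the window before you edit the file, assuming the `YYYY-MM-DD HH:MM` format shown in the example values:

```python
from datetime import datetime

# Timestamp format used by the example values in the cfg file.
FMT = "%Y-%m-%d %H:%M"

# Values copied from the example configuration above.
q_log_start_time = "2022-09-15 11:58"
q_log_end_time = "2022-09-15 17:15"

start = datetime.strptime(q_log_start_time, FMT)
end = datetime.strptime(q_log_end_time, FMT)

# The start of the window must precede its end.
assert start < end, "q_log_start_time must precede q_log_end_time"
print(end - start)  # duration of the run window
```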

- Generate candidate views.
- Optional: Modify the config/omnimv_config_spark.cfg file.
# Maximum number of top candidate materialized views. The number of recommended materialized views is less than or equal to this value.
cnt_limit = 10
- Run the script.
python main.pyc spark generate_views
If "greedy recommend for candidate mv" is displayed, the execution is successful.

- Create top N materialized views. (Manual confirmation is required to determine whether to create candidate views.)
python main.pyc spark create_greedy_views
- Parse the rewriting task logs.
- Access the Spark history server at http://ip:18080 and view the start time and end time of the rewritten SQL task.
- Modify the config/omnimv_config_spark.cfg file.
# Start time of the view creation task
mv_log_start_time = 2022-09-15 20:18
# End time of the view creation task
mv_log_end_time = 2022-09-15 21:17
- Run the script.
python main.pyc spark parse_mv_logs
- Train the cost estimation model.
python main.pyc spark cost_estimation
After the model is trained, run the hdfs dfs -ls /omnimv/training command to check whether the model files were generated.

- Obtain recommended materialized views.
python main.pyc spark recommend
- Create recommended materialized views. (Manual confirmation is required to determine whether to create recommended views.)
python main.pyc spark create_recommend_views