Recommended Views
Provide the required data tables and use OmniMV to obtain a recommended set of materialized views.
In this example, the TPC-DS data tables are used as the test tables.
- Modify the configuration file.
- Open the config/omnimv_config_spark.cfg file.

```shell
cd /opt/omnimv/BoostKit-omnimv_1.1.0
vi config/omnimv_config_spark.cfg
```

- Press i to enter the insert mode and modify the file as follows:

```text
database = tpcds_bin_partitioned_decimal_orc_8000
logparser_jar_path = /opt/omnimv/boostkit-omnimv-logparser-spark-3.1.1-1.1.0-aarch64.jar
cache_plugin_jar_path = /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar
hdfs_input_path = hdfs://server1:9000/spark2-history
sqls_path = hdfs://server1:9000/omnimv/tpcds
```

- Press Esc, type :wq!, and press Enter to save the file and exit.
Table 1 OmniMV configuration items

| Configuration Item | Description | Reference Value |
| --- | --- | --- |
| database | Name of the database used for the test. | - |
| logparser_jar_path | Path of the log parser JAR package. | /opt/omnimv/boostkit-omnimv-logparser-spark-3.1.1-1.1.0-aarch64.jar |
| cache_plugin_jar_path | Path of the OmniMV JAR package. | /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar |
| hdfs_input_path | Address for accessing Spark task run logs in HDFS. | hdfs://server1:9000/spark2-history |
| sqls_path | Address for accessing the test SQL statements in HDFS. | hdfs://server1:9000/omnimv/tpcds |
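The configuration file uses plain `key = value` lines with no section header, as the snippet above shows. As an illustration only (not part of OmniMV's code), such a file can be read from Python with the standard-library configparser by prepending a dummy section name before parsing:

```python
import configparser

# Sample content in the key = value style shown above
# (values taken from the reference column of Table 1).
raw = """\
database = tpcds_bin_partitioned_decimal_orc_8000
hdfs_input_path = hdfs://server1:9000/spark2-history
sqls_path = hdfs://server1:9000/omnimv/tpcds
"""

parser = configparser.ConfigParser()
# configparser requires a [section] header, so prepend a dummy one
# before parsing the header-less file content.
parser.read_string("[omnimv]\n" + raw)
cfg = dict(parser["omnimv"])

print(cfg["database"])   # tpcds_bin_partitioned_decimal_orc_8000
print(cfg["sqls_path"])  # hdfs://server1:9000/omnimv/tpcds
```

This is only a sketch for checking your edits programmatically; OmniMV itself reads the file through its own loader.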
- Obtain test SQL statements (for example, TPC-DS).
- Download the Spark source code and decompress it. Obtain the TPC-DS test SQL statements from spark-master/sql/core/src/test/resources/tpcds/.
- Upload the test SQL statements to HDFS (the path is the value of sqls_path in the configuration file).
```shell
hdfs dfs -mkdir -p hdfs://server1:9000/omnimv/tpcds
hdfs dfs -put /Spark_source_code_directory/sql/core/src/test/resources/tpcds/* hdfs://server1:9000/omnimv/tpcds/
```
- Initialize the database to obtain the table creation information of the target tables for creating recommended views.

```shell
python main.pyc spark init_database
```

If the message "init_database succeed!" is displayed, the execution is successful.
- Run the test SQL statements. (The TPC-DS test SQL statements are used as an example.)

```shell
python main.pyc spark run_original_sqls
```

If the message "run_original_sqls succeed!" is displayed, the execution is successful.
- Parse run logs.
- Access the Spark history server at http://ip:18080 and note the start time and end time of the original SQL task.
- Open the config/omnimv_config_spark.cfg file.

```shell
vi config/omnimv_config_spark.cfg
```

- Press i to enter the insert mode and modify the file as follows:

```text
# Time when the test SQL statements start to run
q_log_start_time = 2022-09-15 11:58
# Time when the test SQL statements end
q_log_end_time = 2022-09-15 17:15
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the script for parsing the logs.

```shell
python main.pyc spark parse_query_logs
```

If the message "parse_query_logs succeed!" is displayed, the execution is successful.
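The q_log_start_time / q_log_end_time values must be written in `YYYY-MM-DD HH:MM` form. If you read the times from the Spark history server's REST API (e.g. http://ip:18080/api/v1/applications) instead of its web UI, they arrive as epoch-millisecond timestamps. A minimal conversion sketch (the helper name `to_cfg_time` is illustrative, not part of OmniMV):

```python
from datetime import datetime, timezone

def to_cfg_time(epoch_ms, tz=timezone.utc):
    """Format an epoch-millisecond timestamp (as reported by the Spark
    history server REST API) in the 'YYYY-MM-DD HH:MM' style expected by
    q_log_start_time / q_log_end_time. Pass your local timezone if the
    history server UI shows local times."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime("%Y-%m-%d %H:%M")

print(to_cfg_time(0))  # 1970-01-01 00:00
```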
- Generate candidate views.
- Optional: Open the config/omnimv_config_spark.cfg file.

```shell
vi config/omnimv_config_spark.cfg
```

- Optional: Press i to enter the insert mode and modify the file as follows:

```text
# Maximum number of top candidate materialized views. The number of recommended materialized views is less than or equal to the value of this parameter.
mv_limit = 10
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the script to create the top N views. Manual confirmation is required to determine whether to create the candidate views.

```shell
python main.pyc spark create_greedy_views
```
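The mv_limit parameter caps how many of the top-ranked candidates are kept. OmniMV's actual ranking criteria are internal to the tool; purely as a hedged illustration of the idea, a greedy top-N selection over candidates with hypothetical per-view benefit scores might look like:

```python
def top_candidate_views(candidates, mv_limit=10):
    """Keep at most mv_limit candidate views, highest estimated benefit
    first. 'benefit' is a hypothetical score used only for this sketch;
    OmniMV computes its own ranking from the parsed logs."""
    ranked = sorted(candidates, key=lambda v: v["benefit"], reverse=True)
    return ranked[:mv_limit]

# Hypothetical candidates (names and scores are made up for illustration).
views = [
    {"name": "mv_store_sales", "benefit": 0.9},
    {"name": "mv_web_sales", "benefit": 0.4},
    {"name": "mv_catalog_sales", "benefit": 0.7},
]
print([v["name"] for v in top_candidate_views(views, mv_limit=2)])
# ['mv_store_sales', 'mv_catalog_sales']
```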
- Parse the rewriting task logs.
- Access the Spark history server at http://ip:18080 and note the start time and end time of the rewritten SQL task.
- Open the config/omnimv_config_spark.cfg file.

```shell
vi config/omnimv_config_spark.cfg
```

- Press i to enter the insert mode and modify the file as follows:

```text
# Start time of the view creation task
mv_log_start_time = 2022-09-15 20:18
# End time of the view creation task
mv_log_end_time = 2022-09-15 21:17
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the script.

```shell
python main.pyc spark parse_mv_logs
```
- Train the cost estimation model.

```shell
python main.pyc spark cost_estimation
```

After the model is trained, you can run the hdfs dfs -ls /omnimv/training command to check whether the model is generated.
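OmniMV's cost estimation model is trained internally from the parsed query and view logs, and its details are not documented here. Purely to illustrate the general idea of fitting predicted cost to observed runtimes, a minimal one-variable least-squares fit in plain Python (all names and data are hypothetical):

```python
def fit_linear_cost(features, runtimes):
    """Fit runtime ~= a * feature + b by ordinary least squares.
    Illustrative only; OmniMV trains its own model from the parsed logs."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(runtimes) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, runtimes))
    var = sum((x - mean_x) ** 2 for x in features)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Perfectly linear toy data: runtime = 2 * rows_scanned + 1
a, b = fit_linear_cost([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```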
- Obtain recommended materialized views.

```shell
python main.pyc spark recommend
```
- Create recommended materialized views. Manual confirmation is required to determine whether to create the recommended views.

```shell
python main.pyc spark create_recommend_views
```