
Recommended Views

Prepare the required data tables, then use OmniMV to obtain the recommended view set.

In this example, the TPC-DS data table is used as the test table.

  1. Modify the configuration file.
    1. Open the config/omnimv_config_spark.cfg file.
      cd /opt/omnimv/BoostKit-omnimv_1.2.0
      vi config/omnimv_config_spark.cfg
      
    2. Press i to enter the insert mode and modify the file as follows:
      • Spark 3.1.1
        database = tpcds_bin_partitioned_varchar_orc_3000
        logparser_jar_path = /opt/omnimv/boostkit-omnimv-logparser-spark-3.1.1-1.2.0-aarch64.jar
        cache_plugin_jar_path = /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar
        hdfs_input_path = hdfs://server1:9000/spark2-history
        sqls_path = hdfs://server1:9000/omnimv/tpcds
        
      • Spark 3.4.3
        database = tpcds_bin_partitioned_varchar_orc_3000
        logparser_jar_path = /opt/omnimv/boostkit-omnimv-logparser-spark-3.4.3-1.2.0-aarch64.jar
        cache_plugin_jar_path = /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar
        hdfs_input_path = hdfs://server1:9000/spark2-history
        sqls_path = hdfs://server1:9000/omnimv/tpcds
        
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    Table 1 OmniMV configuration items

    database
      Description: Name of the database used for the test.
      Reference value: -

    logparser_jar_path
      Description: Path to the log parser JAR package.
      Reference value:
      • /opt/omnimv/boostkit-omnimv-logparser-spark-3.1.1-1.2.0-aarch64.jar
      • /opt/omnimv/boostkit-omnimv-logparser-spark-3.4.3-1.2.0-aarch64.jar

    cache_plugin_jar_path
      Description: Path to the OmniMV JAR package.
      Reference value:
      • /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar
      • /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar

    hdfs_input_path
      Description: Address for accessing Spark task run logs in HDFS.
      Reference value: hdfs://server1:9000/spark2-history

    sqls_path
      Description: Address for accessing the test SQL statements in HDFS.
      Reference value: hdfs://server1:9000/omnimv/tpcds

  2. Obtain test SQL statements (for example, TPC-DS).
    1. Download the Spark source code and decompress it. Obtain the TPC-DS test SQL statements from spark-master/sql/core/src/test/resources/tpcds/.
    2. Upload the test SQL statements to HDFS (the path is the value of sqls_path in the configuration file).
      hdfs dfs -mkdir -p hdfs://server1:9000/omnimv/tpcds
      hdfs dfs -put /Spark_source_code_directory/sql/core/src/test/resources/tpcds/* hdfs://server1:9000/omnimv/tpcds/
  3. Initialize the database to obtain the table creation information of the target tables, which is needed for creating recommended views.

    If the message "init_database succeed!" is displayed, the execution is successful.

    python main.py spark init_database
    
  4. Run the test SQL statements. (The TPC-DS test SQL statements are used as an example.)

    If the message "run_original_sqls succeed!" is displayed, the execution is successful.

    python main.py spark run_original_sqls
    
  5. Parse run logs.
    1. Access the Spark history server at http://ip:18080 and note the start time and end time of the SQL tasks.
    2. Open the config/omnimv_config_spark.cfg file.
      vi config/omnimv_config_spark.cfg
      
    3. Press i to enter the insert mode and modify the file as follows:
      # Time when the test SQL statements start to run
      q_log_start_time = 2022-09-15 11:58
      # Time when the test SQL statements end
      q_log_end_time = 2022-09-15 17:15
      
    4. Press Esc, type :wq!, and press Enter to save the file and exit.
    5. Run the script for parsing the logs.

      If the message "parse_query_logs succeed!" is displayed, the execution is successful.

      python main.py spark parse_query_logs
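q_log_start_time and q_log_end_time must be written in YYYY-MM-DD HH:MM format. If you take the timestamps from the Spark event logs themselves (which store epoch milliseconds) rather than from the history server UI, a small helper can format them. This is only a sketch, not part of OmniMV, and the timezone passed in should match the one the history server displays (UTC is merely the default here):

```python
# Sketch: format an epoch-millisecond timestamp (as found in Spark event
# logs) into the "YYYY-MM-DD HH:MM" form that q_log_start_time and
# q_log_end_time expect. Not part of OmniMV; choose the tz your server uses.
from datetime import datetime, timezone

def to_cfg_time(epoch_ms, tz=timezone.utc):
    """Return a config-file timestamp string for an epoch time in ms."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime("%Y-%m-%d %H:%M")
```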
      
  6. Generate candidate views.
    1. Optional: Open the config/omnimv_config_spark.cfg file.
      vi config/omnimv_config_spark.cfg
      
    2. Optional: Press i to enter the insert mode and modify the file as follows:
      2
      # Maximum number of top candidate materialized views. The number of recommended materialized views is less than or equal to the value of this parameter.
      mv_limit = 10
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
    4. Run the script.
      If "greedy recommend for candidate mv" is displayed, the execution is successful.
      python main.py spark generate_views
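The "greedy recommend for candidate mv" message refers to OmniMV's internal recommendation step; its actual cost model is not documented here. Purely as an illustration of how a greedy top-N selection capped by mv_limit behaves, with made-up benefit scores:

```python
# Illustrative sketch only: OmniMV's real scoring model is internal to the
# tool. This shows the general shape of a greedy top-N selection capped by
# mv_limit: repeatedly pick the candidate view with the highest remaining
# benefit until the limit is reached or no candidate helps.
def greedy_select(candidates, mv_limit):
    """candidates: dict mapping view name -> estimated benefit score."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < mv_limit:
        best = max(remaining, key=remaining.get)
        if remaining[best] <= 0:   # no remaining candidate offers any benefit
            break
        selected.append(best)
        del remaining[best]
    return selected
```

With mv_limit = 2 and scores {"mv1": 5, "mv2": 9, "mv3": 1}, the sketch picks mv2 and then mv1, matching the rule that the number of recommended views never exceeds mv_limit.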
      
  7. Create top N views.
    Manual confirmation is required to determine whether to create the candidate views.
    python main.py spark create_greedy_views