我要评分
获取效率
正确性
完整性
易理解

Using Spark to Execute SQL Statements

You can use the OmniMV Spark engine to execute SQL statements in either of the two methods. After the SQL statements are executed, the physical SQL execution plan is rewritten.

Prerequisites

Install OmniMV by following instructions in Installing OmniMV and deploy Spark by following instructions in Spark Deployment Guide.

Methods

You can use the OmniMV Spark engine to execute SQL statements in either of the following ways. After the SQL statements are executed, the execution plan is automatically rewritten.

  • Method 1: Log in to the Spark SQL client and manually execute SQL statements.
    1. Log in to the client.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      
    2. Adjust the Spark parameters based on the cluster configuration.
    3. Manually execute the SQL statements.
  • Method 2: Use a script to execute SQL statements in a batch.
    Example:
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    spark-sql \
    --deploy-mode client \
    --driver-cores 5 \
    --driver-memory 5g \
    --num-executors 18 \
    --executor-cores 21 \
    --executor-memory 55g \
    --master yarn \
    --database Database_name \
    --name Task_name \
    --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.1.0-aarch64.jar \
    --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
    -f SQL_file_path
    

    Adjust the Spark parameters based on the cluster configuration. The -f parameter is used to invoke the SQL file. You can customize a script to execute SQL statements in a batch by referring to the example.

Syntax Supported by the OmniMV Spark Engine

Table 1 Syntax supported by the OmniMV spark engine

Operation

Syntax

Create MV (creating a materialized view)

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]mv_name

[DISABLE REWRITE]

[COMMENT 'mv_comment']

[PARTITIONED BY (col_name, ...)]

AS

<query>;

Drop MV (deleting a materialized view)

DROP MATERIALIZED VIEW [IF EXISTS] [db_name.]mv_name;

Show MVS (enumerating materialized views)

SHOW MATERIALIZED VIEWS [ON [db_name.]mv_name];

Alter MV rewrite (specifying whether to enable rewriting)

ALTER MATERIALIZED VIEW [db_name.]mv_name ENABLE|DISABLE REWRITE;

Refresh MV (refreshing a materialized view)

REFRESH MATERIALIZED VIEW [db_name.]mv_name;

WASH OUT MATERIALIZED VIEW (eliminating a materialized view)

--ALL: eliminates all views.

--UNUSED_DAYS: eliminates views that have not been used for ${UNUSED_DAYS} days. This is the default policy.

--RESERVE_QUANTITY_BY_VIEW_COUNT: reserves top ${RESERVE_QUANTITY_BY_VIEW_COUNT} frequently used views.

--DROP_QUANTITY_BY_SPACE_CONSUMED: eliminates top ${DROP_QUANTITY_BY_SPACE_CONSUMED} views that occupy the most memory space.

WASH OUT [ALL] MATERIALIZED VIEW [

USING

[UNUSED_DAYS,]

[RESERVE_QUANTITY_BY_VIEW_COUNT,]

[DROP_QUANTITY_BY_SPACE_CONSUMED]

]

Checking Whether the OmniMV Spark Engine Is Successfully Loaded

  1. Log in to the Spark SQL client and manually execute SQL statements (Method 1).
  2. Run the enumeration command. If no error is reported, the plugin is successfully loaded.
    1
    SHOW MATERIALIZED VIEWS;
    

  3. Create a sample base table and a sample view.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    CREATE TABLE IF NOT EXISTS column_type(
        empid INT,
        deptno INT,
        locationid INT,
        booleantype BOOLEAN,
        bytetype BYTE,
        shorttype SHORT,
        integertype INT,
        longtype LONG,
        floattype FLOAT,
        doubletype DOUBLE,
        datetype DATE,
        timestamptype TIMESTAMP,
        stringtype STRING,
        decimaltype DECIMAL
    );
    INSERT INTO TABLE column_type VALUES(
        1,1,1,TRUE,1,1,1,1,1.0,1.0,
        DATE '2022-01-01',
        TIMESTAMP '2022-01-01',
        'stringtype1',1.0
    );
    INSERT INTO TABLE column_type VALUES(
        2,2,2,TRUE,2,2,2,2,2.0,2.0,
        DATE '2022-02-02',
        TIMESTAMP '2022-02-02',
        'stringtype2',2.0
    );
    

  4. Create a materialized view.
    1
    2
    3
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_create1
    AS
    SELECT * FROM column_type;
    

  5. Use the EXPLAIN statement to check whether the execution plan of the query is rewritten.
    1
    2
    EXPLAIN 
    SELECT * FROM column_type;