Using Spark to Execute SQL Statements

You can use the OmniMV Spark engine to execute SQL statements in either of two ways. After a SQL statement is executed, its physical execution plan is rewritten.

Prerequisites

Install OmniMV by following the instructions in Installing the Feature, and deploy Spark by following the instructions in Spark Deployment Guide.

Methods

You can use either of the following methods. The execution plan is rewritten automatically after the SQL statements are executed.

  • Method 1: Log in to the spark-sql client and manually run SQL statements.
    1. Log in to the client.
      • Spark 3.1.1
        spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata

      • Spark 3.4.3
        spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata

    2. Adjust the Spark parameters based on the cluster configuration.
    3. Manually execute the SQL statements.
  • Method 2: Use a script to execute SQL statements in a batch.

    Adjust the Spark parameters based on the cluster configuration. The -f option passes the SQL file to execute, so you can write a custom script that invokes spark-sql in this way to execute SQL statements in a batch. For example:

    • Spark 3.1.1
      spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path

    • Spark 3.4.3
      spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path

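As a starting point for such a script, here is a minimal sketch (not part of OmniMV): it feeds every .sql file in a directory to spark-sql through -f. The function name run_batch, the DRY_RUN switch, and the directory layout are assumptions made here; the spark-sql options mirror the Spark 3.1.1 example above and should be adjusted to your cluster and version.

```shell
#!/bin/sh
# Illustrative batch driver (assumed names: run_batch, DRY_RUN).
run_batch() {
    dir="$1"
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue   # glob matched nothing; skip
        # Build the spark-sql command as positional parameters.
        set -- spark-sql \
            --deploy-mode client \
            --master yarn \
            --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
            --conf spark.sql.extensions=com.huawei.boostkit.spark.OmniMV \
            --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
            -f "$f"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$@"             # print the command instead of running it
        else
            "$@" || { echo "failed: $f" >&2; return 1; }
        fi
    done
}
```

Invoke it as run_batch /path/to/sql_dir; set DRY_RUN=1 first to inspect the generated commands before running them against the cluster.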

Syntax Supported by Spark for OmniMV

Table 1 Syntax supported by Spark for OmniMV

Create MV (creating a materialized view):

    CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]mv_name
    [DISABLE REWRITE]
    [COMMENT 'mv_comment']
    [PARTITIONED BY (col_name, ...)]
    AS
    <query>;

Drop MV (deleting a materialized view):

    DROP MATERIALIZED VIEW [IF EXISTS] [db_name.]mv_name;

Show MVs (enumerating materialized views):

    SHOW MATERIALIZED VIEWS [ON [db_name.]mv_name];

Alter MV rewrite (enabling or disabling rewriting):

    ALTER MATERIALIZED VIEW [db_name.]mv_name ENABLE|DISABLE REWRITE;

Refresh MV (refreshing a materialized view):

    REFRESH MATERIALIZED VIEW [db_name.]mv_name;

Wash out MV (eliminating materialized views):

    WASH OUT [ALL] MATERIALIZED VIEW [
    USING
    [UNUSED_DAYS,]
    [RESERVE_QUANTITY_BY_VIEW_COUNT,]
    [DROP_QUANTITY_BY_SPACE_CONSUMED]
    ]

    • ALL: eliminates all views.
    • UNUSED_DAYS: eliminates views that have not been used for ${UNUSED_DAYS} days. This is the default policy.
    • RESERVE_QUANTITY_BY_VIEW_COUNT: retains the top ${RESERVE_QUANTITY_BY_VIEW_COUNT} most frequently used views.
    • DROP_QUANTITY_BY_SPACE_CONSUMED: eliminates the top ${DROP_QUANTITY_BY_SPACE_CONSUMED} views that occupy the most storage space.
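For illustration only, the statements above can be combined into a typical lifecycle. The database, view, and base-table names below (omnimv.mv_demo, omnimv.emps) are hypothetical, and the WASH OUT example uses only the bare form of the grammar above:

```sql
-- Create an aggregate view over a (hypothetical) base table.
CREATE MATERIALIZED VIEW IF NOT EXISTS omnimv.mv_demo
COMMENT 'demo view'
AS
SELECT deptno, COUNT(*) AS cnt FROM omnimv.emps GROUP BY deptno;

-- Temporarily exclude the view from rewriting, then re-enable it.
ALTER MATERIALIZED VIEW omnimv.mv_demo DISABLE REWRITE;
ALTER MATERIALIZED VIEW omnimv.mv_demo ENABLE REWRITE;

-- Rebuild the view after the base table changes.
REFRESH MATERIALIZED VIEW omnimv.mv_demo;

-- Eliminate stale views using the default policy, or drop one explicitly.
WASH OUT MATERIALIZED VIEW;
DROP MATERIALIZED VIEW IF EXISTS omnimv.mv_demo;
```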

Spark Startup Parameters of OmniMV

Table 2 Spark startup parameters of OmniMV

spark.sql.omnimv.enable
Default value: true
Whether to enable rewriting in OmniMV.
  • true: enables rewriting.
  • false: disables rewriting.

spark.sql.omnimv.show.length
Default value: 50
Maximum number of characters of materialized view information printed by the SHOW MATERIALIZED VIEWS command.

spark.sql.omnimv.default.datasource
Default value: orc
Storage format of materialized views. The options are orc and parquet.

spark.sql.omnimv.logLevel
Default value: DEBUG
Log level for materialized views. The options are DEBUG, INFO, WARN, and ERROR.

spark.sql.omnimv.log.enable
Default value: true
Whether to export parsed SQL statements to event logs.
  • true: exports parsed SQL statements to event logs.
  • false: does not export parsed SQL statements to event logs.

spark.sql.omnimv.metadata.path
Default value: /omnimv/plugin_metadata
HDFS path of the materialized view metadata.

spark.sql.omnimv.metadata.initbyquery.enable
Default value: false
Whether to load only the materialized view metadata related to the tables used by a query SQL statement, which accelerates metadata loading.
  • true: loads only the metadata related to the tables used by the query.
  • false: loads all materialized view metadata.

spark.sql.omnimv.dbs
Default value: none. Set it based on your requirements.
Loads only the materialized view metadata in the specified databases, which accelerates metadata loading. For example, it can be set to omnimv, omnimv1, or another desired value.

spark.sql.omnimv.washout.automatic.enable
Default value: false
Whether to automatically eliminate historical materialized views. The default policy is UNUSED_DAYS.
  • true: enables automatic elimination of materialized views.
  • false: disables automatic elimination of materialized views.

spark.sql.omnimv.washout.unused.day
Default value: 30
Number of days for the UNUSED_DAYS policy (the default). Materialized views that have not been used in the past 30 days are eliminated.

spark.sql.omnimv.washout.reserve.quantity.byViewCnt
Default value: 25
Quantity for the RESERVE_QUANTITY_BY_VIEW_COUNT policy. The materialized views are sorted by usage frequency in descending order; the top 25 views are retained and the others are eliminated.

spark.sql.omnimv.washout.drop.quantity.bySpaceConsumed
Default value: 3
Quantity for the DROP_QUANTITY_BY_SPACE_CONSUMED policy. The materialized views are sorted by consumed storage space in descending order; the top 3 views are eliminated.

spark.sql.omnimv.washout.automatic.time.interval
Default value: 35
Number of days between two adjacent automatic eliminations of materialized views.

spark.sql.omnimv.washout.automatic.checkTime.interval
Default value: 3600
Interval, in seconds, at which a session checks whether to trigger automatic elimination.

spark.sql.omnimv.washout.automatic.view.quantity
Default value: 20
Minimum number of materialized views that triggers an automatic elimination.

spark.sql.omnimv.detect-rewrite-sqls.enable
Default value: false
Whether to accelerate metadata loading only for SQL statements that materialized views can rewrite. This reduces the performance loss caused by time-consuming metadata loading for SQL statements that rewriting does not affect.
  • true: accelerates metadata loading only for rewritable SQL statements.
  • false: accelerates metadata loading for all SQL statements.
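All of these parameters are passed as --conf options when launching spark-sql, in the same way as spark.sql.omnimv.metadata.path in the earlier examples. A sketch of such a launch command follows; the resource options are omitted and the washout values are examples, not recommendations:

```
spark-sql \
--deploy-mode client \
--master yarn \
--jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
--conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
--conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
--conf spark.sql.omnimv.default.datasource=parquet \
--conf spark.sql.omnimv.washout.automatic.enable=true \
--conf spark.sql.omnimv.washout.unused.day=60
```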

Kryo Parameters of OmniMV

When OmniMV uses Kryo to serialize materialized view metadata, user-defined Kryo settings may be incompatible with OmniMV. OmniMV therefore applies the following fixed configuration so that other Spark functions are not affected.

Table 3 Kryo parameters of OmniMV

spark.kryo.unsafe
Fixed value: false
Does not use the unsafe mode of Kryo.

spark.kryo.registrationRequired
Fixed value: false
Does not require all classes serialized by Kryo to be registered.

spark.kryo.registrator
Fixed value: ""
Does not specify a custom Kryo registrator class.

spark.kryo.classesToRegister
Fixed value: ""
Does not specify a list of classes to register.

spark.kryo.pool
Fixed value: true
Enables the Kryo instance pool.

spark.kryo.referenceTracking
Fixed value: true
Enables reference tracking.

spark.kryoserializer.buffer
Fixed value: 64k
Initial size of the Kryo serialization buffer.

spark.kryoserializer.buffer.max
Fixed value: 64m
Maximum size of the Kryo serialization buffer.

Checking Whether the OmniMV Spark Engine Is Successfully Loaded

  1. Log in to the Spark SQL client and manually execute SQL statements (Method 1).
  2. Run the statement for showing materialized views. If the view information is displayed, the plugin is successfully loaded.
    SHOW MATERIALIZED VIEWS;
    

  3. Create a sample base table and a sample view.
    CREATE TABLE IF NOT EXISTS column_type(
        empid INT,
        deptno INT,
        locationid INT,
        booleantype BOOLEAN,
        bytetype BYTE,
        shorttype SHORT,
        integertype INT,
        longtype LONG,
        floattype FLOAT,
        doubletype DOUBLE,
        datetype DATE,
        timestamptype TIMESTAMP,
        stringtype STRING,
        decimaltype DECIMAL
    );
    INSERT INTO TABLE column_type VALUES(
        1,1,1,TRUE,1,1,1,1,1.0,1.0,
        DATE '2022-01-01',
        TIMESTAMP '2022-01-01',
        'stringtype1',1.0
    );
    INSERT INTO TABLE column_type VALUES(
        2,2,2,TRUE,2,2,2,2,2.0,2.0,
        DATE '2022-02-02',
        TIMESTAMP '2022-02-02',
        'stringtype2',2.0
    );
    

  4. Create a materialized view.
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_create1
    AS
    SELECT * FROM column_type;
    

  5. Use the EXPLAIN statement to check whether the execution plan of the query is rewritten. If rewriting takes effect, the plan reads from the materialized view (mv_create1) rather than the base table.
    EXPLAIN 
    SELECT * FROM column_type;
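To confirm that a difference in the plan comes from OmniMV rather than something else, you can disable rewriting for the view and compare the two plans. This is a sketch using the ALTER syntax from Table 1; the exact operators shown in the plans depend on your data source:

```sql
-- With rewriting enabled, the plan should reference mv_create1.
EXPLAIN SELECT * FROM column_type;

-- Disable rewriting and compare: the plan should scan column_type directly.
ALTER MATERIALIZED VIEW mv_create1 DISABLE REWRITE;
EXPLAIN SELECT * FROM column_type;

-- Restore rewriting.
ALTER MATERIALIZED VIEW mv_create1 ENABLE REWRITE;
```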