Using Spark to Execute SQL Statements

You can use the OmniMV Spark engine to execute SQL statements in either of two ways. After a SQL statement is executed, its physical execution plan is rewritten.

Prerequisites

Install OmniMV by following the instructions in Installing the Feature, and deploy Spark by following the instructions in Spark Deployment Guide.

Methods

You can use either of the following methods. The execution plan is rewritten automatically after the SQL statements are executed.

  • Method 1: Log in to the spark-sql client and manually run SQL statements.
    1. Log in to the client.
      • Spark 3.1.1
        spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata

      • Spark 3.4.3
        spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata

    2. Adjust the Spark parameters based on the cluster configuration.
    3. Manually execute the SQL statements.
  • Method 2: Use a script to execute SQL statements in a batch.

    Adjust the Spark parameters based on the cluster configuration. The -f option passes the SQL file to execute, so you can write a custom script that invokes spark-sql in this way to execute SQL statements in a batch. For example:

    • Spark 3.1.1
      spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path

    • Spark 3.4.3
      spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path

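As a starting point for such a script, here is a minimal sketch (not part of OmniMV): it feeds every .sql file in a directory to spark-sql through -f. The function name run_batch, the DRY_RUN switch, and the directory layout are assumptions made here; the spark-sql options mirror the Spark 3.1.1 example above and should be adjusted to your cluster and version.

```shell
#!/bin/sh
# Illustrative batch driver (assumed names: run_batch, DRY_RUN).
run_batch() {
    dir="$1"
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue   # glob matched nothing; skip
        # Build the spark-sql command as positional parameters.
        set -- spark-sql \
            --deploy-mode client \
            --master yarn \
            --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
            --conf spark.sql.extensions=com.huawei.boostkit.spark.OmniMV \
            --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
            -f "$f"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$@"             # print the command instead of running it
        else
            "$@" || { echo "failed: $f" >&2; return 1; }
        fi
    done
}
```

Invoke it as run_batch /path/to/sql_dir; set DRY_RUN=1 first to inspect the generated commands before running them against the cluster.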

Syntax Supported by Spark for OmniMV

Table 1 Syntax supported by Spark for OmniMV

Create MV (creating a materialized view):

    CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]mv_name
    [DISABLE REWRITE]
    [COMMENT 'mv_comment']
    [PARTITIONED BY (col_name, ...)]
    AS
    <query>;

Drop MV (deleting a materialized view):

    DROP MATERIALIZED VIEW [IF EXISTS] [db_name.]mv_name;

Show MVs (enumerating materialized views):

    SHOW MATERIALIZED VIEWS [ON [db_name.]mv_name];

Alter MV rewrite (enabling or disabling rewriting):

    ALTER MATERIALIZED VIEW [db_name.]mv_name ENABLE|DISABLE REWRITE;

Refresh MV (refreshing a materialized view):

    REFRESH MATERIALIZED VIEW [db_name.]mv_name;

Wash out MV (eliminating materialized views):

    WASH OUT [ALL] MATERIALIZED VIEW [
    USING
    [UNUSED_DAYS,]
    [RESERVE_QUANTITY_BY_VIEW_COUNT,]
    [DROP_QUANTITY_BY_SPACE_CONSUMED]
    ]

    • ALL: eliminates all views.
    • UNUSED_DAYS: eliminates views that have not been used for ${UNUSED_DAYS} days. This is the default policy.
    • RESERVE_QUANTITY_BY_VIEW_COUNT: retains the top ${RESERVE_QUANTITY_BY_VIEW_COUNT} most frequently used views.
    • DROP_QUANTITY_BY_SPACE_CONSUMED: eliminates the top ${DROP_QUANTITY_BY_SPACE_CONSUMED} views that occupy the most storage space.
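For illustration only, the statements above can be combined into a typical lifecycle. The database, view, and base-table names below (omnimv.mv_demo, omnimv.emps) are hypothetical, and the WASH OUT example uses only the bare form of the grammar above:

```sql
-- Create an aggregate view over a (hypothetical) base table.
CREATE MATERIALIZED VIEW IF NOT EXISTS omnimv.mv_demo
COMMENT 'demo view'
AS
SELECT deptno, COUNT(*) AS cnt FROM omnimv.emps GROUP BY deptno;

-- Temporarily exclude the view from rewriting, then re-enable it.
ALTER MATERIALIZED VIEW omnimv.mv_demo DISABLE REWRITE;
ALTER MATERIALIZED VIEW omnimv.mv_demo ENABLE REWRITE;

-- Rebuild the view after the base table changes.
REFRESH MATERIALIZED VIEW omnimv.mv_demo;

-- Eliminate stale views using the default policy, or drop one explicitly.
WASH OUT MATERIALIZED VIEW;
DROP MATERIALIZED VIEW IF EXISTS omnimv.mv_demo;
```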

Spark Startup Parameters of OmniMV

Table 2 Spark startup parameters of OmniMV

spark.sql.omnimv.enable
Default value: true
Whether to enable rewriting in OmniMV.
  • true: enables rewriting.
  • false: disables rewriting.

spark.sql.omnimv.show.length
Default value: 50
Maximum number of characters of materialized view information printed by the SHOW MATERIALIZED VIEWS command.

spark.sql.omnimv.default.datasource
Default value: orc
Storage format of materialized views. The options are orc and parquet.

spark.sql.omnimv.logLevel
Default value: DEBUG
Log level for materialized views. The options are DEBUG, INFO, WARN, and ERROR.

spark.sql.omnimv.log.enable
Default value: true
Whether to export parsed SQL statements to event logs.
  • true: exports parsed SQL statements to event logs.
  • false: does not export parsed SQL statements to event logs.

spark.sql.omnimv.metadata.path
Default value: /omnimv/plugin_metadata
HDFS path of the materialized view metadata.

spark.sql.omnimv.metadata.initbyquery.enable
Default value: false
Whether to load only the materialized view metadata related to the tables used by a query SQL statement, which accelerates metadata loading.
  • true: loads only the metadata related to the tables used by the query.
  • false: loads all materialized view metadata.

spark.sql.omnimv.dbs
Default value: none. Set it based on your requirements.
Loads only the materialized view metadata in the specified databases, which accelerates metadata loading. For example, it can be set to omnimv, omnimv1, or another desired value.

spark.sql.omnimv.washout.automatic.enable
Default value: false
Whether to automatically eliminate historical materialized views. The default policy is UNUSED_DAYS.
  • true: enables automatic elimination of materialized views.
  • false: disables automatic elimination of materialized views.

spark.sql.omnimv.washout.unused.day
Default value: 30
Number of days for the UNUSED_DAYS policy (the default). Materialized views that have not been used in the past 30 days are eliminated.

spark.sql.omnimv.washout.reserve.quantity.byViewCnt
Default value: 25
Quantity for the RESERVE_QUANTITY_BY_VIEW_COUNT policy. The materialized views are sorted by usage frequency in descending order; the top 25 views are retained and the others are eliminated.

spark.sql.omnimv.washout.drop.quantity.bySpaceConsumed
Default value: 3
Quantity for the DROP_QUANTITY_BY_SPACE_CONSUMED policy. The materialized views are sorted by consumed storage space in descending order; the top 3 views are eliminated.

spark.sql.omnimv.washout.automatic.time.interval
Default value: 35
Number of days between two adjacent automatic eliminations of materialized views.

spark.sql.omnimv.washout.automatic.checkTime.interval
Default value: 3600
Interval, in seconds, at which a session checks whether to trigger automatic elimination.

spark.sql.omnimv.washout.automatic.view.quantity
Default value: 20
Minimum number of materialized views that triggers an automatic elimination.

spark.sql.omnimv.detect-rewrite-sqls.enable
Default value: false
Whether to accelerate metadata loading only for SQL statements that materialized views can rewrite. This reduces the performance loss caused by time-consuming metadata loading for SQL statements that rewriting does not affect.
  • true: accelerates metadata loading only for rewritable SQL statements.
  • false: accelerates metadata loading for all SQL statements.
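All of these parameters are passed as --conf options when launching spark-sql, in the same way as spark.sql.omnimv.metadata.path in the earlier examples. A sketch of such a launch command follows; the resource options are omitted and the washout values are examples, not recommendations:

```
spark-sql \
--deploy-mode client \
--master yarn \
--jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
--conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
--conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
--conf spark.sql.omnimv.default.datasource=parquet \
--conf spark.sql.omnimv.washout.automatic.enable=true \
--conf spark.sql.omnimv.washout.unused.day=60
```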

Kryo Parameters of OmniMV

When OmniMV uses Kryo to serialize materialized view metadata, user-defined Kryo settings may be incompatible with OmniMV. OmniMV therefore applies the following fixed configuration so that other Spark functions are not affected.

Table 3 Kryo parameters of OmniMV

spark.kryo.unsafe
Fixed value: false
Does not use the unsafe mode of Kryo.

spark.kryo.registrationRequired
Fixed value: false
Does not require all classes serialized by Kryo to be registered.

spark.kryo.registrator
Fixed value: ""
Does not specify a custom Kryo registrator class.

spark.kryo.classesToRegister
Fixed value: ""
Does not specify a list of classes to register.

spark.kryo.pool
Fixed value: true
Enables the Kryo instance pool.

spark.kryo.referenceTracking
Fixed value: true
Enables reference tracking.

spark.kryoserializer.buffer
Fixed value: 64k
Initial size of the Kryo serialization buffer.

spark.kryoserializer.buffer.max
Fixed value: 64m
Maximum size of the Kryo serialization buffer.

Checking Whether the OmniMV Spark Engine Is Successfully Loaded

  1. Log in to the Spark SQL client and manually execute SQL statements (Method 1).
  2. Run the statement for showing materialized views. If the view information is displayed, the plugin is successfully loaded.
    SHOW MATERIALIZED VIEWS;
    

  3. Create a sample base table and a sample view.
    CREATE TABLE IF NOT EXISTS column_type(
        empid INT,
        deptno INT,
        locationid INT,
        booleantype BOOLEAN,
        bytetype BYTE,
        shorttype SHORT,
        integertype INT,
        longtype LONG,
        floattype FLOAT,
        doubletype DOUBLE,
        datetype DATE,
        timestamptype TIMESTAMP,
        stringtype STRING,
        decimaltype DECIMAL
    );
    INSERT INTO TABLE column_type VALUES(
        1,1,1,TRUE,1,1,1,1,1.0,1.0,
        DATE '2022-01-01',
        TIMESTAMP '2022-01-01',
        'stringtype1',1.0
    );
    INSERT INTO TABLE column_type VALUES(
        2,2,2,TRUE,2,2,2,2,2.0,2.0,
        DATE '2022-02-02',
        TIMESTAMP '2022-02-02',
        'stringtype2',2.0
    );
    

  4. Create a materialized view.
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_create1
    AS
    SELECT * FROM column_type;
    

  5. Use the EXPLAIN statement to check whether the execution plan of the query is rewritten. If rewriting takes effect, the plan reads from the materialized view (mv_create1) rather than the base table.
    EXPLAIN 
    SELECT * FROM column_type;
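To confirm that a difference in the plan comes from OmniMV rather than something else, you can disable rewriting for the view and compare the two plans. This is a sketch using the ALTER syntax from Table 1; the exact operators shown in the plans depend on your data source:

```sql
-- With rewriting enabled, the plan should reference mv_create1.
EXPLAIN SELECT * FROM column_type;

-- Disable rewriting and compare: the plan should scan column_type directly.
ALTER MATERIALIZED VIEW mv_create1 DISABLE REWRITE;
EXPLAIN SELECT * FROM column_type;

-- Restore rewriting.
ALTER MATERIALIZED VIEW mv_create1 ENABLE REWRITE;
```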