Using Spark to Execute SQL Statements
You can use the OmniMV Spark engine to execute SQL statements in either of two ways. When a SQL statement is executed, OmniMV automatically rewrites its physical execution plan.
Prerequisites
Install OmniMV by following the instructions in Installing the Feature, and deploy Spark by following the instructions in the Spark Deployment Guide.
Methods
You can execute SQL statements through the OmniMV Spark engine in either of the following ways. In both cases, the execution plan is rewritten automatically.
- Method 1: Log in to the spark-sql client and manually run SQL statements.
  - Log in to the client. Adjust the Spark parameters based on the cluster configuration.
    - Spark 3.1.1

      ```shell
      spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata
      ```

    - Spark 3.4.3

      ```shell
      spark-sql \
        --deploy-mode client \
        --driver-cores 5 \
        --driver-memory 5g \
        --num-executors 18 \
        --executor-cores 21 \
        --executor-memory 55g \
        --master yarn \
        --database Database_name \
        --name Task_name \
        --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
        --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
        --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata
      ```

  - Manually execute the SQL statements.
- Method 2: Use a script to execute SQL statements in a batch.

  Adjust the Spark parameters based on the cluster configuration. The -f option specifies the SQL file to run; you can write a custom script that invokes spark-sql with -f for each file to execute SQL statements in a batch. For example:

  - Spark 3.1.1

    ```shell
    spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path
    ```

  - Spark 3.4.3

    ```shell
    spark-sql \
      --deploy-mode client \
      --driver-cores 5 \
      --driver-memory 5g \
      --num-executors 18 \
      --executor-cores 21 \
      --executor-memory 55g \
      --master yarn \
      --database Database_name \
      --name Task_name \
      --jars /opt/omnimv/boostkit-omnimv-spark-3.4.3-1.2.0-aarch64.jar \
      --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
      --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
      -f SQL_file_path
    ```
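A batch run can be wrapped in a small script. The sketch below is one way to do it, assuming the Spark 3.1.1 JAR path from the examples above; the SQL_DIR directory, the run_sql_file helper, and the database name are hypothetical, and the resource options (driver/executor sizes) are omitted for brevity.

```shell
# Sketch of a batch runner for OmniMV SQL files. SQL_DIR, run_sql_file,
# and Database_name are illustrative; adjust paths and add Spark
# resource options to match your cluster.
SQL_DIR="${SQL_DIR:-./sql_jobs}"
SPARK_SQL="${SPARK_SQL:-spark-sql}"   # override to dry-run without a cluster

# Run one SQL file against one database through spark-sql with OmniMV loaded.
run_sql_file() {
  "$SPARK_SQL" \
    --deploy-mode client \
    --master yarn \
    --database "$1" \
    --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
    --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
    --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
    -f "$2"
}

# Execute every .sql file in SQL_DIR in a batch.
for f in "$SQL_DIR"/*.sql; do
  [ -e "$f" ] || continue   # glob did not match: nothing to run
  run_sql_file Database_name "$f"
done
```

Setting SPARK_SQL=echo prints the generated commands instead of running them, which is a convenient way to check the invocation before submitting real jobs.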
Syntax Supported by Spark for OmniMV
| Operation | Syntax |
|---|---|
| Create MV (create a materialized view) | `CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]mv_name [DISABLE REWRITE] [COMMENT 'mv_comment'] [PARTITIONED BY (col_name, ...)] AS <query>;` |
| Drop MV (delete a materialized view) | `DROP MATERIALIZED VIEW [IF EXISTS] [db_name.]mv_name;` |
| Show MVs (enumerate materialized views) | `SHOW MATERIALIZED VIEWS [ON [db_name.]mv_name];` |
| Alter MV rewrite (enable or disable rewriting) | `ALTER MATERIALIZED VIEW [db_name.]mv_name ENABLE\|DISABLE REWRITE;` |
| Refresh MV (refresh a materialized view) | `REFRESH MATERIALIZED VIEW [db_name.]mv_name;` |
| Wash out MV (eliminate materialized views). ALL: eliminates all views. UNUSED_DAYS: eliminates views that have not been used for ${UNUSED_DAYS} days; this is the default policy. RESERVE_QUANTITY_BY_VIEW_COUNT: reserves the top ${RESERVE_QUANTITY_BY_VIEW_COUNT} most frequently used views. DROP_QUANTITY_BY_SPACE_CONSUMED: eliminates the top ${DROP_QUANTITY_BY_SPACE_CONSUMED} views that occupy the most storage space. | `WASH OUT [ALL] MATERIALIZED VIEW [USING [UNUSED_DAYS,] [RESERVE_QUANTITY_BY_VIEW_COUNT,] [DROP_QUANTITY_BY_SPACE_CONSUMED]];` |
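For instance, the two statement forms below follow directly from the wash-out grammar above (run them on the spark-sql client; when no USING clause is given, the default UNUSED_DAYS policy applies):

```sql
-- Eliminate materialized views using the default UNUSED_DAYS policy:
WASH OUT MATERIALIZED VIEW;

-- Eliminate all materialized views:
WASH OUT ALL MATERIALIZED VIEW;
```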
Spark Startup Parameters of OmniMV
| Startup Parameter | Default Value | Description |
|---|---|---|
| spark.sql.omnimv.enable | true | Whether to enable plan rewriting in OmniMV. |
| spark.sql.omnimv.show.length | 50 | Maximum number of characters of the materialized view information printed by the SHOW MATERIALIZED VIEWS command. |
| spark.sql.omnimv.default.datasource | orc | Storage format of materialized views. The options are orc and parquet. |
| spark.sql.omnimv.logLevel | DEBUG | Log level for materialized views. The options are DEBUG, INFO, WARN, and ERROR. |
| spark.sql.omnimv.log.enable | true | Whether to export parsed SQL statements to event logs. |
| spark.sql.omnimv.metadata.path | /omnimv/plugin_metadata | HDFS path of the materialized view metadata. |
| spark.sql.omnimv.metadata.initbyquery.enable | false | Whether to load only the materialized view metadata related to the tables used by a query SQL statement. This can accelerate metadata loading. |
| spark.sql.omnimv.dbs | None; set it as required. | Loads only the materialized view metadata in the specified databases, which can accelerate metadata loading. Example values: omnimv, omnimv1. |
| spark.sql.omnimv.washout.automatic.enable | false | Whether to automatically eliminate historical materialized views. The default policy is UNUSED_DAYS. true: automatic elimination is enabled; false: automatic elimination is disabled. |
| spark.sql.omnimv.washout.unused.day | 30 | Threshold of the UNUSED_DAYS policy (the default policy): materialized views that have not been used in the past 30 days are eliminated. |
| spark.sql.omnimv.washout.reserve.quantity.byViewCnt | 25 | Threshold of the RESERVE_QUANTITY_BY_VIEW_COUNT policy: after the materialized views are sorted by usage frequency in descending order, only the top 25 are retained and the rest are eliminated. |
| spark.sql.omnimv.washout.drop.quantity.bySpaceConsumed | 3 | Threshold of the DROP_QUANTITY_BY_SPACE_CONSUMED policy: after the materialized views are sorted by storage space consumed in descending order, the top 3 are eliminated. |
| spark.sql.omnimv.washout.automatic.time.interval | 35 | Number of days between two adjacent automatic eliminations of materialized views. |
| spark.sql.omnimv.washout.automatic.checkTime.interval | 3600 | Interval, in seconds, at which a session checks whether to trigger an automatic elimination. |
| spark.sql.omnimv.washout.automatic.view.quantity | 20 | Minimum number of materialized views that triggers an automatic elimination. |
| spark.sql.omnimv.detect-rewrite-sqls.enable | false | Whether to accelerate metadata loading only for SQL statements that can be rewritten by materialized views, which reduces the performance loss caused by time-consuming metadata loading for statements that cannot benefit. true: accelerate metadata loading only for rewritable SQL statements; false: accelerate metadata loading for all SQL statements. |
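As a sketch of how the wash-out parameters above fit together, a spark-sql startup command could enable automatic elimination like this. The JAR path matches the earlier Spark 3.1.1 examples; the threshold values (7 days, 10 views) are illustrative, not recommendations, and the usual resource options are omitted for brevity.

```shell
spark-sql \
  --master yarn \
  --jars /opt/omnimv/boostkit-omnimv-spark-3.1.1-1.2.0-aarch64.jar \
  --conf 'spark.sql.extensions=com.huawei.boostkit.spark.OmniMV' \
  --conf spark.sql.omnimv.metadata.path=/omnimv/plugin_metadata \
  --conf spark.sql.omnimv.washout.automatic.enable=true \
  --conf spark.sql.omnimv.washout.unused.day=7 \
  --conf spark.sql.omnimv.washout.automatic.view.quantity=10
```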
Kryo Parameters of OmniMV
When OmniMV serializes materialized view metadata with Kryo, user-supplied Kryo settings may be incompatible with OmniMV. OmniMV therefore uses a fixed Kryo configuration for this serialization so that Spark functionality is not affected.
| Parameter | Fixed Value | Description |
|---|---|---|
| spark.kryo.unsafe | false | Does not use the unsafe mode of Kryo. |
| spark.kryo.registrationRequired | false | Does not require all classes to be serialized to be registered with Kryo. |
| spark.kryo.registrator | "" | Does not specify a custom Kryo registrator class. |
| spark.kryo.classesToRegister | "" | Does not specify a list of classes to register. |
| spark.kryo.pool | true | Enables the Kryo instance pool. |
| spark.kryo.referenceTracking | true | Enables reference tracking. |
| spark.kryoserializer.buffer | 64k | Initial size of the Kryo serialization buffer. |
| spark.kryoserializer.buffer.max | 64m | Maximum size of the Kryo serialization buffer. |
Checking Whether the OmniMV Spark Engine Is Successfully Loaded
- Log in to the Spark SQL client and manually execute SQL statements (Method 1).
- Run the following command to list materialized views. If the command executes successfully, the plugin is loaded.

  ```sql
  SHOW MATERIALIZED VIEWS;
  ```

- Create a sample base table and insert sample data.

  ```sql
  CREATE TABLE IF NOT EXISTS column_type(
    empid INT,
    deptno INT,
    locationid INT,
    booleantype BOOLEAN,
    bytetype BYTE,
    shorttype SHORT,
    integertype INT,
    longtype LONG,
    floattype FLOAT,
    doubletype DOUBLE,
    datetype DATE,
    timestamptype TIMESTAMP,
    stringtype STRING,
    decimaltype DECIMAL
  );

  INSERT INTO TABLE column_type VALUES(
    1,1,1,TRUE,1,1,1,1,1.0,1.0,
    DATE '2022-01-01',
    TIMESTAMP '2022-01-01',
    'stringtype1',1.0
  );

  INSERT INTO TABLE column_type VALUES(
    2,2,2,TRUE,2,2,2,2,2.0,2.0,
    DATE '2022-02-02',
    TIMESTAMP '2022-02-02',
    'stringtype2',2.0
  );
  ```

- Create a materialized view.

  ```sql
  CREATE MATERIALIZED VIEW IF NOT EXISTS mv_create1
  AS
  SELECT * FROM column_type;
  ```

- Use the EXPLAIN statement to check whether the execution plan of the query is rewritten. If the rewrite takes effect, the plan scans the materialized view (mv_create1) instead of the base table column_type.

  ```sql
  EXPLAIN SELECT * FROM column_type;
  ```
