Constraints

Understand the OmniRuntime usage restrictions before configuring each of the features.

OmniData

  • If an out-of-bounds error occurs during numeric calculation, OmniData throws an out-of-bounds exception.
  • When comparing character strings, ensure that constants are enclosed in single quotation marks. Example: select * from table where char = '123'
  • The decimal (38,38) data type is not supported.
  • The timestamp data type is not supported; queries involving it fall back to native calculation.
  • Operator pushdown is unavailable for transaction tables; they fall back to native calculation.
  • Operator pushdown is unavailable for bucket tables; they fall back to native calculation.
  • Ensure that the ANALYZE statistics of a data table are up to date. Otherwise, the pushdown selectivity estimate is inaccurate.
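The fallback rules above can be sketched as a simple eligibility check. The `TableInfo` shape and function name below are hypothetical, for illustration only; the real decision is made inside OmniData:

```python
from dataclasses import dataclass

@dataclass
class TableInfo:
    """Hypothetical descriptor of a table referenced by a query."""
    is_transactional: bool = False
    is_bucketed: bool = False
    column_types: tuple = ()       # e.g. ("int", "timestamp")
    stats_up_to_date: bool = True  # result of a recent ANALYZE

def can_push_down(table: TableInfo) -> bool:
    """Return True if operator pushdown applies; otherwise the query
    falls back to native calculation, per the rules above."""
    if table.is_transactional or table.is_bucketed:
        return False  # transaction/bucket tables: native path
    if "timestamp" in table.column_types:
        return False  # timestamp columns: native path
    return True

# Note: stale statistics do not block pushdown, but they make the
# estimated pushdown selectivity unreliable -- run ANALYZE first.
```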

OmniOperator

  • The user-defined function (UDF) plugin supports only simple UDFs. It is used to execute UDFs written based on the Hive UDF framework.
  • Currently, 64-bit and 128-bit Decimal data types are supported. If a Decimal value exceeds 128 bits, an exception is thrown or null is returned (depending on the configuration), which may diverge from the engine's native behavior. For example, if an intermediate result of a SUM or AVG aggregation exceeds 128-bit Decimal, the native engine completes normally, but OmniOperator throws an exception or returns null. If AVG is required for a field and the accumulated result may grow very large, use another storage type such as Double.
  • Different workloads require different memory configurations. For example, for a TPC-DS 3 TB dataset, the recommended Spark Extension configuration requires at least 30 GB of off-heap memory so that all 99 SQL statements execute successfully. During execution, "MEM_CAP_EXCEEDED" may be reported in logs, but it does not affect the execution result. If off-heap memory is insufficient, SQL execution results may be incorrect.
  • In security cluster mode, data in the ORC format cannot be read in native mode.
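The Decimal128 restriction above can be illustrated with a small sketch. The 38-significant-digit cap, the overflow check, and both function names are assumptions for demonstration only; OmniOperator's actual behavior (exception vs. null) depends on its configuration:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough precision so the demo arithmetic stays exact

DECIMAL128_MAX_DIGITS = 38  # a 128-bit decimal holds at most 38 significant digits

def sum_as_decimal128(values):
    """Illustrative only: accumulate like a Decimal128 SUM and fail once
    the intermediate result no longer fits in 38 significant digits."""
    total = Decimal(0)
    for v in values:
        total += Decimal(v)
        if len(total.as_tuple().digits) > DECIMAL128_MAX_DIGITS:
            raise OverflowError("intermediate SUM exceeds Decimal128")
    return total

def sum_as_double(values):
    """A Double accumulator never overflows here; it trades precision
    for range, which is why the text recommends it for large totals."""
    return sum(float(v) for v in values)
```

For example, summing a 38-digit value with 1 pushes the intermediate result to 39 digits and raises, while the Double accumulator completes.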

Scenarios and Rules in Which OmniMV Does Not Support Rewriting or Extraction of Materialized Views

  • For Spark, Table 1 describes the scenarios and rules in which OmniMV does not support rewriting or extraction of materialized views.
  • For ClickHouse, Table 2 describes the scenarios and rules in which OmniMV does not support rewriting or extraction of materialized views.
Table 1 Scenarios and rules for Spark SQL

Scenario: Rewriting is not supported.
Rule: Basic rules
Description:
  • When required data is queried, if the view condition does not match the query condition, rewriting is not supported. For example, when the query condition is c1>=2 and the view condition is c1>2, rewriting is not supported.
  • The IN and LIKE statements require that the query and view conditions match completely. Otherwise, rewriting is not supported. For example, when the query condition is in(2,3) and the view condition is in(1,2,3), rewriting is not supported.
  • If a view is already used in the SQL query statement, rewriting is not supported.

Scenario: Rewriting is not supported.
Rule: Join rules
Description:
  • Only the Inner Join type supports rewriting. Other Join types do not support rewriting in most scenarios. For example, an Outer Join can be rewritten only when the query condition is identical to the view condition.
  • isValidPlan checks whether the logical plan tree of the currently matched query and view meets the minimum unit requirement, which allows only the following operators. If the requirement is not met, rewriting is not supported.
    • LogicalRelation
    • HiveTableRelation
    • Project
    • Filter
    • Join
    • SubqueryAlias

Scenario: Rewriting is not supported.
Rule: Aggregate rules
Description:
  • The root node of the view's logical plan tree must be the Aggregate operator. For example, the root nodes of order by and having are Sort and Filter respectively, so such views do not support rewriting.
  • The query must match the logical plan tree of the view, and the root nodes on both sides must be the Aggregate operator. isValidPlan then checks whether the subtree meets the minimum unit requirement. If it does not, rewriting is not supported.
  • If the query contains aggregate functions that do not exist in the view, rewriting is not supported.
  • The group by fields in the query must be a subset of the group by fields in the view, and only the following aggregate functions (without distinct, except for min and max) can be rolled up. If these conditions are not met, rewriting is not supported.
    • SUM
    • MIN
    • MAX
    • COUNT

Scenario: Materialized views cannot be extracted.
Rule: The query logic is too simple or too complex.
Description:
  • Materialized views cannot be extracted from single-table queries.
  • If multi-layer nested subqueries exist in a query, views can be extracted only from the innermost subqueries, not from the outer subqueries.
  • Materialized views cannot be extracted from subqueries that contain temporary tables.
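The basic rules in Table 1 amount to containment checks on conditions. A minimal sketch, assuming a single lower-bound predicate and plain IN lists; the function names and the simplified condition model are hypothetical, not OmniMV's implementation:

```python
def lower_bound_rewrite_ok(query_op, query_val, view_op, view_val):
    """A view can serve a query only if it contains every row the
    query needs. Ops are '>' or '>='. Per Table 1, query c1>=2 with
    view c1>2 fails: the view excludes the boundary row c1 == 2."""
    q_open = (query_op == ">")   # query excludes its boundary value
    v_open = (view_op == ">")    # view excludes its boundary value
    if view_val < query_val:
        return True              # view starts strictly below the query range
    if view_val == query_val:
        # fails only when the view drops a boundary row the query needs
        return not (v_open and not q_open)
    return False                 # view starts above the query range

def in_list_rewrite_ok(query_vals, view_vals):
    """Per Table 1, IN conditions must match completely -- mere
    containment (in(2,3) vs in(1,2,3)) is not enough."""
    return set(query_vals) == set(view_vals)
```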
Table 2 Scenarios and rules for ClickHouse

Scenario: Rewriting is not supported.
Rule: Engine-defined rules
Description:
  • The WHERE columns of the query must be a subset of the GROUP BY columns in the PROJECTION definition.
  • The GROUP BY columns of the query must be a subset of the GROUP BY columns in the PROJECTION definition.
  • The SELECT columns of the query must be a subset of the SELECT columns in the PROJECTION definition.
  • When multiple projections match, the one that reads the fewest partitions is selected.
  • The number of returned data rows must be less than the total number of rows in the base table.
  • The query must cover more than half of the partitions.

Scenario: Materialized views cannot be extracted.
Rule: The query logic is too simple or too complex.
Description:
A subquery for which PROJECTION can be created must be a single-table subquery. That is, the FROM clause contains only one table, and the subquery must contain a GROUP BY clause. PROJECTION cannot be extracted in other scenarios.
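The three subset rules in Table 2 can be sketched with set operations. The dict shapes and the function name below are assumptions for illustration; ClickHouse performs this matching internally when it picks a projection:

```python
def projection_can_rewrite(query, projection):
    """Sketch of the subset rules in Table 2. `query` and `projection`
    are hypothetical dicts whose values are sets of column names."""
    return (query["where"] <= projection["group_by"]        # WHERE subset of PROJECTION GROUP BY
            and query["group_by"] <= projection["group_by"]  # GROUP BY subset of PROJECTION GROUP BY
            and query["select"] <= projection["select"])     # SELECT subset of PROJECTION SELECT
```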

OmniShuffle

The OmniShuffle feature cannot be used in public cloud and multi-tenant scenarios, and cannot be directly accessed from the Internet.

OmniAdvisor

  • When parsing Spark task logs, you need to enable the history server. When parsing Tez task logs, you need to enable the timeline server.
  • SQL information in task logs must not be encrypted. Otherwise, the SQL information cannot be parsed and subsequent sampling-based tuning is impossible.
  • When running a Spark task, you need to add --name $name to the parameter list to specify the task name. When running a Hive Tez task, you need to add --hiveconf hive.session.id=$name to specify the Tez task name. Generally, the task name is used as the unique identifier for parameter sampling.
  • Currently, the end-to-end parameter tuning function is available only for Spark and Hive Tez because only task logs of Spark and Hive Tez can be parsed.
  • OmniAdvisor works only on Spark and Hive Tez SQL tasks but not on Spark App tasks.
  • The parameter list of Spark can be configured in the spark_config.xml file, and that of Hive Tez in hive_config.xml. Currently, only parameters of the Int, Float, and Boolean types are supported; JVM and OS parameters are not supported.
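The naming requirement above can be sketched as follows. The helper functions are hypothetical and only assemble the flags named in the text (`--name` and `--hiveconf hive.session.id=...`); they are not part of OmniAdvisor:

```python
def spark_submit_args(task_name, extra_args=()):
    """OmniAdvisor uses the task name as the unique identifier for
    parameter sampling, so a Spark task must be submitted with --name."""
    return ["spark-submit", "--name", task_name, *extra_args]

def hive_tez_args(task_name, extra_args=()):
    """For a Hive Tez task, hive.session.id carries the task name."""
    return ["hive", "--hiveconf", f"hive.session.id={task_name}", *extra_args]
```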

OmniHBaseGSI

Application Scenarios

  • Index tables do not support disaster recovery.
  • Rolling upgrade is not supported for index data.
  • Do not perform the DISABLE, DROP, MODIFY, or TRUNCATE operation on index tables or index metadata tables.
  • Data definition language (DDL) operations allowed on indexes include modifying index status, deleting indexes, and creating indexes, but not modifying index definitions. To modify an index definition, delete the index and create a new one.
  • Ensure that the system time of each HBase node is synchronized.
  • When a client is writing data, the data may not be synchronized in real time between the data table and index table. However, after a write success is returned to the client, the data is visible in both the data table and index table.

Creating indexes

  • An index name can contain only the characters matched by the regular expression [a-zA-Z_0-9-.]; other characters are not supported.
  • The associated data table must exist. The name of the index table to be created must be unique.
  • The index table does not support multiple versions. Indexes cannot be created on data tables with multiple versions (VERSIONS > 1), and the version of the index table is always 1.
  • Do not create too many indexes for a data table. A large number of indexes will increase the storage cost and prolong the write time. Therefore, it is recommended that the number of indexes in a data table be less than or equal to 5. If more than five indexes are required, add the hbase.gsi.max.index.count.per.table parameter, set it to a value greater than 5, and restart HBase for the parameter setting to take effect.
  • An index name cannot have more than 18 characters.
  • Indexes cannot be created for index tables. Multiple indexes cannot be created in nested mode. Index tables are used only to accelerate query and do not provide data table functions.
  • Do not create an index that can be covered by an existing one.

    When you create an index, if it can be covered by an existing index (that is, its column list is a pre-order subset of an existing index's column list), the creation fails, because functionally duplicate indexes waste storage. For example, in the following operations, index 3 can be created but index 2 cannot.

    Create a data table: create 't1','cf1'

    Create index 1: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx1=>cf1:[q1],[q2]'

    Create index 2: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx2=>cf1:[q1]'

    Create index 3: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx3=>cf1:[q2]'

  • Each index must have a unique name across all data tables.
  • When creating an index, ensure that the region server node is stable and do not bring the node offline.
  • If an index table associated with a data table is being created (in the BUILDING state), writing data to the data table will fail. Therefore, do not write data to a data table during index creation.
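The pre-order-subset rule behind the example above can be sketched in a few lines. The function names and the column-list model are hypothetical, for illustration only:

```python
def is_covered_by(new_cols, existing_cols):
    """A new index is redundant if its column list is a pre-order subset
    (a prefix) of an existing index's column list, per the rule above."""
    return (len(new_cols) <= len(existing_cols)
            and list(existing_cols[:len(new_cols)]) == list(new_cols))

def can_create_index(new_cols, existing_indexes):
    """existing_indexes: iterable of column lists of indexes already on
    the data table. Creation fails if any existing index covers it."""
    return not any(is_covered_by(new_cols, cols) for cols in existing_indexes)
```

With idx1 on [q1],[q2] already present, idx2 on [q1] is a prefix of idx1 and is rejected, while idx3 on [q2] is not a prefix and can be created, matching the example.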

Writing index data

  • Only the Put/Delete API can be used to generate index data. When data is written to a data table in other methods (such as Increment, Append, and Bulkload), no index will be generated.
  • When the index column data is of the String type, do not write the special invisible characters \x00 and \x01.
  • Timestamps cannot be specified when writing data to covering or index columns in the primary table.
  • Data timestamps of the index table do not inherit those of the primary table.

Querying indexes

  • When an index is used to accelerate the query, the index must be in the ACTIVE state.
  • When an index table is used to accelerate the data table query, only the scan query operation of SingleColumnValueFilter is supported, and the specified filter column must include an index column.
  • Only one index is hit in a query. Multi-index joint query is not supported.
  • If the query hits a secondary index, the order of the query results differs from that of querying the primary table directly.
  • When a secondary index is hit, SingleColumnValueFilter has the following restrictions: When any column is filtered, latestVersionOnly is true and cannot be changed. When an index column is filtered, filterIfMissing is true by default and cannot be changed.
  • When an index table is used to accelerate a data table query, the caching, limit, and cacheblocks attributes can be set for the scan operation. If the reversed, raw, startRow, or stopRow attribute is set, the index table will not be used for acceleration. Other attributes do not take effect.
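The query-side restrictions above can be combined into one eligibility check. The attribute and filter names are taken from the text; the function itself is a hypothetical sketch, not the GSI implementation:

```python
UNSUPPORTED_SCAN_ATTRS = {"reversed", "raw", "startRow", "stopRow"}

def scan_can_use_index(filter_type, filter_columns, index_columns,
                       scan_attrs, index_state="ACTIVE"):
    """Sketch of the rules above: the index must be ACTIVE, the scan must
    use SingleColumnValueFilter touching an index column, and must not
    set reversed/raw/startRow/stopRow."""
    if index_state != "ACTIVE":
        return False  # only ACTIVE indexes accelerate queries
    if filter_type != "SingleColumnValueFilter":
        return False  # only this filter's scan queries are supported
    if not set(filter_columns) & set(index_columns):
        return False  # the filter columns must include an index column
    if set(scan_attrs) & UNSUPPORTED_SCAN_ATTRS:
        return False  # these attributes disable index acceleration
    return True
```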