Constraints
Understand the following OmniRuntime usage restrictions before configuring each feature.
OmniData
- If a numeric calculation produces a result outside the range of its data type, OmniData throws an out-of-bounds exception.
- When comparing character strings, ensure that constants are enclosed in single quotation marks. Example: `select * from table where char = '123'`
- The decimal (38,38) data type is not supported.
- The timestamp data type is not supported. It is processed in the native calculation approach.
- Operator pushdown is unavailable for transaction tables, which are processed in the native calculation approach.
- Operator pushdown is unavailable for bucket tables, which are processed in the native calculation approach.
- Ensure that the ANALYZE statistics of a data table are up to date. Otherwise, the pushdown selectivity estimate is inaccurate.
OmniOperator
- The user-defined function (UDF) plugin supports only simple UDFs. It is used to execute UDFs written based on the Hive UDF framework.
- Currently, 64-bit and 128-bit Decimal data types are supported. If Decimal data exceeds 128 bits, an exception is thrown or null is returned, depending on the configuration. In this case, behavior may not match the engine's native behavior. For example, during SUM or AVG aggregation, if an intermediate result exceeds 128-bit Decimal, the engine's native behavior is normal, but OmniOperator throws an exception or returns null based on the configuration. If AVG must be calculated for a field whose accumulated result may be too large, use another storage type such as Double.
- Different workloads require different memory configurations. For example, for a TPC-DS 3 TB dataset, the recommended Spark Extension configuration requires off-heap memory of at least 30 GB so that all 99 SQL statements can run successfully. During execution, "MEM_CAP_EXCEEDED" may be reported in logs, but the execution result is not affected. If the off-heap memory is insufficient, the SQL execution result may be incorrect.
- In security cluster mode, data in the ORC format cannot be read in native mode.
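As a hedged illustration of the Decimal guidance above (the table name `sales` and column `amount` are hypothetical), casting to DOUBLE before aggregating keeps the intermediate result within range, at the cost of floating-point precision:

```sql
-- Hypothetical table 'sales' with a high-precision decimal column 'amount'.
-- Aggregating the raw decimal may overflow the 128-bit Decimal intermediate
-- result in OmniOperator:
--   SELECT avg(amount) FROM sales;
-- Casting to DOUBLE first avoids the overflow:
SELECT avg(CAST(amount AS DOUBLE)) AS avg_amount
FROM sales;
```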
Scenarios and Rules in Which OmniMV Does Not Support Rewriting or Materialized View Extraction
- For Spark, Table 1 describes the scenarios and rules in which OmniMV does not support rewriting or extraction of materialized views.
- For ClickHouse, Table 2 describes the scenarios and rules in which OmniMV does not support rewriting or extraction of materialized views.
Table 1 Scenarios and rules not supported by OmniMV for Spark

| Scenario | Rule | Description |
|---|---|---|
| Rewriting is not supported. | Basic rules | |
| Rewriting is not supported. | Join rules | |
| Rewriting is not supported. | Aggregate rules | |
| Materialized views cannot be extracted. | The query logic is too simple or too complex. | |
Table 2 Scenarios and rules not supported by OmniMV for ClickHouse

| Scenario | Rule | Description |
|---|---|---|
| Rewriting is not supported. | Engine-defined rules | |
| Materialized views cannot be extracted. | The query logic is too simple or too complex. | A subquery for which a PROJECTION can be created must be a single-table subquery: the FROM clause contains only one table, and the query must contain a GROUP BY clause. In other scenarios, the PROJECTION cannot be extracted. |
OmniShuffle
The OmniShuffle feature cannot be used in public cloud or multi-tenant scenarios and must not be directly accessible from the Internet.
OmniAdvisor
- When parsing Spark task logs, you need to enable the history server. When parsing Tez task logs, you need to enable the timeline server.
- SQL information in task logs must not be encrypted. Otherwise, the SQL information cannot be parsed and subsequent sampling-based tuning is impossible.
- When running a Spark task, add `--name $name` to the parameter list to specify the task name. When running a Hive Tez task, add `--hiveconf hive.session.id=$name` to specify the Tez task name. Generally, the task name is used as the unique identifier for parameter sampling.
- Currently, the end-to-end parameter tuning function is available only for Spark and Hive Tez because only task logs of Spark and Hive Tez can be parsed.
- OmniAdvisor works only on Spark and Hive Tez SQL tasks but not on Spark App tasks.
- The parameter list of Spark can be configured in the spark_config.xml file and that of Hive Tez can be configured in hive_config.xml. Currently, only parameters of the Int, Float, and Boolean types are supported while JVM and OS parameters are not supported.
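The naming convention above can be sketched as follows (the application class, file names, and the task name `tpcds_q1` are hypothetical; the task name serves as the sampling identifier):

```shell
# Spark: pass the task name with --name so OmniAdvisor can identify the task
spark-submit --name tpcds_q1 --class com.example.Query1 app.jar

# Hive on Tez: pass the task name through hive.session.id
hive --hiveconf hive.session.id=tpcds_q1 -f query1.sql
```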
OmniHBaseGSI
Application Scenarios
- Index tables do not support disaster recovery.
- Rolling upgrade is not supported for index data.
- Do not perform the DISABLE, DROP, MODIFY, or TRUNCATE operation on index tables or index metadata tables.
- Data definition language (DDL) operations allowed on indexes include modifying index status, deleting indexes, and creating indexes, but not modifying index definitions. If you need to modify an index definition, delete it and create an index again.
- Ensure that the system time of each HBase node is synchronized.
- When a client is writing data, the data may not be synchronized in real time between the data table and index table. However, after a write success is returned to the client, the data is visible in both the data table and index table.
Creating indexes
- An index name can contain only the characters matched by the character class `[a-zA-Z_0-9.-]`; other characters are not supported.
- The associated data table must exist. The name of the index table to be created must be unique.
- The index table does not support multiple versions. Indexes cannot be created on data tables with multiple versions (VERSION > 1), and the version of the index table is 1.
- Do not create too many indexes for a data table. A large number of indexes will increase the storage cost and prolong the write time. Therefore, it is recommended that the number of indexes in a data table be less than or equal to 5. If more than five indexes are required, add the hbase.gsi.max.index.count.per.table parameter, set it to a value greater than 5, and restart HBase for the parameter setting to take effect.
- An index name cannot have more than 18 characters.
- Indexes cannot be created for index tables. Multiple indexes cannot be created in nested mode. Index tables are used only to accelerate query and do not provide data table functions.
- Do not create an index that can be covered by an existing one.
When you create an index, if it can be covered by an existing index (that is, it is a prefix subset of an existing index), the index cannot be created, because functionally duplicate indexes waste storage. For example, in the following operations, index 3 can be created but index 2 cannot.
Create a data table (in the hbase shell): `create 't1','cf1'`
Create index 1: `hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx1=>cf1:[q1],[q2]'`
Create index 2: `hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx2=>cf1:[q1]'`
Create index 3: `hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx3=>cf1:[q2]'`
- Each index must have a unique name across all data tables.
- When creating an index, ensure that the region server node is stable and do not bring the node offline.
- If an index table associated with a data table is being created (in the BUILDING state), writing data to the data table will fail. Therefore, do not write data to a data table during index creation.
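If more than five indexes per table are genuinely required, the limit mentioned above can be raised through the `hbase.gsi.max.index.count.per.table` parameter, followed by an HBase restart. A configuration sketch (the value 8 is an example, not a recommendation):

```xml
<!-- hbase-site.xml: raise the per-table GSI index limit (default limit: 5) -->
<property>
  <name>hbase.gsi.max.index.count.per.table</name>
  <value>8</value>
</property>
```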
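The index naming rules above (the allowed character set and the 18-character limit) can be checked before creation with a small shell sketch; the sample name is hypothetical:

```shell
# Validate a candidate index name against the documented rules:
# characters limited to [a-zA-Z_0-9.-], length at most 18.
name="idx_user.by-email"
if printf '%s' "$name" | grep -Eq '^[a-zA-Z_0-9.-]+$' && [ "${#name}" -le 18 ]; then
  echo "valid: $name"
else
  echo "invalid: $name"
fi
# prints "valid: idx_user.by-email"
```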
Writing index data
- Only the Put/Delete API can be used to generate index data. When data is written to a data table by other methods (such as Increment, Append, or Bulkload), no index data is generated.
- When the index column data is of the String type, do not write the special invisible characters `\x00` and `\x01`.
- Timestamps cannot be specified when writing data to covering or index columns in the primary table.
- Data timestamps of the index table do not inherit those of the primary table.
Querying indexes
- When an index is used to accelerate the query, the index must be in the ACTIVE state.
- When an index table is used to accelerate the data table query, only the scan query operation of SingleColumnValueFilter is supported, and the specified filter column must include an index column.
- Only one index is hit in a query. Multi-index joint query is not supported.
- If a query hits a secondary index, the order of the query results differs from that of querying the primary table directly.
- When a secondary index is hit, SingleColumnValueFilter has the following restrictions: when filtering on any column, latestVersionOnly is true and cannot be changed; when filtering on an index column, filterIfMissing is true by default and cannot be changed.
- When an index table is used to accelerate data table queries, the caching, limit, and cacheblocks attributes can be set for a scan operation. If any of the reversed, raw, startRow, or stopRow attributes is set, the index table will not be used to accelerate the query. Other attributes do not take effect.
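As a sketch of an index-accelerated query (the table, column family, qualifier, and value are hypothetical), a scan that filters on an indexed column with SingleColumnValueFilter might look like this in the hbase shell:

```
# hbase shell: scan 't1' filtering on the indexed column cf1:q1;
# the filter column must include an index column for the index to be used,
# and the index must be in the ACTIVE state
scan 't1', {FILTER => "SingleColumnValueFilter('cf1', 'q1', =, 'binary:123')"}
```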