
Constraints

Operator Acceleration Constraints

  1. Currently, the UDF plugin supports only simple UDFs. It is used to execute UDFs compiled based on the Hive UDF framework.
  2. Currently, only 64-bit and 128-bit Decimal data types are supported. If a Decimal value exceeds 128 bits, OmniOperator throws an exception or returns null, depending on the configuration, so its behavior can differ from the native behavior of the engine. For example, if an intermediate result of a SUM or AVG aggregation exceeds 128 bits, the native engine still completes normally, whereas OmniOperator throws an exception or returns null. If AVG must be calculated on a field whose accumulated value may grow too large, store the field in another type, such as Double.
  3. Different workloads require different memory configurations. For example, for a 3 TB TPC-DS dataset, the recommended SparkExtension configuration requires at least 30 GB of off-heap memory for all 99 SQL statements to execute successfully. During execution, "MEM_CAP_EXCEEDED" may be reported in the logs, but this does not affect the execution result. If the off-heap memory is insufficient, the SQL execution result may be incorrect.
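The Decimal constraint in item 2 can be illustrated with a small model of a SUM accumulator. This is only a sketch, not OmniOperator code: it assumes the unscaled decimal value is held in a signed 128-bit integer, and the function names and the `on_overflow` parameter are hypothetical stand-ins for the configuration mentioned above.

```python
from typing import Iterable, Optional

# Bounds of a signed 128-bit integer (assumed storage for the unscaled value).
INT128_MAX = 2**127 - 1
INT128_MIN = -(2**127)


def fits_in_128_bits(unscaled: int) -> bool:
    """Return True if the unscaled decimal value fits in 128 bits."""
    return INT128_MIN <= unscaled <= INT128_MAX


def sum_decimal128(values: Iterable[int], on_overflow: str = "raise") -> Optional[int]:
    """Sum unscaled decimal values with a 128-bit accumulator.

    on_overflow is a hypothetical switch: "raise" throws, "null" returns None.
    """
    total = 0
    for value in values:
        total += value
        # Check the intermediate result, not only the final one.
        if not fits_in_128_bits(total):
            if on_overflow == "null":
                return None
            raise OverflowError("intermediate SUM exceeds 128 bits")
    return total
```

With this model, `sum_decimal128([INT128_MAX, 1], on_overflow="null")` returns `None`, while the same call with the default setting raises `OverflowError`. An engine that widens the accumulator would instead return a normal result, which is the mismatch described above.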

Scenarios Where Rewriting Is Not Supported or Materialized Views Cannot Be Extracted

For the Spark engine, the scenarios and rules are listed in Table 1.

For the ClickHouse engine, the scenarios and rules are listed in Table 2.

Table 1 Scenarios and rules on Spark

Scenario | Rule | Description

Rewriting is not supported. | Basic rules

  • If the view condition does not cover the data that the query requires, rewriting is not supported. For example, when the query condition is c1>=2 and the view condition is c1>2, the view lacks the rows where c1=2, so rewriting is not supported.
  • For In and Like, the query condition and the view condition must match exactly. Otherwise, rewriting is not supported. For example, when the query condition is in(2,3) and the view condition is in(1,2,3), rewriting is not supported.
  • If the SQL query already uses a view, rewriting is not supported.
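The two condition examples in the basic rules can be sketched as simple checks. This is an illustration only and not the plug-in's matching logic: `LowerBound`, `view_covers_query`, and `in_lists_match` are hypothetical names, and treating a wider view range as acceptable is an assumption that the text above does not confirm.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LowerBound:
    """A condition of the form c1 > value or c1 >= value."""
    value: float
    inclusive: bool


def view_covers_query(view: LowerBound, query: LowerBound) -> bool:
    """True if every row selected by the query condition is also in the view."""
    if view.value < query.value:
        return True
    if view.value == query.value:
        # c1>2 does not cover c1>=2, because the row c1=2 is missing.
        return view.inclusive or not query.inclusive
    return False


def in_lists_match(view_values: Iterable, query_values: Iterable) -> bool:
    """In conditions must match exactly; a superset in the view is not enough."""
    return set(view_values) == set(query_values)
```

For the examples in the rules, `view_covers_query(LowerBound(2, False), LowerBound(2, True))` and `in_lists_match([1, 2, 3], [2, 3])` both return `False`, so neither query would be rewritten.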

Rewriting is not supported. | Join rules

  • Only the Inner-Join type supports rewriting. Other Join types do not support rewriting in most scenarios. For example, the Outer-Join type can be rewritten only when the query condition is exactly the same as the view condition.
  • isValidPlan is used to check whether the logical plan tree of the currently matched query and view meets the minimum unit requirement, which requires that only the following operators be included. If the requirement is not met, rewriting is not supported.
    • LogicalRelation
    • HiveTableRelation
    • Project
    • Filter
    • Join
    • SubqueryAlias
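The minimum unit requirement amounts to a recursive walk over the plan tree. The sketch below shows the idea in plain Python. It is not the actual isValidPlan implementation: `PlanNode` and `is_valid_plan` are hypothetical, and operators are represented only by their names.

```python
from dataclasses import dataclass, field
from typing import List

# Operators permitted by the minimum unit requirement listed above.
ALLOWED_OPERATORS = frozenset({
    "LogicalRelation",
    "HiveTableRelation",
    "Project",
    "Filter",
    "Join",
    "SubqueryAlias",
})


@dataclass
class PlanNode:
    """A hypothetical logical plan node: an operator name plus child nodes."""
    operator: str
    children: List["PlanNode"] = field(default_factory=list)


def is_valid_plan(node: PlanNode) -> bool:
    """True if the node and all of its descendants use only allowed operators."""
    if node.operator not in ALLOWED_OPERATORS:
        return False
    return all(is_valid_plan(child) for child in node.children)
```

A tree such as Project over Filter over Join over two relations passes the check, whereas placing a Sort or Aggregate node anywhere in the tree makes it fail.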

Rewriting is not supported. | Aggregate rules

  • The root node of the logical plan tree in the view must be the Aggregate operator. For example, a view that ends with order by or having has Sort or Filter as its root node, respectively, so it does not support rewriting.
  • The query must match the logical plan tree of the view. The root nodes on both sides must be the Aggregate operator. Then, isValidPlan is used to check whether the subtree meets the minimum unit requirement. If the subtree does not meet the requirement, rewriting is not supported.
  • If the query contains aggregate functions that do not exist in the view, rewriting is not supported.
  • The group by fields in the query must be a subset of the group by fields in the view, and only the following aggregate functions (without distinct, except for min and max) can be rolled up. If these conditions are not met, rewriting is not supported.
    • sum
    • min
    • max
    • count
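Rolling up means re-aggregating the view's rows to the coarser grouping of the query. The sketch below shows why the four listed functions qualify: sum, min, and max combine with themselves, and count combines by adding the partial counts. It is an illustration under stated assumptions, not the plug-in's code: the function names and the row format (a list of dicts) are hypothetical, and distinct is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# How each aggregate stored in the view is combined across finer groups.
ROLLUP = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": sum,  # partial counts are added, not counted again
}


def can_roll_up(query_group_by: Sequence[str], view_group_by: Sequence[str],
                query_aggs: Sequence[str]) -> bool:
    """Check the subset condition and that every aggregate can be rolled up."""
    return (set(query_group_by) <= set(view_group_by)
            and all(agg in ROLLUP for agg in query_aggs))


def roll_up(view_rows: List[dict], query_group_by: Sequence[str],
            agg_columns: Dict[str, str]) -> List[dict]:
    """Re-aggregate view rows; agg_columns maps column name to function name."""
    buckets = defaultdict(list)
    for row in view_rows:
        buckets[tuple(row[c] for c in query_group_by)].append(row)
    result = []
    for key, rows in buckets.items():
        out = dict(zip(query_group_by, key))
        for column, function in agg_columns.items():
            out[column] = ROLLUP[function](r[column] for r in rows)
        result.append(out)
    return result
```

For example, a view grouped by region and city can answer a query grouped by region alone, but a query that needs avg cannot be served this way because avg is not in the list.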

Materialized views cannot be extracted. | The query logic is too simple or too complex.

  • Materialized views cannot be extracted from single-table queries.
  • If multi-layer nested subqueries exist in a query, views can be extracted only from the innermost subqueries, but not from the outer subqueries.
  • Materialized views cannot be extracted from subqueries that contain temporary tables.

Table 2 Scenarios and rules on ClickHouse

Scenario | Rule | Description

Rewriting is not supported. | Engine-defined rules

  • The columns in the WHERE clause must be a subset of the GROUP BY columns in the PROJECTION definition.
  • The GROUP BY columns must be a subset of the GROUP BY columns in the PROJECTION definition.
  • The SELECT columns must be a subset of the SELECT columns in the PROJECTION definition.
  • When multiple PROJECTIONs are matched, the one that reads the fewest partitions is selected.
  • The number of returned data rows is less than the total number of data rows in the base table.
  • The query covers more than half of the partitions.
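The first four rules can be sketched as set comparisons followed by a selection step. This is only an illustration and does not reflect ClickHouse internals: the dict layout, the function names, and the `partitions_read` field are hypothetical, and the last two rules (row count and partition coverage) are not modeled.

```python
from typing import List, Optional


def projection_applies(query: dict, projection: dict) -> bool:
    """Check the three subset rules against one PROJECTION definition."""
    where_ok = set(query["where"]) <= set(projection["group_by"])
    group_ok = set(query["group_by"]) <= set(projection["group_by"])
    select_ok = set(query["select"]) <= set(projection["select"])
    return where_ok and group_ok and select_ok


def pick_projection(query: dict, projections: List[dict]) -> Optional[dict]:
    """Among matching projections, pick the one that reads the fewest partitions."""
    matches = [p for p in projections if projection_applies(query, p)]
    if not matches:
        return None
    # partitions_read is a hypothetical, precomputed estimate per projection.
    return min(matches, key=lambda p: p["partitions_read"])
```

A query that filters on a column absent from the projection's GROUP BY fails the first check, so `pick_projection` returns `None` when no other projection matches.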

Materialized views cannot be extracted. | The query logic is too simple or too complex.

A subquery for which a Projection can be created must be a subquery on a single table. That is, its FROM clause contains only one table, and the subquery contains a group by clause. In other scenarios, a Projection cannot be extracted.