Constraints

Understand the OmniOperator usage restrictions before configuring the feature.

  • The user-defined function (UDF) plugin supports only simple UDFs. It is used to execute UDFs written based on the Hive UDF framework.
  • OmniOperator supports 64-bit and 128-bit Decimal data types. If Decimal data exceeds 128 bits, an exception is thrown or Null is returned (depending on the configuration), and the behavior may not match the native behavior of the engine. For example, during SUM or AVG aggregation, if an intermediate result exceeds 128-bit Decimal, the native engine behaves normally, but OmniOperator throws an exception or returns Null based on the configuration. If AVG must be calculated on a field whose accumulated result may be too large, use another storage type such as Double.
  • Different loads require different memory configurations. For example, for a 3 TB TPC-DS dataset, the recommended SparkExtension configuration requires at least 20 GB of off-heap memory so that all 99 SQL statements can be executed successfully. During the execution, "MEM_CAP_EXCEEDED" may be reported in logs, but the execution result is not affected. If the off-heap memory is insufficient, the SQL execution result may be incorrect.
  • The spill function is available for Sort, Window, and HashAgg operators but not for BroadcastHash Join, ShuffledHash Join, and SortMerge Join.
  • Hive OmniOperator supports the 99 TPC-DS statements except q14, q72, and q89, which are excluded because the native Hive itself may fail when executing them.
  • When Hive OmniOperator evaluates POWER expressions, there is a slight implementation difference between the C++ std::pow function and the Java Math.pow function. As a result, the result of the POWER expression implemented in C++ may differ from that of the native Hive POWER expression, but the relative precision error is not greater than 1e-15.
  • Spark OmniOperator supports the from_unixtime and unix_timestamp expressions.
    1. The time parsing policy spark.sql.legacy.timeParserPolicy must be EXCEPTION or CORRECTED, and cannot be LEGACY.
    2. For some improper parameter values (such as non-existent dates and invalid ultra-large timestamp values), the processing results of OmniRuntime are different from those of the native Spark.
    3. You can set spark.omni.sql.columnar.unixTimeFunc.enabled to false to roll back these two functions to the native implementations, avoiding the inconsistency described in item 2.
  • When Hive OmniOperator performs floating-point arithmetic, the behavior may not match that of the native Hive. For example, when a floating-point number is divided by 0.0, the native Hive returns Null, whereas OmniOperator returns Infinity, NaN, or Null.
  • CBO optimization is enabled by default for the Hive engine, and Hive OmniOperator requires it: hive.cbo.enable cannot be set to false.
  • If SQL statements contain the Alter field attribute or use LOAD DATA to import .parq data, the native TableScan operator is recommended for the Hive engine.
  • When Spark OmniOperator performs expression codegen on a large number of columns (for example 500 columns) at the same time, the compilation overhead is greater than the OmniOperator acceleration effect. In this scenario, you are advised to use the native Spark.
  • Spark OmniOperator does not support comparison operators (<, <=, >, >=, !=, <>, =, ==, <=>) for Boolean data, and does not support <=> for any data type. If an incompatible operation occurs during execution, the operator is rolled back, which is expected. However, a rollback during a join on large tables incurs heavy row-column conversion overhead and severe performance degradation, so avoid rollbacks whenever possible.
  • When the data storage structure declared for a Hive table does not match the actual storage of the data, and the GroupBy columns coincide with the bucketing parameters, the GroupBy operator may encounter a grouping exception even in the native Hive. To ensure that the declared storage structure matches the actual one, create the table without a bucketing policy or run load data local inpath to import the data.
  • When a SUM result overflows, Hive OmniOperator may produce a result different from the engine's native behavior: OmniOperator returns Null so that users can detect the overflow, whereas the native Hive returns an incorrect value, which may be misleading.
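
For the from_unixtime/unix_timestamp constraint, a spark-defaults.conf fragment along these lines keeps the parser policy valid; the property names come from the text above, and the values are illustrative. The second line is optional and rolls the two functions back to the native implementations when result consistency with native Spark matters more than acceleration:

```properties
# Must be EXCEPTION or CORRECTED; LEGACY is not supported by OmniOperator
spark.sql.legacy.timeParserPolicy               CORRECTED
# Optional rollback: use native from_unixtime/unix_timestamp to avoid
# result differences for improper parameter values
spark.omni.sql.columnar.unixTimeFunc.enabled    false
```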
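
The Decimal constraint above can be illustrated with a small sketch. This is not OmniOperator code: the function name, the precision bound, and the overflow-to-Null policy are assumptions used only to show why a SUM or AVG whose accumulated result may grow very large should be moved to a Double column.

```python
from decimal import Decimal, getcontext

# Hypothetical bound for a DECIMAL(38, 2) column: 38 significant digits,
# 2 of them after the decimal point (128-bit Decimal holds up to 38 digits).
PRECISION, SCALE = 38, 2
MAX_ABS = Decimal(10) ** (PRECISION - SCALE)

getcontext().prec = 50  # enough headroom so Python itself never rounds here


def omni_style_sum(values):
    """Sum Decimal values, returning None (Null) when the running total no
    longer fits the assumed DECIMAL(38, 2) bound, mimicking the
    overflow-to-Null configuration described above (a sketch, not the real
    implementation)."""
    total = Decimal(0)
    for v in values:
        total += Decimal(v)
        if abs(total) >= MAX_ABS:
            return None  # surfaced as Null so the overflow is visible
    return total
```

Here `omni_style_sum(["1.50", "2.25"])` returns `Decimal("3.75")`, while summing values near the precision bound returns None, which is how the overflow would surface to a query.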
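
The POWER precision constraint amounts to a relative-error tolerance. A check such as the following (an illustrative helper, not part of any OmniOperator API) shows how results from the C++ and native implementations could be compared against the stated 1e-15 bound:

```python
def within_power_tolerance(omni_result: float, hive_result: float,
                           tol: float = 1e-15) -> bool:
    """Return True if the two POWER results agree within the relative
    precision error stated above (1e-15)."""
    if hive_result == 0.0:
        return omni_result == 0.0
    return abs(omni_result - hive_result) / abs(hive_result) <= tol
```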
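
The floating-point division difference can be sketched in a few lines. Both functions below are simplified models of the behaviors described above, not actual Hive or OmniOperator code (in particular, OmniOperator may also return Null depending on configuration):

```python
import math


def hive_native_divide(a: float, b: float):
    # Native Hive returns Null when the divisor is 0.0
    return None if b == 0.0 else a / b


def omni_divide(a: float, b: float):
    # OmniOperator's C++ implementation follows IEEE 754 semantics:
    # x / 0.0 yields +/-Infinity, and 0.0 / 0.0 yields NaN
    if b == 0.0:
        if a == 0.0:
            return math.nan
        return math.inf if a > 0 else -math.inf
    return a / b
```

So for the same input, e.g. dividing 1.0 by 0.0, the native model yields Null while the OmniOperator model yields Infinity, which is exactly the mismatch the constraint warns about.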