Constraints
Understand the OmniOperator usage restrictions before configuring the feature.
Common Constraints
- OmniOperator supports 64-bit and 128-bit Decimal data types. If Decimal data exceeds 128 bits, OmniOperator throws an exception or returns Null (depending on the configuration), which may not match the open source behavior of the engine. For example, during SUM or AVG aggregation, if an intermediate result exceeds the 128-bit Decimal range, the open source engine computes normally, but OmniOperator throws an exception or returns Null. If AVG must be calculated for a field whose accumulated result may be too large, use another storage type such as Double.
- Due to floating-point precision limits and differences in execution order, OmniOperator may produce SUM and AVG results on the Double type that differ from the open source engine. If you need accurate results, consider a data type with higher precision, such as Decimal.
- The spill function is available for the Sort, Window, and HashAgg operators, but not for the BroadcastHashJoin, ShuffledHashJoin, or SortMergeJoin operators.
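As a sketch of the Decimal guidance above, one way to avoid 128-bit intermediate overflow is to aggregate over a Double value instead. The table and column names below are illustrative assumptions, not part of the product documentation:

```sql
-- Hypothetical table: sales(amount DECIMAL(38, 10)).
-- AVG over a wide Decimal can push the intermediate SUM past 128 bits;
-- casting to DOUBLE trades exact precision for an overflow-free aggregation.
SELECT AVG(CAST(amount AS DOUBLE)) AS avg_amount
FROM sales;
```

Note the trade-off with the Double constraint above: Double avoids Decimal overflow but may yield slightly different SUM/AVG results, so choose the type per column based on whether overflow safety or exactness matters more.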
Hive
- The user-defined function (UDF) plugin supports only simple UDFs. It is used to execute UDFs written based on the Hive UDF framework.
- Hive OmniOperator supports the 99 TPC-DS statements except q14, q72, and q89, because open source Hive itself may have problems when executing these three queries.
- When Hive OmniOperator evaluates POWER expressions, there is a slight implementation difference between the C++ std::pow function and the Java Math.pow function. As a result, the C++ implementation of the POWER expression may differ from the open source Hive result, but the relative precision error is not greater than 1e-15.
- When Hive OmniOperator performs floating-point arithmetic, results may not match open source Hive behavior. For example, when dividing a floating-point number by 0.0, open source Hive returns Null, whereas OmniOperator returns Infinity, NaN, or Null.
- CBO optimization is enabled by default in the Hive engine, and Hive OmniOperator requires it; that is, hive.cbo.enable cannot be set to false.
- If SQL statements alter field attributes (ALTER statements) or use LOAD DATA to import .parq data, the open source TableScan operator is recommended for the Hive engine.
- When the storage structure declared by Hive OmniOperator in the table definition does not match the actual storage structure, and the GroupBy columns coincide with the bucketing parameters, the GroupBy operator may encounter a grouping exception, as it does in open source Hive. To ensure that the declared storage structure matches the actual one, create the table without a bucketing policy, or import data by running LOAD DATA LOCAL INPATH.
- When a SUM result overflows, Hive OmniOperator may produce a result different from the open source engine: OmniOperator returns Null so that users can detect the overflow, whereas open source Hive returns an incorrect (overflowed) value, which can be misleading.
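The CBO and bucketing guidance above can be sketched as follows. The table name, columns, storage format, and path are illustrative assumptions, not values from the product documentation:

```sql
-- Keep CBO enabled (the default); Hive OmniOperator requires it.
SET hive.cbo.enable=true;

-- Hypothetical table created WITHOUT a bucketing clause, so the declared
-- storage structure cannot diverge from the actual storage structure.
CREATE TABLE orders (id BIGINT, amount DOUBLE)
STORED AS ORC;

-- Import data with LOAD DATA LOCAL INPATH, as recommended above.
LOAD DATA LOCAL INPATH '/tmp/orders_data' INTO TABLE orders;
```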
Spark
- Different workloads require different memory configurations. For example, for a 3 TB TPC-DS dataset, the recommended SparkExtension configuration requires at least 20 GB of off-heap memory so that all 99 SQL statements can execute successfully. During execution, "MEM_CAP_EXCEEDED" may be reported in logs, but it does not affect the execution result. If off-heap memory is insufficient, the SQL execution result may be incorrect.
- Spark OmniOperator supports the from_unixtime and unix_timestamp expressions, with the following constraints:
  - The time parsing policy spark.sql.legacy.timeParserPolicy must be EXCEPTION or CORRECTED; it cannot be LEGACY.
  - For some improper input values (such as non-existent dates and invalid, extremely large timestamp values), the processing results of OmniRuntime differ from those of open source Spark.
  - You can set spark.omni.sql.columnar.unixTimeFunc.enabled to false to roll back the two functions, that is, use the open source implementations to avoid the inconsistency described in the preceding item.
- When Spark OmniOperator performs expression codegen on a large number of columns (for example, 500 columns) at the same time, the compilation overhead exceeds the OmniOperator acceleration gains. In this scenario, you are advised to use open source Spark.
- OmniOperator does not support decimal128 data, CHAR data, or the AVG function for Spark 3.4.3 or Spark 3.5.2. Such data may cause operator rollback during operation acceleration.
- OmniOperator does not support ORC write for Spark 3.4.3 or Spark 3.5.2.
- OmniOperator supports the row_number and rank window functions in Spark 3.5.2. In dense_rank scenarios, the operator is rolled back.
- Spark OmniOperator does not support comparison operators (<, <=, >, >=, !=, <>, =, ==, <=>) on Boolean data, and does not support <=> for any data type. If an incompatible operation occurs during execution, the operator is rolled back, which is expected. However, a rollback during a join on large tables causes heavy row-column conversion overhead and serious performance deterioration, so avoid rollbacks wherever possible.
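Several of the Spark constraints above are configuration-driven or can be avoided by rewriting the query. A minimal sketch, using the property names exactly as given in the bullets; the tables t1, t2 and column k are hypothetical:

```sql
-- Time parsing policy must be EXCEPTION or CORRECTED, never LEGACY.
SET spark.sql.legacy.timeParserPolicy=CORRECTED;

-- Optional: roll from_unixtime/unix_timestamp back to the open source
-- implementations to avoid result differences on improper inputs.
SET spark.omni.sql.columnar.unixTimeFunc.enabled=false;

-- The unsupported null-safe equality t1.k <=> t2.k can be rewritten so the
-- join is not rolled back; this COALESCE form reproduces <=> semantics
-- (true when both sides are NULL, false when exactly one side is NULL):
SELECT *
FROM t1 JOIN t2
  ON COALESCE(t1.k = t2.k, t1.k IS NULL AND t2.k IS NULL);
```

The rewrite trades one extra predicate evaluation for keeping the join inside the accelerated columnar path, which the bullet above notes is much cheaper than a rollback with row-column conversion on large tables.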
Parent topic: Feature Overview