Constraints
Understand the OmniOperator usage restrictions before configuring the feature.
- The user-defined function (UDF) plugin supports only simple UDFs. It is used to execute UDFs written based on the Hive UDF framework.
- OmniOperator supports 64-bit and 128-bit Decimal data types. If Decimal data exceeds 128 bits, an exception is thrown or Null is returned, which may not match the engine's native behavior. For example, during SUM or AVG aggregation, if an intermediate result exceeds the 128-bit Decimal range, the native engine behaves normally, whereas OmniOperator throws an exception or returns Null depending on the configuration. If AVG must be calculated for a field whose accumulated result may be large, use another storage type such as Double.
- Different workloads require different memory configurations. For example, for a TPC-DS 3 TB dataset, the recommended SparkExtension configuration requires at least 20 GB of off-heap memory so that all 99 SQL statements can be executed successfully. During execution, "MEM_CAP_EXCEEDED" may be reported in the logs; this does not affect the results. If off-heap memory is insufficient, however, the SQL execution result may be incorrect.
- The spill function is available for the Sort, Window, and HashAgg operators, but not for the BroadcastHashJoin, ShuffledHashJoin, and SortMergeJoin operators.
- Hive OmniOperator supports the 99 TPC-DS statements except q14, q72, and q89, because native Hive itself may have problems executing these three queries.
- When Hive OmniOperator performs floating-point arithmetic, the result may not match native Hive behavior. For example, when dividing a floating-point number by 0.0, native Hive returns Null, whereas OmniOperator returns Infinity, NaN, or Null.
- CBO optimization is enabled by default for the Hive engine, and Hive OmniOperator requires it; that is, hive.cbo.enable cannot be set to false.
- When Spark OmniOperator performs expression codegen on a large number of columns (for example, 500 columns) at the same time, the compilation overhead outweighs the OmniOperator acceleration. In this scenario, you are advised to use native Spark.
- Spark OmniOperator does not support comparison operators (<, <=, >, >=, !=, <>, =, ==, <=>) on Boolean data, and does not support <=> for any data type. If an incompatible operation occurs during execution, it is normal for the operator to roll back to the native implementation. A rollback during a join on large tables incurs heavy row-column conversion overhead and severe performance deterioration, so try to avoid rollbacks.
- When working with Parquet data, Spark OmniOperator does not support changing the table schema after the table is created.
- If the data storage structure declared in a Hive OmniOperator table does not match the actual storage structure while the GroupBy operator depends on the bucketing parameters, the GroupBy operator may encounter a grouping exception in native Hive. To ensure that the declared storage structure matches the actual one, create the table without a bucketing policy or run load data local inpath to import data.
- When a sum result overflows, Hive OmniOperator may produce a result different from the engine's native behavior: OmniOperator returns Null so that users can detect the overflow, whereas native Hive returns an incorrect value, which may be misleading.
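The off-heap memory recommendation above can be applied through the standard Spark off-heap properties. The 20g value below is a sketch matching the TPC-DS 3 TB guidance; tune it for your own workload:

```shell
# Illustrative spark-submit flags implementing the ">= 20 GB off-heap" recommendation.
# spark.memory.offHeap.enabled / spark.memory.offHeap.size are standard Spark properties.
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  ...
```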
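Per the CBO constraint above, verify that cost-based optimization stays enabled before running Hive OmniOperator jobs. hive.cbo.enable is a standard Hive property and defaults to true; the statement below simply makes the requirement explicit:

```sql
-- CBO must stay enabled for Hive OmniOperator; never set this to false.
SET hive.cbo.enable=true;
```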
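The Decimal constraint above comes down to the 38-digit capacity of a 128-bit decimal: once an intermediate SUM or AVG result needs more than 38 digits, OmniOperator throws an exception or returns Null. A minimal Python sketch of that behavior (the function names and the magnitude check are illustrative, not OmniOperator code):

```python
from decimal import Decimal

# A 128-bit decimal stores at most 38 significant digits, so at scale 0
# any value with magnitude >= 10**38 no longer fits.
DECIMAL128_LIMIT = Decimal(10) ** 38

def fits_decimal128(value: Decimal) -> bool:
    """Approximate check: does the value fit a 38-digit (128-bit) decimal?"""
    return abs(value) < DECIMAL128_LIMIT

def safe_sum(values):
    """Sum values; return None (Null) once the running total overflows
    the 128-bit decimal range, mimicking the documented behavior."""
    total = Decimal(0)
    for v in values:
        total += Decimal(v)
        if not fits_decimal128(total):
            return None  # OmniOperator would throw or return Null here
    return total

big = Decimal(10) ** 37          # a 38-digit value
print(safe_sum([big] * 9))       # 9E+37 still fits in 38 digits
print(safe_sum([big] * 11))      # overflows past 38 digits, so None
```

Fields whose accumulated result can grow this large should be stored as Double instead, as the constraint recommends.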
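The floating-point division difference above is essentially IEEE 754 semantics versus Hive's Null-on-divide-by-zero convention. A Python sketch contrasting the two (the function names are illustrative; None stands in for SQL Null):

```python
import math

def divide_ieee(x: float, y: float) -> float:
    """IEEE 754-style division: x/0.0 yields +/-Infinity and
    0.0/0.0 yields NaN, as OmniOperator may return."""
    if y == 0.0:
        if x == 0.0:
            return math.nan  # 0.0 / 0.0 is NaN
        # Sign of the result follows the signs of both operands.
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y

def divide_hive_native(x: float, y: float):
    """Native Hive behavior: any division by 0.0 returns Null (None)."""
    if y == 0.0:
        return None
    return x / y
```

Queries whose correctness depends on the Null result for division by zero should account for this difference.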
Parent topic: Feature Overview