No Result Is Generated When Fields of the Char Type in SQL Statements Are Filtered
Symptom
When the partial pushdown function is enabled, an SQL statement that filters on a field of the char type may return no result.
Key Process and Cause Analysis
Native Spark's processing logic for data of the char type is inconsistent with that of Hive.
- If ORC and Parquet tables and data are created and queried on native Spark, data of the char type is padded with spaces in the Spark read phase, regardless of whether the Spark startup parameter spark.sql.orc.impl is set to hive or native.
- If ORC and Parquet tables and data are created on Hive and queried on Spark, the behavior depends on spark.sql.orc.impl and on the table format. When the parameter is set to hive, char data in ORC tables is padded with spaces in the Spark read phase, while char data in Parquet tables is not padded. When the parameter is set to native, char data is not padded.
The partial pushdown function is designed for tables that are created and populated on Hive and then queried on Spark. Therefore, if tables are created and data is inserted on native Spark, a filter on a char field may fail to match any rows.
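The effect of a padding mismatch on a filter can be illustrated with a minimal sketch. This is plain Python that only models the string comparison. It is not the code of Spark, Hive, or the pushdown function, and the helper names (pad_char, filter_rows), the column name "code", and the CHAR(5) width are assumptions made for illustration.

```python
def pad_char(value: str, length: int) -> str:
    """Right-pad a value with spaces to the declared CHAR(length) width."""
    return value.ljust(length)


def filter_rows(rows, column, literal):
    """Keep rows whose column equals the literal exactly (no trimming)."""
    return [row for row in rows if row[column] == literal]


# Hypothetical values stored in a CHAR(5) column named "code".
stored = ["ab", "abcde"]

# One side of the pipeline sees space-padded values, the other sees raw values.
padded_rows = [{"code": pad_char(v, 5)} for v in stored]
unpadded_rows = [{"code": v} for v in stored]

literal = "ab"  # the filter value as written in the SQL statement

# Unpadded data compared with an unpadded literal: the row is found.
print(filter_rows(unpadded_rows, "code", literal))

# Padded data compared with an unpadded literal: nothing matches.
print(filter_rows(padded_rows, "code", literal))

# Padding the literal to the same width restores the match.
print(filter_rows(padded_rows, "code", pad_char(literal, 5)))
```

In this model the second call returns an empty list, which mirrors the symptom: when the data and the filter literal are not padded in the same way, an equality filter on a char field matches no rows.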
Conclusion and Solution
Create the tables and insert the data on Hive, and then query the data on Spark.