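When an OmniOperator job runs against a Hive dataset in ORC or Parquet format that has a partition column of type string, char, or varchar, and a partition value contains one or more special characters, such as !#$%&()*+,-./:;<=>?@[\]^_`{|}~, the job fails with the following error logs and produces no result.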
Error log 1: The file does not exist.
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.util.TaskCompletionListenerException: delete nullptr error for reader
Previous exception in task: Can't open /user/hive/warehouse/tpcds_bin_partitioned_decimal_orc_2.db/partition_null_varchar/c_varchar=7893456=bbb/000000_0.
status code: 2, message: File does not exist: /user/hive/warehouse/tpcds_bin_partitioned_decimal_orc_2.db/partition_null_varchar/c_varchar=7893456=bbb/000000_0
Error log 2: The HDFS URL is malformed (ORC format).
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.util.TaskCompletionListenerException: delete nullptr error for reader
Previous exception in task: Malformed URL: hdfs://server1:9000/user/hive/warehouse/tpcds_bin_partitioned_decimal_orc_2.db/partition_null_varchar/c_varchar=3333|bbb/000000_0
Error log 3: The HDFS URL is malformed (Parquet format).
Previous exception in task: IOError: Invalid: Cannot parse URI: 'hdfs://server1:9000/user/hive/warehouse/mytest.db/partition_null_varchar/c_varchar=1233456|/000000_0'
/home/code/arrow/cpp/src/arrow/filesystem/filesystem.cc:750 ParseFileSystemUri(uri_string)
com.huawei.boostkit.spark.jni.ParquetColumnarBatchJniReader.initializeReader(Native Method)
This is a bug in uriparser, a third-party library that the open source ORC and Arrow components depend on.
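As a rough pre-check, the character list from the symptom description can be turned into a small script that flags partition values likely to trigger this failure. The following is a minimal Python sketch and is not part of OmniOperator: the names SPECIAL_CHARS, find_special_chars, and risky_partitions are hypothetical, and the character set covers only the characters listed above, which the description gives as examples rather than as an exhaustive list.

```python
# Characters listed in the symptom description. The description says
# "such as", so this set is an assumption and may not be exhaustive.
SPECIAL_CHARS = set("!#$%&()*+,-./:;<=>?@[\\]^_`{|}~")


def find_special_chars(partition_value):
    """Return the listed special characters found in a partition value,
    in order of first appearance and without duplicates."""
    found = []
    for ch in partition_value:
        if ch in SPECIAL_CHARS and ch not in found:
            found.append(ch)
    return found


def risky_partitions(partition_values):
    """Map each partition value that contains a listed character
    to the list of those characters."""
    result = {}
    for value in partition_values:
        chars = find_special_chars(value)
        if chars:
            result[value] = chars
    return result


if __name__ == "__main__":
    # The first three values come from the c_varchar partitions in the
    # error logs; "plain" is an illustrative value with no special characters.
    samples = ["7893456=bbb", "3333|bbb", "1233456|", "plain"]
    for value, chars in risky_partitions(samples).items():
        print(f"{value!r} contains {chars}")
```

How the partition values are obtained (for example, from the output of SHOW PARTITIONS) is left out of the sketch; it only classifies strings that are passed in.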