Installing the Spark/Hive UDF Plugin
To push down UDFs to the OmniData service, you need to install the UDF dependency package. The following uses huawei-udf as an example.
- Upload huawei_udf.jar to HDFS.
hdfs dfs -mkdir -p /user/BICoreData/hive/fiudflib2/
hdfs dfs -put huawei_udf.jar /user/BICoreData/hive/fiudflib2/
- Register the UDFs with the Hive MetaStore before running them. There are several registration methods; this section uses isEmpty as an example.
set spark.sql.ndp.udf.whitelist=isEmpty;
CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
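After registration, you can confirm that the function is visible to Spark SQL before pushing it down. The following is a minimal smoke test using standard Spark SQL statements (the function name isEmpty is the one registered above):

```sql
-- List user-defined functions matching the name; isEmpty should appear.
SHOW FUNCTIONS LIKE 'isEmpty';
-- Show the implementation class bound to the function name.
DESCRIBE FUNCTION isEmpty;
```

If the function does not appear, recheck the JAR path on HDFS and the class name in the CREATE TEMPORARY FUNCTION statement.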
- Configure the Hive UDF plugin.
In the hive.properties configuration file of the OmniData Hive UDF plugin, add an entry mapping the function name IsEmpty to its implementation class com.huawei.platform.bi.udf.common.IsEmptyUDF. For details, see Adding the OmniData Hive UDF Plugin.
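As a sketch, the entry could look like the following; the exact key/value syntax is an assumption here, so verify it against Adding the OmniData Hive UDF Plugin:

```properties
# Assumed format: <function name>=<implementation class>
IsEmpty=com.huawei.platform.bi.udf.common.IsEmptyUDF
```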
- Push down the Spark Hive UDFs.
/usr/local/spark/bin/spark-sql \
  --driver-class-path '/opt/boostkit/*' \
  --jars '/opt/boostkit/*' \
  --conf 'spark.executor.extraClassPath=./*' \
  --name IsEmptyUDF.sql \
  --driver-memory 50G \
  --driver-java-options -Dlog4j.configuration=file:../conf/log4j.properties \
  --executor-memory 32G \
  --num-executors 30 \
  --executor-cores 18 \
  --properties-file tpch_query.conf \
  --database tpch_flat_orc_date_5 \
  -f IsEmptyUDF.sql
The IsEmptyUDF.sql file has the following content:
set spark.sql.ndp.udf.whitelist=isEmpty;
CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
select sum(l_extendedprice) as sum_base_price from lineitem where !isEmpty(l_shipmode);