Installing the Spark/Hive UDF Plugin
To push down UDFs to the OmniData service, you need to install the UDF dependency package. The following uses huawei-udf as an example.
- Upload huawei_udf.jar to HDFS.
hdfs dfs -mkdir -p /user/BICoreData/hive/fiudflib2/
hdfs dfs -put huawei_udf.jar /user/BICoreData/hive/fiudflib2/
- Register the UDFs with the Hive MetaStore before running them. There are several registration methods; this section uses isEmpty as an example.
set spark.sql.ndp.udf.whitelist=isEmpty;
CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
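After registration, you can confirm that the function is visible to Spark SQL before pushing it down. The following is a minimal smoke test using standard Spark SQL statements (the function name isEmpty is the one registered above):

```sql
-- List user-defined functions matching the name; isEmpty should appear.
SHOW FUNCTIONS LIKE 'isEmpty';
-- Show the implementation class bound to the function name.
DESCRIBE FUNCTION isEmpty;
```

If the function does not appear, recheck the JAR path on HDFS and the class name in the CREATE TEMPORARY FUNCTION statement.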
- Configure the Hive UDF plugin.
In the hive.properties configuration file of the OmniData Hive UDF plugin, add an entry mapping the function name IsEmpty to its implementation class com.huawei.platform.bi.udf.common.IsEmptyUDF. For details, see Adding the OmniData Hive UDF Plugin.
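As a sketch, the entry could look like the following; the exact key/value syntax is an assumption here, so verify it against Adding the OmniData Hive UDF Plugin:

```properties
# Assumed format: <function name>=<implementation class>
IsEmpty=com.huawei.platform.bi.udf.common.IsEmptyUDF
```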
- Push down the Spark Hive UDFs.
/usr/local/spark/bin/spark-sql \
  --driver-class-path '/opt/boostkit/*' \
  --jars '/opt/boostkit/*' \
  --conf 'spark.executor.extraClassPath=./*' \
  --name IsEmptyUDF.sql \
  --driver-memory 50G \
  --driver-java-options -Dlog4j.configuration=file:../conf/log4j.properties \
  --executor-memory 32G \
  --num-executors 30 \
  --executor-cores 18 \
  --properties-file tpch_query.conf \
  --database tpch_flat_orc_date_5 \
  -f IsEmptyUDF.sql
The IsEmptyUDF.sql file has the following content:
set spark.sql.ndp.udf.whitelist=isEmpty;
CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
select sum(l_extendedprice) as sum_base_price from lineitem where !isEmpty(l_shipmode);