
(Optional) Installing the Spark/Hive UDF Plugin

To perform specific data processing operations, you can push down UDFs to the OmniData service. To do so, install the UDF dependency package (upload the UDF JAR package that your workloads require) and configure the Hive UDF plugin. The following procedure uses huawei_udf as an example.

  1. Upload huawei_udf.jar to HDFS.
    hdfs dfs -mkdir -p /user/BICoreData/hive/fiudflib2/
    hdfs dfs -put huawei_udf.jar /user/BICoreData/hive/fiudflib2/
    
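  2. Register the UDF before running it. UDFs can be registered in several ways; this section uses isEmpty as an example. The first statement adds isEmpty to the UDF pushdown whitelist (spark.sql.ndp.udf.whitelist), and the second registers it as a temporary function for the current Spark SQL session.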
    set spark.sql.ndp.udf.whitelist=isEmpty;
    CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
    
  3. Configure the Hive UDF plugin.

    In the hive.properties configuration file of the OmniData Hive UDF plugin, add the IsEmpty function together with its implementation class, com.huawei.platform.bi.udf.common.IsEmptyUDF. For details, see (Optional) Installing the Hive UDF Plugin.

  4. Run a Spark SQL job that pushes down the Hive UDF.
    /usr/local/spark/bin/spark-sql --driver-class-path '/opt/boostkit/*' --jars '/opt/boostkit/*' --conf 'spark.executor.extraClassPath=./*' --name IsEmptyUDF.sql --driver-memory 50G --driver-java-options -Dlog4j.configuration=file:../conf/log4j.properties --executor-memory 32G --num-executors 30 --executor-cores 18 --properties-file tpch_query.conf --database tpch_flat_orc_date_5 -f IsEmptyUDF.sql;
    

    The command output is as follows:

    The IsEmptyUDF.sql file has the following content:

    set spark.sql.ndp.udf.whitelist=isEmpty;
    
    CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
    select sum(l_extendedprice) as sum_base_price from lineitem where !isEmpty(l_shipmode);
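The whitelist and registration statements in steps 2 and 4 and the hive.properties entry in step 3 follow the same pattern for every UDF, so they can be generated instead of typed by hand. The following bash sketch shows one way to do that. The function names udf_sql_preamble and add_hive_udf_entry are hypothetical helpers, not part of OmniData, and the "<FunctionName> <ImplementationClass>" line format assumed for hive.properties is inferred from step 3 rather than confirmed, so check it against (Optional) Installing the Hive UDF Plugin before use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing one Hive UDF for OmniData pushdown.
# ASSUMPTION: hive.properties takes one "<FunctionName> <ImplementationClass>"
# line per UDF; this format is inferred from step 3, not confirmed.

# Print the statements that whitelist a UDF for pushdown and register it
# as a temporary function (the same statements as in step 2).
#   $1 - function name used in SQL, for example isEmpty
#   $2 - fully qualified implementation class
#   $3 - HDFS path of the JAR that contains the class
udf_sql_preamble() {
  local name="$1" class="$2" jar="$3"
  printf 'set spark.sql.ndp.udf.whitelist=%s;\n' "$name"
  printf "CREATE TEMPORARY FUNCTION %s AS '%s' USING JAR '%s';\n" \
    "$name" "$class" "$jar"
}

# Append "<name> <class>" to a properties file, but only if an identical
# line is not already present, so repeated runs do not add duplicates.
#   $1 - path of hive.properties
#   $2 - function name as configured for OmniData, for example IsEmpty
#   $3 - fully qualified implementation class
add_hive_udf_entry() {
  local file="$1" name="$2" class="$3"
  local entry="$name $class"
  touch "$file"
  if ! grep -qxF -- "$entry" "$file"; then
    printf '%s\n' "$entry" >> "$file"
  fi
}
```

For example, `udf_sql_preamble isEmpty com.huawei.platform.bi.udf.common.IsEmptyUDF hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar > IsEmptyUDF.sql` writes the first two statements of IsEmptyUDF.sql. When appending the query from an interactive shell, quote it with single quotes so that bash does not treat `!isEmpty` as a history expansion.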