(Optional) Installing the UDF Plugin
You need to perform the operations described in this section only when your service scenario involves user-defined functions (UDFs). Perform the following plugin operations only on the management node. Note that UDFs cannot be accelerated when OmniOperator is enabled for Gluten.
This plugin supports only the HiveSimpleUDF type, that is, simple UDFs written against the Hive UDF framework to extend the set of functions available in Hive queries. Because Spark supports the Hive UDF interfaces, HiveSimpleUDF functions can be used directly in a Spark environment.
OmniOperator accelerates UDFs through row-by-row or batch processing. You can switch between these two processing modes by modifying the configuration file.
Prerequisites
- Ensure that your UDFs are simple UDFs implemented based on the Hive UDF framework.
- To use OmniOperator to accelerate UDFs, you need to provide related JAR packages and configuration files, including the following files:
- udf.zip: contains the class files of all UDFs.
- conf.zip: contains the configuration files on which the UDFs depend.
- udf.properties: used to configure OmniOperator to accelerate UDFs.
Using udfName1 and udfName2 as examples, the udf.properties file has the following format:
```
udfName1 com.huawei.udf.UdfName1
udfName2 com.huawei.udf.UdfName2
```
Installing the UDF Plugin (Row-by-Row Processing)
- Create the /opt/omni-operator/hive-udf directory on the management node.

```
mkdir /opt/omni-operator/hive-udf
```

- Upload the udf.zip and conf.zip packages to the /opt/omni-operator/hive-udf directory on the management node.
udf.zip and conf.zip are example names; you can use other names based on your service requirements.
- Extract the files.
```
cd /opt/omni-operator/hive-udf
unzip udf.zip
rm -f udf.zip
unzip conf.zip
rm -f conf.zip
```
- Modify the /opt/omni-operator/conf/omni.conf file.
- Open the configuration file.

```
vi /opt/omni-operator/conf/omni.conf
```

- Press i to enter the insert mode and add the following UDF configuration.

```
# <----UDF properties---->
# false indicates expression row-by-row processing and true indicates expression batch processing.
enableBatchExprEvaluate=false
# UDF trustlist file path
hiveUdfPropertyFilePath=./hive-udf/udf.properties
# Directory of the Hive UDF JAR package
hiveUdfDir=./hive-udf/udf
```
These paths must start with a period (.). At run time, OmniOperator reads the OMNI_HOME environment variable and replaces the period with its value.
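As an illustration of that substitution (a sketch; the replacement rule is inferred from the note above, and /opt/omni-operator is assumed as the OMNI_HOME value):

```shell
# Illustrative only: emulate how a relative path from omni.conf is resolved.
# Assumes OmniOperator replaces the leading "." with $OMNI_HOME.
OMNI_HOME=/opt/omni-operator
rel="./hive-udf/udf.properties"      # value of hiveUdfPropertyFilePath
resolved="${OMNI_HOME}${rel#.}"      # strip the leading "." and prepend OMNI_HOME
echo "$resolved"                     # /opt/omni-operator/hive-udf/udf.properties
```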
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Update the environment variable.
- Open the ~/.bashrc file.

```
vi ~/.bashrc
```

- Press i to enter the insert mode and add LD_LIBRARY_PATH to update the environment variables.

```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/aarch64/server
```

- Press Esc, type :wq!, and press Enter to save the file and exit.
- Make the updated environment variable take effect.

```
source ~/.bashrc
```
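To confirm the variable is set in the current shell, you can inspect the search path. This is a self-contained sketch: the JAVA_HOME value is a placeholder for your actual JDK path, and the export line repeats the one added to ~/.bashrc.

```shell
# Placeholder JDK path; substitute your actual JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/aarch64/server
# The server JVM library directory should now appear on the search path.
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep 'aarch64/server'
```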
Installing the UDF Plugin (Batch Processing)
After completing the row-by-row installation on the management node, modify the /opt/omni-operator/conf/omni.conf file to enable batch processing.
- Open the file.

```
vi /opt/omni-operator/conf/omni.conf
```

- Press i to enter the insert mode, find the enableBatchExprEvaluate parameter, and change its value to true.

```
enableBatchExprEvaluate=true
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
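If you prefer a non-interactive edit, the same change can be made with sed. This is a sketch: the local file omni.conf stands in for /opt/omni-operator/conf/omni.conf, and you should back up the real file before editing it.

```shell
conf=omni.conf                   # stand-in for /opt/omni-operator/conf/omni.conf
echo 'enableBatchExprEvaluate=false' > "$conf"
# Flip the flag from row-by-row (false) to batch (true) processing.
sed -i 's/^enableBatchExprEvaluate=.*/enableBatchExprEvaluate=true/' "$conf"
cat "$conf"                      # enableBatchExprEvaluate=true
```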