(Optional) Installing the UDF Plugin
You need to perform the operations described in this section only when your service scenario involves user-defined functions (UDFs). Perform the following plugin operations only on the management node. Note that UDFs cannot be accelerated when OmniOperator is enabled for Gluten.
This plugin supports only the HiveSimpleUDF type, that is, simple UDFs written against the Hive UDF framework to extend the set of functions available in Hive queries. Because Spark supports the Hive UDF interfaces, HiveSimpleUDF functions can be used directly in a Spark environment.
OmniOperator accelerates UDFs through row-by-row or batch processing. You can switch between these two processing modes by modifying the configuration file.
Prerequisites
- Ensure that your UDFs are simple UDFs implemented based on the Hive UDF framework.
- To use OmniOperator to accelerate UDFs, you need to provide related JAR packages and configuration files, including the following files:
- udf.zip: contains the class files of all UDFs.
- conf.zip: contains the configuration files on which the UDFs depend.
- udf.properties: used to configure OmniOperator to accelerate UDFs.
Using udfName1 and udfName2 as examples, the udf.properties file has the following format:
```
udfName1 com.huawei.udf.UdfName1
udfName2 com.huawei.udf.UdfName2
```
Installing the UDF Plugin (Row-by-Row Processing)
- Create the /opt/omni-operator/hive-udf directory on the management node.

```
mkdir /opt/omni-operator/hive-udf
```

- Upload the udf.zip and conf.zip packages to the /opt/omni-operator/hive-udf directory on the management node.
udf.zip and conf.zip are example names; you can use other names based on your service requirements.
- Extract the files.
```
cd /opt/omni-operator/hive-udf
unzip udf.zip
rm -f udf.zip
unzip conf.zip
rm -f conf.zip
```
- Modify the /opt/omni-operator/conf/omni.conf file.
- Open the configuration file.

```
vi /opt/omni-operator/conf/omni.conf
```

- Press i to enter the insert mode and add the following UDF configuration.

```
# <----UDF properties---->
# false indicates expression row-by-row processing and true indicates expression batch processing.
enableBatchExprEvaluate=false
# UDF trustlist file path
hiveUdfPropertyFilePath=./hive-udf/udf.properties
# Directory of the Hive UDF JAR package
hiveUdfDir=./hive-udf/udf
```
These paths must start with a period (.). At run time, OmniOperator reads the OMNI_HOME environment variable and replaces the period with its value.
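As an illustration of that substitution (a sketch; the replacement rule is inferred from the note above, and /opt/omni-operator is assumed as the OMNI_HOME value):

```shell
# Illustrative only: emulate how a relative path from omni.conf is resolved.
# Assumes OmniOperator replaces the leading "." with $OMNI_HOME.
OMNI_HOME=/opt/omni-operator
rel="./hive-udf/udf.properties"      # value of hiveUdfPropertyFilePath
resolved="${OMNI_HOME}${rel#.}"      # strip the leading "." and prepend OMNI_HOME
echo "$resolved"                     # /opt/omni-operator/hive-udf/udf.properties
```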
- Press Esc, type :wq!, and press Enter to save the file and exit.
- Update the environment variable.
- Open the ~/.bashrc file.

```
vi ~/.bashrc
```

- Press i to enter the insert mode and add LD_LIBRARY_PATH to update the environment variables.

```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/aarch64/server
```

- Press Esc, type :wq!, and press Enter to save the file and exit.
- Make the updated environment variable take effect.

```
source ~/.bashrc
```
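To confirm the variable is set in the current shell, you can inspect the search path. This is a self-contained sketch: the JAVA_HOME value is a placeholder for your actual JDK path, and the export line repeats the one added to ~/.bashrc.

```shell
# Placeholder JDK path; substitute your actual JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/aarch64/server
# The server JVM library directory should now appear on the search path.
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep 'aarch64/server'
```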
Installing the UDF Plugin (Batch Processing)
After completing the row-by-row installation on the management node, modify the /opt/omni-operator/conf/omni.conf file to enable batch processing.
- Open the file.

```
vi /opt/omni-operator/conf/omni.conf
```

- Press i to enter the insert mode, find the enableBatchExprEvaluate parameter, and change its value to true.

```
enableBatchExprEvaluate=true
```
- Press Esc, type :wq!, and press Enter to save the file and exit.
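If you prefer a non-interactive edit, the same change can be made with sed. This is a sketch: the local file omni.conf stands in for /opt/omni-operator/conf/omni.conf, and you should back up the real file before editing it.

```shell
conf=omni.conf                   # stand-in for /opt/omni-operator/conf/omni.conf
echo 'enableBatchExprEvaluate=false' > "$conf"
# Flip the flag from row-by-row (false) to batch (true) processing.
sed -i 's/^enableBatchExprEvaluate=.*/enableBatchExprEvaluate=true/' "$conf"
cat "$conf"                      # enableBatchExprEvaluate=true
```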