Executing Hive Engine Services
Use the hive command for operator pushdown on Hive.
In this example, a non-partition TPC-H table with 1 TB data is used as the test table. The test SQL statement is tpch-sql6.
Table 1 lists the related table information.
The procedure is as follows:
- Configure the tpch_ndp.setting file. This file can be stored in a user-defined directory. In this example, the file is stored in the /opt/hive directory.
1 2
cd /opt/hive vim tpch_ndp.setting
- Add the following settings to tpch_ndp.setting:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
set hive.execution.engine=tez; set hive.cbo.enable=true; set hive.tez.container.size=6144; set hive.mapjoin.hybridgrace.hashtable=false; set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true; set omnidata.hive.enabled=true; set omnidata.hive.filter.selectivity.enabled=true; set omnidata.hive.filter.selectivity=1.0; set omnidata.hive.table.size.threshold=10240; set omnidata.hive.zookeeper.quorum.server=agent1:2181,agent2:2181,agent3:2181; set omnidata.hive.zookeeper.status.node=/sdi/status; set omnidata.hive.zookeeper.conf.path=/usr/local/zookeeper/conf; set omnidata.hive.group.optimized.enabled=true; set omnidata.hive.reduce.optimized.enabled=true;
You can also run the set command to set the preceding parameters on Hive.
Table 2 describes the parameters for pushing down new operators.
Table 2 Operator pushdown parameters Parameter
Default Value
Description
omnidata.hive.enabled
false
Enables operator pushdown.
omnidata.hive.filter.selectivity.enabled
true
Enables filter selectivity to determine whether to perform operator pushdown.
The benefits of operator pushdown are related to the amount of filtered data. The more data is filtered, the more significant the performance benefits are. You are advised to enable this function.
omnidata.hive.filter.selectivity
0.2
If the actual filter selectivity is less than the value of this parameter, operator pushdown is performed. A smaller selectivity indicates less data remained after filtering. The default value is 0.2 and the type is double. The value ranges from 0.0 to 1.0.
omnidata.hive.table.size.threshold
102400
When the queried table size is larger than the value, operator pushdown is performed.
omnidata.hive.zookeeper.quorum.server
N/A
IP address for connecting to ZooKeeper.
IP address and port number for starting ZooKeeper in customer scenarios. Separate multiple IP addresses and port numbers with commas (,). The parameter is mandatory.
omnidata.hive.zookeeper.status.node
N/A
Path for storing pushdown information in ZooKeeper. The recommended value is /sdi/status. The parameter is mandatory.
omnidata.hive.zookeeper.conf.path
N/A
Path for storing the ZooKeeper configuration file. The parameter is mandatory.
omnidata.hive.group.optimized.enabled
false
Indicates whether to enable group optimization when operators are pushed down.
omnidata.hive.reduce.optimized.enabled
false
Indicates whether to enable reduce optimization when operators are pushed down.
omnidata.hive.group.optimized.coefficient
N/A
Sets the group optimization coefficient when operators are pushed down.
The values of omnidata.hive.filter.selectivity and omnidata.hive.table.size.threshold are recommended values based on Planning the Cluster Environment. The performance varies depending on the environment. In actual scenarios, if the query time required after operator pushdown is enabled is greater than 80% of the original query time, you can configure parameters omnidata.hive.filter.selectivity and omnidata.hive.table.size.threshold to optimize the pushdown performance.- Setting omnidata.hive.filter.selectivity (preferred):
Set this parameter to a smaller value (for example, 0.15 or 0.1) to reduce the negative effect brought by a high selectivity.
- Setting omnidata.hive.table.size.threshold:
Set this parameter to a larger value (for example, 204800) to reduce the negative effect when querying small tables.
- Setting omnidata.hive.filter.selectivity (preferred):
- Press Esc, type :wq!, and press Enter to save the settings and exit.
- Run the Hive command.
1/usr/local/hive/bin/hive -i /opt/hive/tpch_ndp.setting -database tpch_flat_orc_date_1000

- Run the sql6 statement.
select sum(l_extendedprice * l_discount) as revenue from tpch_flat_orc_1000.lineitem where l_shipdate >= '1993-01-01' and l_shipdate < '1994-01-01' and l_discount between 0.06 - 0.01 and 0.06 + 0.01 and l_quantity < 25;
When the task is executed, the pushdown information is printed as follows:
Table [lineitem] Push Down Info [ Select:[DoubleColMultiplyDoubleColumn(col 5:double, col 6:double) -> 17:double], Filter:[GenericUDFOPAnd(GenericUDFOPEqualOrGreaterThan(Column[l_shipdate], Const date 1993-01-01), GenericUDFOPLessThan(Column[l_shipdate], Const date 1994-01-01), GenericUDFBetween(Const boolean false, Column[l_discount], Const double 0.05, Const double 0.07), GenericUDFOPLessThan(Column[l_quantity], Const double 25.0))], Aggregation:[sum(_col0)], Group By:[], Limit:[], Raw Filter:[], Host Map:[hdfs1 -> hdfs1, hdfs2 -> hdfs2, hdfs3 -> hdfs3]
The output contains operator information.
