Executing Hive Services

Hive uses the interactive CLI to execute SQL tasks. To check whether HiveExtension takes effect on Hive, add the EXPLAIN statement before the SQL statement. If the operator name starts with "Omni", HiveExtension has taken effect.

In this example, the tpcds_bin_partitioned_varchar_orc_2 data table is used as the test table and Q82 of the TPC-DS test dataset is used as the test SQL statement.

Table 1 lists the related table information.

**Table 1** Table information
Table	Format	Total Number of Rows
item	orc	26000
inventory	orc	16966305
date_dim	orc	73049
store_sales	orc	5760749

Start the Hive SQL CLI.

Commands for starting open source Hive SQL:

hive --database tpcds_bin_partitioned_varchar_orc_2 --hiveconf hive.fetch.task.conversion=none --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=8192 --hiveconf tez.am.resource.memory.mb=8192 --hiveconf tez.task.resource.memory.mb=8192 --hiveconf tez.runtime.io.sort.mb=128 --hiveconf hive.merge.tezfiles=true --hiveconf tez.am.container.reuse.enabled=true

Run the following command to start HiveExtension:

hive --database tpcds_bin_partitioned_varchar_orc_2 --hiveconf hive.fetch.task.conversion=none --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=8192 --hiveconf tez.am.resource.memory.mb=8192 --hiveconf tez.task.resource.memory.mb=8192 --hiveconf tez.runtime.io.sort.mb=128 --hiveconf hive.merge.tezfiles=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.exec.pre.hooks=com.huawei.boostkit.hive.OmniExecuteWithHookContext --hiveconf tez.task.launch.env=OMNI_CONF=/opt/omni-operator/hive,LD_LIBRARY_PATH=/opt/omni-operator/lib

In the preceding startup command, the OMNI_CONF path must be the same as the custom configuration file directory in Setting the Hive Configuration File.
If the dataset contains data of the String type and each single field of the data contains a large amount of data (more than 512 characters), you are advised to add the following configuration:
```
--hiveconf omni.hive.string.length=2000
```

If a large Parquet dataset (for example, 10 TB) is used, you are advised to increase the value of hive.tez.container.size and other related parameters of the Tez task. The recommended startup command is as follows:

hive --database tpcds_bin_partitioned_varchar_parquet_10000 --hiveconf tez.task.launch.env=OMNI_CONF=/opt/omni-operator/hive,LD_LIBRARY_PATH=/opt/omni-operator/lib --hiveconf hive.vectorized.execution.enabled=true --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=61440 --hiveconf tez.am.resource.memory.mb=61440 --hiveconf tez.task.resource.memory.mb=61440 --hiveconf mapreduce.reduce.java.opts=-Xmx49152m --hiveconf mapreduce.map.java.opts=-Xmx49152m --hiveconf tez.runtime.io.sort.mb=128 --hiveconf hive.merge.tezfiles=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.exec.pre.hooks=com.huawei.boostkit.hive.OmniExecuteWithHookContext --hiveconf tez.container.max.java.heap.fraction=0.5

The preceding startup command is used in Yarn mode. If you want to use the local mode instead, add the following parameters to the end of the startup command:
```
--hiveconf tez.local.mode=true --hiveconf tez.runtime.optimize.local.fetch=true
```

Table 2 describes the HiveExtension startup parameters.

**Table 2** HiveExtension startup parameters
Parameter	Default Value	Description
hive.exec.pre.hooks	com.huawei.boostkit.hive.OmniExecuteWithHookContext	Starts HiveExtension.
tez.task.launch.env	OMNI_CONF=/opt/omni-operator/hive,LD_LIBRARY_PATH=/opt/omni-operator/lib	Sets the environment of the Tez task process.
hive.fetch.task.conversion	more	Indicates whether to convert Hive data queries into MapReduce tasks. more: MapReduce is executed for part of tasks; none: MapReduce is executed for all tasks.
hive.cbo.enable	true	Indicates whether to enable the cost-based optimizer. true: yes; false: no.
hive.exec.reducers.max	1009	Sets the maximum number of reducers.
hive.exec.compress.intermediate	false	Indicates whether to compress intermediate files generated by Hive between MapReduce tasks. true: yes; false: no.
hive.tez.container.size	-1	Sets the default size of a container generated by a Tez task, in MB.
tez.am.resource.memory.mb	1024	Sets the memory size used by the ApplicationMaster corresponding to each Tez task in the cluster, in MB.
tez.task.resource.memory.mb	1024	Sets the memory size used by started tasks in a Tez container, in MB.
tez.runtime.io.sort.mb	100	Sets the size of the output sorting memory, in MB.
hive.merge.tezfiles	false	Indicates whether to merge the files generated by a Tez task. true: yes; false: no.
tez.am.container.reuse.enabled	true	Indicates whether to reuse containers. true: yes; false: no.
tez.local.mode	false	Indicates whether to enable the local mode. true: yes; false: no.
tez.runtime.optimize.local.fetch	true	Indicates whether to optimize the file logic in local mode. true: yes; false: no.
hive.vectorized.execution.enabled	true	Indicates whether to enable Hive vectorization. true: yes; false: no.
mapreduce.map.java.opts	-Xmx200m	Sets the maximum heap memory that can be used by the Map task.
mapreduce.reduce.java.opts	-Xmx200m	Sets the maximum heap memory that can be used by the Reduce task.
tez.container.max.java.heap.fraction	0.8	Sets the ratio of the heap memory available to the Tez main task process to the container memory.
omni.hive.string.length	-	Sets the maximum String data length supported by HiveExtension.
omni.hive.adaptivePartialAggregation.enabled	false	Enables adaptive skipping of the HashAgg group aggregation operation in the partial stage. This optimization is performed during running. The partial stage of group aggregation is skipped and data is directly output to the downstream operator if the sampling scenario is identified as a high cardinality scenario and group aggregation is performed.
omni.hive.adaptivePartialAggregationMinRows	500000	Specifies the minimum number of rows sampled for adaptivePartialAggregation optimization. When this number has been reached, the tool calculates the aggregation of the sampled data.
omni.hive.adaptivePartialAggregationRatio	0.8	Specifies the minimum aggregation threshold for adaptivePartialAggregation optimization. If the aggregation of sampled data has reached the threshold, this type of optimization is applied.

Check whether HiveExtension takes effect.

Run the following SQL statement in the HiveExtension CLI and open source Hive SQL CLI:

explain select  i_item_id
    ,i_item_desc
    ,i_current_price
from item, inventory, date_dim, store_sales
where i_current_price between 76 and 76+30
and inv_item_sk = i_item_sk
and d_date_sk=inv_date_sk
and d_date between cast('1998-06-29' as date) and cast('1998-08-29' as date)
and i_manufact_id in (512,409,677,16)
and inv_quantity_on_hand between 100 and 500
and ss_item_sk = i_item_sk
group by i_item_id,i_item_desc,i_current_price
order by i_item_id
limit 100;

The following figure shows the execution plan output in the HiveExtension CLI. If the operator name starts with "Omni", HiveExtension has taken effect.

Execution plan outputted by open source Hive SQL

Run the following SQL statement.

Run the following SQL statement in the HiveExtension CLI and open source Hive SQL CLI:

 select  i_item_id
    ,i_item_desc
    ,i_current_price
from item, inventory, date_dim, store_sales
where i_current_price between 76 and 76+30
and inv_item_sk = i_item_sk
and d_date_sk=inv_date_sk
and d_date between cast('1998-06-29' as date) and cast('1998-08-29' as date)
and i_manufact_id in (512,409,677,16)
and inv_quantity_on_hand between 100 and 500
and ss_item_sk = i_item_sk
group by i_item_id,i_item_desc,i_current_price
order by i_item_id
limit 100;

Compare the results.
- Execution result of open source Hive SQL:
  
  The execution plan is as follows:
- HiveExtension execution result
  
  The execution plan is as follows:
- Result comparison: The SQL output data of open source Hive SQL is the same as that of HiveExtension.

Parent topic: Using on Hive