Hive Overview
Hive is a Hadoop-based data warehouse tool. It maps structured data files to tables and provides SQL-like query capabilities.
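As a rough illustration of what "mapping structured data files to tables" means, the following minimal Python sketch applies a schema to a delimited file at read time and then filters rows much as a SQL-like query would. It is a conceptual sketch only, not how Hive is implemented. The names `SCHEMA`, `RAW_FILE`, and `read_table`, the `employees` table, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical table definition: column names and Python types stand in for a Hive schema.
SCHEMA = [("id", int), ("name", str), ("salary", float)]

# Hypothetical file contents; with Hive, the file would typically be stored in HDFS.
RAW_FILE = "1,alice,5200.0\n2,bob,4100.5\n3,carol,6100.0\n"


def read_table(raw_text, schema, delimiter=","):
    """Apply the schema to each delimited line and yield rows as dicts."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    for fields in reader:
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


# Rough equivalent of: SELECT name FROM employees WHERE salary > 5000
high_earners = [
    row["name"] for row in read_table(RAW_FILE, SCHEMA) if row["salary"] > 5000
]
print(high_earners)  # ['alice', 'carol']
```

The file itself is never converted: the schema is applied only when the data is read, which is the idea the sketch is meant to convey.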
Figure 1 Hive architecture
The process is as follows:
- The user submits a task, such as a query, to the driver.
- The compiler obtains the user's task plan.
- The compiler obtains required Hive metadata from MetaStore based on the user task.
- After obtaining the metadata, the compiler compiles the task. It first converts the HiveQL into an abstract syntax tree, then the abstract syntax tree into query blocks, and then the query blocks into a logical query plan. Next, it rewrites the logical query plan and converts it into a physical plan (on the Tez engine). Finally, the compiler selects the optimal strategy.
- The compiler submits the final plan to the driver.
- The driver passes the plan to the execution engine. The execution engine obtains the metadata and submits the task to the JobTracker or ResourceManager for execution. The task reads files directly from HDFS and performs the corresponding operations.
- The execution engine obtains the execution results.
- The driver obtains the execution results and returns them to the user.
As the preceding process shows, two factors affect Hive execution:
- Task compilation by the compiler. Compilation directly determines the query plan, and different query plans result in different physical plans (on the Tez engine).
- The Tez engine. This is the component that actually executes Hive tasks.
For more information about Hive, visit the official Hive website.
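The compilation stages in the process above (HiveQL to abstract syntax tree, to query block, to logical plan, to rewritten logical plan, to physical plan) can be sketched as a chain of small functions. This is a toy model under stated assumptions, not Hive's compiler. Every function name, the `METASTORE` dictionary, the `/warehouse/employees` path, and the single rewrite rule (pushing the selected columns into the scan) are hypothetical and chosen only to show how one stage feeds the next.

```python
# Hypothetical stand-in for the MetaStore: table name -> location and columns.
METASTORE = {
    "employees": {
        "location": "/warehouse/employees",
        "columns": ["id", "name", "salary"],
    }
}


def parse(hiveql):
    """HiveQL text -> toy abstract syntax tree (a flat token list)."""
    return {"kind": "ast", "tokens": hiveql.split()}


def to_query_block(ast):
    """AST -> toy query block: separate the SELECT columns from the FROM table."""
    tokens = ast["tokens"]
    from_idx = [t.upper() for t in tokens].index("FROM")
    return {
        "kind": "query_block",
        "columns": [t.strip(",") for t in tokens[1:from_idx]],
        "table": tokens[from_idx + 1],
    }


def to_logical_plan(query_block, metastore):
    """Query block -> logical plan, resolving the table against the metadata."""
    meta = metastore[query_block["table"]]
    return {
        "kind": "logical_plan",
        "operators": [
            ("TableScan", meta["location"], meta["columns"]),
            ("Select", query_block["columns"]),
        ],
    }


def rewrite(logical_plan):
    """Toy rewrite rule: fold the projection into the scan so fewer columns are read."""
    (_, location, _all_columns), (_, wanted) = logical_plan["operators"]
    return {"kind": "logical_plan", "operators": [("TableScan", location, wanted)]}


def to_physical_plan(logical_plan, engine="tez"):
    """Logical plan -> physical plan bound to an execution engine."""
    return {"kind": "physical_plan", "engine": engine, "tasks": logical_plan["operators"]}


def compile_query(hiveql, metastore):
    """Run every stage in order and return the final plan handed to the driver."""
    ast = parse(hiveql)
    block = to_query_block(ast)
    logical = to_logical_plan(block, metastore)
    optimized = rewrite(logical)
    return to_physical_plan(optimized)


plan = compile_query("SELECT name, salary FROM employees", METASTORE)
print(plan)
```

In this sketch the rewrite step changes which columns the scan reads, which shows in miniature why the query plan produced at compile time shapes the physical plan that the engine later runs.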