Introduction to HBase

HBase (short for Hadoop Database) is a distributed, scalable, column-oriented storage system designed for high reliability and high performance. With HBase, a large-scale storage cluster can be built on low-cost PC servers.

HBase consists of three components: HMaster, HRegionServer, and ZooKeeper. The three components have the following responsibilities:

HMaster

HMaster is the controller of the entire HBase cluster and has the following responsibilities:

  • Perform load balancing.
  • Manage permissions (using ACL).
  • Clean up obsolete files in HDFS.
  • Manage metadata of namespaces and tables.
  • Create, delete, and modify tables (for example, modify column families).
  • Allocate regions: assign regions at startup, reassign the regions of a failed RegionServer, and assign the new regions produced by a split.
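
The region allocation duties above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not HBase's actual balancer: the function names, the round-robin policy, and the server names rs1 to rs3 are all hypothetical and chosen for illustration.

```python
def assign_round_robin(regions, servers):
    """Spread regions evenly over servers (toy stand-in for startup assignment)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def reassign_from_failed(assignment, failed):
    """Move each region of a failed server to the least-loaded surviving server."""
    survivors = {s: list(r) for s, r in assignment.items() if s != failed}
    if not survivors:
        raise RuntimeError("no live RegionServer left")
    for region in assignment.get(failed, []):
        # Ties are broken by server name so the result is deterministic.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


# Hypothetical cluster: six regions on three RegionServers, then rs2 fails.
regions = [f"region-{i}" for i in range(6)]
plan = assign_round_robin(regions, ["rs1", "rs2", "rs3"])
after_failure = reassign_from_failed(plan, "rs2")
```

In this sketch every region stays assigned to exactly one live server after the failure, which is the property the real assignment logic also has to preserve.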

HRegionServer

HRegionServer handles the actual reads and writes in HBase and has the following responsibilities:

  • Split regions.
  • Interact with HDFS and manage table data.
  • Respond to read and write requests from clients and perform I/O operations.
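
To make region splitting more concrete, the following sketch models a region as a sorted range of row keys and splits it at the median key. It is a simplification under stated assumptions: the ToyRegion class and the MAX_ROWS threshold are hypothetical, and a real RegionServer decides when and where to split from store file sizes rather than a row count.

```python
class ToyRegion:
    """A contiguous range of row keys [start_key, end_key)."""

    def __init__(self, start_key="", end_key=None):
        self.start_key = start_key
        self.end_key = end_key  # None means "no upper bound"
        self.rows = {}          # row key -> value

    def put(self, key, value):
        self.rows[key] = value

    def split(self):
        """Split at the median row key into two daughter regions."""
        if len(self.rows) < 2:
            raise ValueError("need at least two rows to split")
        keys = sorted(self.rows)
        mid = keys[len(keys) // 2]
        left = ToyRegion(self.start_key, mid)
        right = ToyRegion(mid, self.end_key)
        for key, value in self.rows.items():
            (left if key < mid else right).put(key, value)
        return left, right


MAX_ROWS = 4  # hypothetical split threshold, for illustration only

region = ToyRegion()
for i in range(6):
    region.put(f"row{i:02d}", i)

if len(region.rows) > MAX_ROWS:
    left, right = region.split()
```

The two daughter regions share a boundary key: the left region ends where the right region starts, so every row key still belongs to exactly one region.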

ZooKeeper

ZooKeeper is the coordinator of the HBase cluster and has the following responsibilities:

  • Store table metadata in HBase.
  • Ensure that only one HMaster in the cluster is in active state.
  • Store the location of the hbase:meta table, which records the location information of all regions.
  • Monitor RegionServer status and notify HMaster when a RegionServer goes online or offline.
  • Keep node state consistent across the ZooKeeper cluster by using the ZAB (ZooKeeper Atomic Broadcast) protocol, a consensus protocol similar to Paxos.
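
The single-active-HMaster guarantee and the RegionServer status monitoring listed above both rest on ephemeral nodes, which disappear when the session that created them ends. The sketch below imitates that behavior in memory. It is not the ZooKeeper client API: the ToyCoordinator class, its methods, and the session names are hypothetical, and the node paths are used only as illustrative labels.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # node path -> owning session

    def create_ephemeral(self, path, session):
        """Create the node if it is absent; only the first caller succeeds."""
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def expire_session(self, session):
        """Remove every node owned by a session that died or timed out."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

    def children(self, prefix):
        """List the nodes directly under a prefix, in sorted order."""
        return sorted(p for p in self.nodes if p.startswith(prefix + "/"))


zk = ToyCoordinator()
MASTER_NODE = "/hbase/master"

# Two masters race for the same node; only one becomes active.
a_is_active = zk.create_ephemeral(MASTER_NODE, "master-a")
b_is_active = zk.create_ephemeral(MASTER_NODE, "master-b")

# RegionServers register themselves; a lost session removes the entry.
zk.create_ephemeral("/hbase/rs/rs1", "rs1")
zk.create_ephemeral("/hbase/rs/rs2", "rs2")
zk.expire_session("rs2")
live_servers = zk.children("/hbase/rs")

# When the active master's session ends, the standby can take over.
zk.expire_session("master-a")
b_promoted = zk.create_ephemeral(MASTER_NODE, "master-b")
```

In a real deployment the standby master and the active master would be notified of these changes through watches instead of polling, which this sketch leaves out.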