Constraints

Application Scenarios

Index tables do not support disaster recovery.
Rolling upgrade is not supported for index data.
Do not perform the DISABLE, DROP, MODIFY, or TRUNCATE operation on index tables or index metadata tables.
Data definition language (DDL) operations allowed on indexes include modifying index status, deleting indexes, and creating indexes, but not modifying index definitions. If you need to modify an index definition, delete it and create an index again.
Ensure that the system time of each HBase node is synchronized.
When a client is writing data, the data may not be synchronized in real time between the data table and index table. However, after a write success is returned to the client, the data is visible in both the data table and index table.

Creating indexes

An index name can contain only characters matched by the regular expression [a-zA-Z_0-9-.]. Other characters are not supported.
The associated data table must exist. The name of the index table to be created must be unique.
The index table does not support multiple versions. Indexes cannot be created on data tables with multiple versions (VERSION > 1), and the version of the index table is 1.
Do not create too many indexes for a data table. A large number of indexes will increase the storage cost and prolong the write time. Therefore, it is recommended that the number of indexes in a data table be less than or equal to 5. If more than five indexes are required, add the hbase.gsi.max.index.count.per.table parameter, set it to a value greater than 5, and restart HBase for the parameter setting to take effect.

Indexes cannot be created on index tables, and indexes cannot be nested. Index tables are used only to accelerate queries and do not provide the functions of a data table.
Do not create an index that can be covered by an existing one.
When you create an index, if it can be covered by an existing index (that is, its columns form a leading prefix of the columns of an existing index), the index cannot be created, because indexes with duplicate functions waste storage. For example, in the following operations, index 3 can be created but index 2 cannot.

Create a data table: create 't1','cf1'

Create index 1: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx1=>cf1:[q1],[q2]'

Create index 2: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx2=>cf1:[q1]'

Create index 3: hbase com.huawei.boostkit.hindex.mapreduce.GlobalTableIndexer -Dtablename.to.index='t1' -Dindexspecs.to.add='idx3=>cf1:[q2]'
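
The coverage rule in the example above can be sketched as a prefix check. This is a minimal illustration, not the GlobalTableIndexer implementation: it assumes an index is modeled as an ordered list of "family:qualifier" strings, and the helper names is_covered and can_create are hypothetical.

```python
def is_covered(new_cols, existing_cols):
    """Return True if new_cols is a leading prefix of existing_cols."""
    n = len(new_cols)
    return n <= len(existing_cols) and list(existing_cols[:n]) == list(new_cols)


def can_create(new_cols, existing_indexes):
    """Return True if no existing index already covers new_cols."""
    return not any(is_covered(new_cols, cols) for cols in existing_indexes.values())


# idx1 from the example above: cf1:[q1],[q2]
existing = {"idx1": ["cf1:q1", "cf1:q2"]}

print(can_create(["cf1:q1"], existing))  # idx2 is a prefix of idx1, so it is rejected
print(can_create(["cf1:q2"], existing))  # idx3 is not a prefix of idx1, so it is allowed
```

In this model, idx2 is rejected because [q1] is the first column of idx1, while idx3 is accepted because [q2] is not a leading prefix of [q1],[q2].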
Each index must have a unique name across all data tables.
When creating an index, ensure that the region server node is stable and do not bring the node offline.
If an index table associated with a data table is being created (in the BUILDING state), writing data to the data table will fail. Therefore, do not write data to a data table during index creation.
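
Several of the creation constraints above can be checked on the client side before submitting an index creation job. The following is a rough sketch under stated assumptions: the function check_new_index and its arguments are hypothetical, the character class is taken from the text above, and the limit of 5 stands in for the default that hbase.gsi.max.index.count.per.table can raise.

```python
import re

# Character class taken from the naming constraint above.
INDEX_NAME_PATTERN = re.compile(r"[a-zA-Z_0-9\-.]+")
# Recommended limit from the text; assumed adjustable through the server parameter.
DEFAULT_MAX_INDEXES = 5


def check_new_index(name, all_index_names, indexes_on_table,
                    max_indexes=DEFAULT_MAX_INDEXES):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not INDEX_NAME_PATTERN.fullmatch(name):
        problems.append("index name contains unsupported characters")
    if name in all_index_names:
        problems.append("index name is already used")
    if indexes_on_table >= max_indexes:
        problems.append("data table already has the maximum number of indexes")
    return problems


print(check_new_index("idx-3.v1", {"idx1"}, 1))
print(check_new_index("idx 3", {"idx1"}, 5))
```

The checks only mirror the documented rules. The server still enforces them, so a sketch like this is useful mainly for failing early with a clearer message.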

Writing index data

Index data is generated only for writes made through the Put and Delete APIs. When data is written to a data table by other methods (such as Increment, Append, and Bulkload), no index data is generated.
When the index column data is of the String type, do not write the special invisible characters \x00 and \x01.
Timestamps cannot be specified when writing data to covering or index columns in the primary table.
Data timestamps of the index table do not inherit those of the primary table.
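
A client can screen String values for the two reserved characters before writing them to an index column. This is a small sketch only; the helper name has_forbidden_chars is hypothetical and is not part of any HBase API.

```python
# The two invisible characters that must not appear in String index column values.
FORBIDDEN_CHARS = ("\x00", "\x01")


def has_forbidden_chars(value):
    """Return True if value contains \\x00 or \\x01."""
    return any(ch in value for ch in FORBIDDEN_CHARS)


for candidate in ["alice", "ali\x00ce", "bob\x01"]:
    status = "reject" if has_forbidden_chars(candidate) else "ok"
    print(repr(candidate), status)
```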

Querying indexes

When an index is used to accelerate the query, the index must be in the ACTIVE state.
When an index table is used to accelerate a data table query, only scan queries that use SingleColumnValueFilter are supported, and the filter columns must include an index column.
Only one index is hit in a query. Multi-index joint query is not supported.

If a query hits a secondary index, the order of the returned results differs from that of a query that reads the primary table directly.
When a secondary index is hit, SingleColumnValueFilter has the following restrictions: When any column is filtered, latestVersionOnly is true and cannot be changed. When an index column is filtered, filterIfMissing is true by default and cannot be changed.
When an index table is used to accelerate data table query, the caching, limit, and cacheblocks attributes can be set for a scan operation. If the reversed, raw, startRow, and stopRow attributes are set, the index table will not be used to accelerate data table query. Other attributes do not take effect.
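
The query rules above can be summarized as a small decision function. This is an illustrative model rather than the server's planner: plan_scan and its arguments are hypothetical, a scan attribute counts as "set" when its key is present in the dictionary, and "the filter columns must include an index column" is simplified to a non-empty overlap between the two column sets.

```python
# Attribute names follow the text above.
INDEX_COMPATIBLE = {"caching", "limit", "cacheblocks"}
INDEX_DISABLING = {"reversed", "raw", "startRow", "stopRow"}


def plan_scan(index_state, index_columns, filter_columns, scan_attrs):
    """Return (uses_index, ignored_attrs) for a described scan."""
    if index_state != "ACTIVE":
        return False, []  # only ACTIVE indexes accelerate queries
    if not set(filter_columns) & set(index_columns):
        return False, []  # the filter must touch an index column
    if set(scan_attrs) & INDEX_DISABLING:
        return False, []  # these attributes bypass the index table
    # Any remaining attribute outside the supported set does not take effect.
    ignored = sorted(set(scan_attrs) - INDEX_COMPATIBLE)
    return True, ignored


print(plan_scan("ACTIVE", ["cf1:q1"], ["cf1:q1"], {"caching": 100, "limit": 10}))
print(plan_scan("ACTIVE", ["cf1:q1"], ["cf1:q1"], {"caching": 100, "reversed": True}))
print(plan_scan("BUILDING", ["cf1:q1"], ["cf1:q1"], {}))
```

The first call uses the index, the second falls back to the data table because reversed is set, and the third falls back because the index is not in the ACTIVE state.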
