Tool Overview
pt-table-checksum
pt-table-checksum a tool provided in percona-toolkit. It is used to verify primary/secondary data consistency online.
pt-table-checksum is used for:
- Data consistency check before and after data migration
- Primary/secondary data consistency check after primary/secondary replication faults are rectified
- Locating tables that generating dirty data (If the secondary database is not read-only, dirty data is generated when the client incorrectly considers the secondary database as the primary database and writes data into the secondary database.)
The principles of pt-table-checksum are described as follows:
pt-table-checksum runs the check statement on the primary database to check the primary/secondary data consistency of MySQL replication online, generates the replace statement, transfers the statement to secondary databases, and updates the value of master_src. By default, pt-table-checksum automatically discovers and connects to all secondary databases, and checks the values of this_src and master_src to determine whether the data is consistent between primary and secondary databases. pt-table-checksum calculates the following two types of checksums:
- Checksum of a single row of data
pt-table-checksum checks the table structure, obtains the data type of each column in the table, converts all data types into character strings, uses the concat_ws() function for connection, and then uses CRC32 to calculate the checksum value of the row.
- Checksums of chunks
The efficiency will be low if the checksums are calculated row by row and then compared with those on secondary databases. pt-table-checksum can split table data into chunks by using the indexes in the table and calculates checksums based on chunks. The aggregation function BIT_XOR() is introduced in pt-table-checksum. The function combines all rows of data in a chunk and calculates the CRC32 value to obtain the checksum of the chunk. When the tool calculates the checksum of a chunk on the primary database, the primary database may still be updated. To ensure that the same checksum is calculated, you need to add a FOR UPDATE lock to the chunk.