Rate This Document
Findability
Accuracy
Completeness
Readability

Tool Overview

pt-table-checksum

pt-table-checksum a tool provided in percona-toolkit. It is used to verify primary/secondary data consistency online.

pt-table-checksum is used for:

  • Data consistency check before and after data migration
  • Primary/secondary data consistency check after primary/secondary replication faults are rectified
  • Locating tables that generating dirty data (If the secondary database is not read-only, dirty data is generated when the client incorrectly considers the secondary database as the primary database and writes data into the secondary database.)

The principles of pt-table-checksum are described as follows:

pt-table-checksum runs the check statement on the primary database to check the primary/secondary data consistency of MySQL replication online, generates the replace statement, transfers the statement to secondary databases, and updates the value of master_src. By default, pt-table-checksum automatically discovers and connects to all secondary databases, and checks the values of this_src and master_src to determine whether the data is consistent between primary and secondary databases. pt-table-checksum calculates the following two types of checksums:

  • Checksum of a single row of data

    pt-table-checksum checks the table structure, obtains the data type of each column in the table, converts all data types into character strings, uses the concat_ws() function for connection, and then uses CRC32 to calculate the checksum value of the row.

  • Checksums of chunks

    The efficiency will be low if the checksums are calculated row by row and then compared with those on secondary databases. pt-table-checksum can split table data into chunks by using the indexes in the table and calculates checksums based on chunks. The aggregation function BIT_XOR() is introduced in pt-table-checksum. The function combines all rows of data in a chunk and calculates the CRC32 value to obtain the checksum of the chunk. When the tool calculates the checksum of a chunk on the primary database, the primary database may still be updated. To ensure that the same checksum is calculated, you need to add a FOR UPDATE lock to the chunk.