
Principles

Binlog performance is improved through three optimizations: pre-allocating binlog files, splitting the group commit locks, and replacing the writeset_history data structure with a more efficient one.

Binlog Pre-Allocation

During transaction group commit, the leader thread calls the write function in the FLUSH phase to write the binlog and the fdatasync function in the SYNC phase to forcibly flush the binlog to disk. Dynamic growth of a binlog file incurs extra metadata overhead (for example, updating the file's size metadata on every extension). To address this, binlog files are pre-allocated to max_binlog_size at creation time. This avoids the metadata operations caused by dynamic file growth during writes, reducing I/O overhead and improving system performance.

Binlog Lock Splitting

During transaction group commit, all followers in the FLUSH, SYNC, and COMMIT phases share the same lock and condition variable (m_lock_done and m_cond_done). When the leader thread in the COMMIT phase successfully commits transactions, it calls pthread_cond_broadcast, which wakes up every follower, including those still in the FLUSH and SYNC phases. Those followers find that their own group is not yet done and must call pthread_cond_wait again to go back to sleep. These false wakeups increase the overhead of the pthread_cond_wait/pthread_cond_broadcast calls and intensify contention on the shared lock. To solve this problem, the lock is split so that followers in each phase wait on their own lock and condition variable, eliminating false wakeups between different groups.

Binlog writeset_history Data Structure Optimization

During transaction group commit, the leader thread in the FLUSH phase calls Writeset_trx_dependency_tracker::get_dependency to obtain transaction dependencies. The sequence_number information of transactions is stored in the m_writeset_history variable, which uses the std::map data structure. std::map stores elements in a red-black tree, so its insertion and lookup time complexity is O(log N). Since ordered traversal is not required here, this data structure can be replaced with a hash map, reducing the average insertion and lookup time complexity to O(1) for higher efficiency.