
Code Implementation

  • Changes in latch guards:

    The following table lists the classes added by the Lock-sys optimization feature. Shard_latch_guard, Shard_latches_guard, and Exclusive_global_latch_guard can handle most cases, with Exclusive_global_latch_guard reserved for rare events such as deadlock resolution. However, there are special cases in our code that require other, finer-grained tools.

    Shard_latch_guard

        S-latches the global latch and latches the shard mutex for its lifetime.

    Shard_latches_guard

        S-latches the global latch and latches two shard mutexes, in the correct order, for its lifetime.

    Exclusive_global_latch_guard

        X-latches the global latch for its lifetime.

    Shared_global_latch_guard

        S-latches only the global latch, but none of the shards.

        NOTE: This allows the caller to latch shards individually, which is an efficient way of iterating over a list of transaction locks during the transaction lock release phase.

    Naked_shard_latch_guard

        Latches only a shard, but not the global latch.

        NOTE: This class is used in combination with Shared_global_latch_guard.

    Try_exclusive_global_latch

        Tries to X-latch the global latch for its lifetime.

        NOTE: The only use case of this class is in the srv_printf_innodb_monitor() function, where it tries to avoid interfering with the workload when the function reports InnoDB monitor output.

    Unsafe_global_latch_manipulator

        Manually latches and unlatches the exclusive global latch on demand, in a non-structured way.

        NOTE: This class is needed in the following code path:

        srv_printf_innodb_monitor() =>
        srv_printf_locks_and_transactions() =>
        lock_print_info_all_transactions() =>
        lock_trx_print_locks() =>
        lock_rec_fetch_page()
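    The RAII behavior of these guards can be illustrated with a hypothetical, heavily simplified model (the real InnoDB classes differ; the types and names below are stand-ins, modeling the global rw-latch as std::shared_mutex and shards as plain mutexes):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

// Hypothetical, simplified model of the latch hierarchy described above.
struct Lock_sys_model {
  std::shared_mutex global_latch;            // the global rw-latch
  std::array<std::mutex, 16> shard_mutexes;  // per-shard mutexes
};

// Analogue of Shard_latch_guard: S-latches the global latch, then latches
// one shard mutex; both are released automatically at end of scope.
class Shard_latch_guard_sketch {
 public:
  Shard_latch_guard_sketch(Lock_sys_model &ls, std::size_t shard)
      : m_global(ls.global_latch), m_shard(ls.shard_mutexes[shard]) {}

 private:
  std::shared_lock<std::shared_mutex> m_global;  // S-latch (shared mode)
  std::lock_guard<std::mutex> m_shard;           // shard mutex
};

// Analogue of Exclusive_global_latch_guard: X-latches the global latch only.
class Exclusive_global_latch_guard_sketch {
 public:
  explicit Exclusive_global_latch_guard_sketch(Lock_sys_model &ls)
      : m_global(ls.global_latch) {}

 private:
  std::unique_lock<std::shared_mutex> m_global;  // X-latch (exclusive mode)
};
```

    A guard's destructor releases its latches automatically, which is what makes the "for its lifetime" guarantees in the list above hold at every early return.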

  • Changes in trx->mutex:

    Interaction between Lock-sys and trx->mutex-es is rather complicated. In particular, we allow a thread performing Lock-sys operations to request another trx->mutex even though it already holds one for a different trx. Therefore, one has to prove that it is impossible to form a deadlock cycle in the imaginary wait-for-graph in which edges go from thread trying to obtain trx->mutex to a thread which holds it at the moment.

    In the past this was simple, because Lock-sys was protected by a global mutex, which meant that at most one thread could try to possess more than one trx->mutex at a time – one cannot form a cycle in a graph in which only one node has both incoming and outgoing edges.

    As long as the operations of multiple threads happen in different shards, these threads can perform Lock-sys operations in parallel. Therefore, the Lock-sys mutex can be split.
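    As a sketch of why sharding enables this parallelism, the following hypothetical mapping (not InnoDB's actual hashing scheme; the constant and hash function are illustrative only) shows how each page could be assigned to one of a fixed number of shards, so that threads working on pages in different shards never contend on the same shard mutex:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative shard count; the real value is an implementation detail.
constexpr std::size_t SHARDS_COUNT = 512;

// Maps a page identifier to a shard index. Two threads operating on pages
// that land in different shards never need the same shard mutex.
std::size_t page_shard_id(uint32_t space_id, uint32_t page_no) {
  uint64_t h = (uint64_t{space_id} << 32) | page_no;
  // Simple 64-bit mix (illustrative) to spread nearby page ids apart.
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  return static_cast<std::size_t>(h % SHARDS_COUNT);
}
```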

  • Changes in dict_table_t::autoinc_trx:

    This field is used to store a pointer to the trx which currently holds an AUTO_INC lock on the table. There can be at most one such transaction, as these locks are mutually exclusive. This field is "peeked at" by row_lock_table_autoinc_for_mysql() to see if the current transaction already has an AUTO_INC lock granted, in which case it can follow a fast path. Such checks are done without any latch. This requires changing the type to atomic<>, but is otherwise correct, as the result of such a comparison is guaranteed to be meaningful: we only change the value of this field during granting and releasing, and releasing cannot happen concurrently with the thread running the transaction. However, some comments around this code, the assertions, and the type of the field have to be modified to better explain why and how all this works.
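    A minimal sketch of this latch-free fast path, using simplified stand-in types (trx_t is just a stub here):

```cpp
#include <atomic>
#include <cassert>

struct trx_t {};  // stub standing in for the real transaction type

// Sketch of the change described above: the field becomes std::atomic<>
// so that reading it concurrently with granting/releasing is well-defined.
struct dict_table_sketch {
  std::atomic<const trx_t *> autoinc_trx{nullptr};
};

// Fast-path check analogous to the "peek" in row_lock_table_autoinc_for_mysql():
// the comparison is meaningful because only the owning transaction can
// observe itself as the holder.
bool already_holds_autoinc(const dict_table_sketch &table, const trx_t *trx) {
  return table.autoinc_trx.load(std::memory_order_acquire) == trx;
}
```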

  • Changes in dict_table_t::n_rec_locks:

    This field counts how many record locks (granted or waiting) are currently associated with the given table. As such locks can now be created and destroyed in parallel, as long as they are in different shards, this field needs to become atomic<>, and transactions should acquire the exclusive global latch to read the value of this field; otherwise, the value can get modified before we act on the result. Here again we need to update some comments.
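    A minimal sketch of the writer side, with a simplified stand-in type; the reader-side exclusive global latch is described in the text and omitted here:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of the n_rec_locks change: writers in different shards may
// increment/decrement concurrently (each under its own shard latch,
// not shown), so the counter itself must be atomic.
struct table_locks_sketch {
  std::atomic<uint64_t> n_rec_locks{0};
};

void record_lock_created(table_locks_sketch &t) {
  t.n_rec_locks.fetch_add(1, std::memory_order_relaxed);
}

void record_lock_destroyed(table_locks_sketch &t) {
  t.n_rec_locks.fetch_sub(1, std::memory_order_relaxed);
}
```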

  • Changes in hit list:

    For high priority transactions, we have a mechanism which tries to abort conflicting transactions of lower priority to avoid waiting on record locks. The procedure of identifying and aborting conflicting transactions will require the exclusive global latch, because (among other reasons) it may require accessing an unrelated Lock-sys queue in which the blocking transactions are waiting themselves. To avoid taking this exclusive global latch in the probable case of our high priority transaction not being in the waiting state at all, we need a reliable, thread-safe way of checking if the transaction is waiting. It is TBD if this can be simply done using "hp_trx->lock.que_state == TRX_QUE_LOCK_WAIT".

    Some places in the code seem to modify this field without any latch, which looks unsafe. An alternative, correct approach would be to use "hp_trx->lock.blocking_trx.load() != nullptr" instead, which requires only a minor change in comments to clarify that it is important to clear this field when the wait finishes (this has been done in trunk, but the comments leave some doubt). Fixing que_state is out of scope for this feature.
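    A sketch of the proposed check, using simplified stand-in types:

```cpp
#include <atomic>
#include <cassert>

struct trx_t {};  // stub standing in for the real transaction type

// Sketch of the alternative described above: instead of reading que_state
// (which is modified without a latch in some places), test whether
// blocking_trx is set; it is atomic and cleared when the wait finishes.
struct trx_lock_sketch {
  std::atomic<trx_t *> blocking_trx{nullptr};
};

bool is_waiting(const trx_lock_sketch &lock) {
  return lock.blocking_trx.load(std::memory_order_acquire) != nullptr;
}
```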

  • Changes in lock_release():

    Releasing all locks of a transaction requires iterating over its locks and, for each of them, performing some actions in the respective lock queue. A simple, but inefficient, way of doing this is to acquire an exclusive global latch.

    It is much better for parallelism to instead acquire just a shared global latch, and then latch one by one only the shards containing particular locks as we iterate. The difficulty with this approach is that the latching order rules require acquiring Lock-sys latches before trx latches, and trx->mutex protects the list of transaction locks. In particular, it protects the list from concurrent B-tree modifications causing relocation of locks between pages (and thus shards). So there is a chicken-and-egg problem: we want to know which shard to latch, for which we need to know what the next lock is; however, to iterate over the list we need trx->mutex, which we can only take after latching the shard. To overcome this, we will first latch trx->mutex, note down the shard ID of the last lock in the list, release trx->mutex, latch the particular shard, latch trx->mutex again, make sure that the lock is still at the tail of the list, and only then proceed. This might seem complicated, but it is actually much faster than the "stop the world" approach.
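    The tail-of-list trick can be sketched as follows, with heavily simplified stand-in types (a lock here is reduced to its shard id, and "releasing" a lock is just popping it from the list):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <shared_mutex>

struct Lock { std::size_t shard_id; };  // stand-in: a lock's shard only

struct Trx {
  std::mutex trx_mutex;     // protects the list of this trx's locks
  std::deque<Lock> locks;   // the transaction's lock list
};

struct LockSys {
  std::shared_mutex global_latch;
  std::mutex shard_mutex[8];
};

void release_all_locks(LockSys &ls, Trx &trx) {
  // Shared global latch for the whole iteration; shards are taken one by one.
  std::shared_lock<std::shared_mutex> s_global{ls.global_latch};
  for (;;) {
    std::size_t shard;
    {
      // Peek at the tail of the list to learn which shard to latch...
      std::lock_guard<std::mutex> g{trx.trx_mutex};
      if (trx.locks.empty()) return;
      shard = trx.locks.back().shard_id;
    }  // ...then release trx_mutex to respect the latching order.
    // Latch the shard first (Lock-sys before trx), re-latch trx_mutex,
    // and verify the tail did not move in the meantime.
    std::lock_guard<std::mutex> shard_guard{ls.shard_mutex[shard]};
    std::lock_guard<std::mutex> g{trx.trx_mutex};
    if (!trx.locks.empty() && trx.locks.back().shard_id == shard) {
      trx.locks.pop_back();  // stand-in for releasing the lock
    }
    // If the tail changed, simply loop and retry with the new tail.
  }
}
```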

  • Changes in lock_trx_release_read_locks():

    The lock_trx_release_read_locks() function is mostly used by group replication appliers to release read gap locks. In testing, it turned out to be a bottleneck if the exclusive global latch is used to iterate over transaction locks. Similarly to the lock_release() function, we should instead acquire a shared global latch and latch shards one by one as we iterate. The problem is that other threads can modify the lock list concurrently (for example, because of implicit-to-explicit conversion, or B-tree reorganization), and we cannot simply compare the current lock with the tail, because we are removing not all locks but only a subset of them; therefore, the trick of operating only at the tail of the list is insufficient. To notice such situations (and restart the iteration), we will introduce "uint64_t trx->lock.trx_locks_version", which is incremented each time a lock is added to or removed from the trx lock list. After several failed restarts, we can switch back to the old lock_trx_release_read_locks_in_x_mode().
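    The version-counter restart loop can be sketched like this, again with simplified stand-in types; a real implementation would do the per-shard work where the comment indicates:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of the trx_locks_version idea: the version is bumped on every
// add/remove, and an iteration that observes a change restarts; after
// too many restarts the caller falls back to the exclusive-latch path.
struct TrxLocks {
  std::mutex trx_mutex;
  uint64_t trx_locks_version{0};
  std::vector<int> locks;  // simplified lock list

  void add_lock(int l) {
    std::lock_guard<std::mutex> g{trx_mutex};
    locks.push_back(l);
    ++trx_locks_version;
  }
};

// Returns true if a full pass completed with the list unchanged; false if
// it gave up after max_restarts (the caller would then fall back to the
// lock_trx_release_read_locks_in_x_mode()-style path).
bool release_read_locks(TrxLocks &t, int max_restarts) {
  for (int attempt = 0; attempt <= max_restarts; ++attempt) {
    uint64_t seen;
    {
      std::lock_guard<std::mutex> g{t.trx_mutex};
      seen = t.trx_locks_version;
    }
    // ... per-shard work on individual locks would happen here ...
    std::lock_guard<std::mutex> g{t.trx_mutex};
    if (t.trx_locks_version == seen) return true;  // list was stable
  }
  return false;  // too many concurrent modifications
}
```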

  • Other changes:
    • Separate the whole latching logic to the dedicated class locksys::Latches and document extensively the design in its header.
    • All new functions will be in the locksys namespace.
    • All usages of lock_mutex_enter()/lock_mutex_exit() will be replaced with appropriate latch guards, preferably locksys::Shard_latch_guard.
    • table->n_rec_locks must become atomic, because it can now be incremented or decremented in parallel during creation or destruction of record locks for given tables.
    • dict/mem.cc does not really need to include lock0lock.h when compiled for UNIV_LIBRARY.
    • Remove lock_mutex from PSI.
    • Add lock_sys_global_latch_rw_lock to PSI.
    • Add lock_sys_page_mutex to PSI.
    • Add lock_sys_table_mutex to PSI.
    • All places where we use the exclusive global latch will be documented to specify the remaining reasons we have to resort to such strong synchronization.
    • The table->autoinc_trx field should be atomic as it is "peeked" without any latch, and confusing or wrong comments and asserts around it have to be cleaned up, to clarify why it is correct.
    • lock_rec_expl_exist_on_page() should return a "bool" instead of a potentially dangling pointer to a "lock_t".
    • lock_print_info_summary and the logic inside srv_printf_innodb_monitor() in general need at least some small refactoring so that the latch guards can be used.
    • The lock_mutex_own() debug predicate would have to be replaced with more specific predicates: owns_exclusive_global_latch(), owns_shared_global_latch(), owns_page_shard(page), owns_table_shard(table), ...
    • bool Sharded_rw_lock::try_x_lock needs to be implemented.
    • The control flow of lock_rec_insert_check_and_lock() (and its copy&paste lock_prdt_insert_check_and_lock) can be simplified by removing code duplication, before we can use latch guards.
    • The code around lock_rec_queue_validate() could be simplified by removing code duplication, and using more structured latching.
    • Update sync0debug so it has proper rules for latch order.