Principles
Computing path optimization includes record matching optimization, Single Instruction Multiple Data (SIMD)-based character set processing optimization, and unaligned memory access optimization.
Record Matching Optimization
For records that use the fixed row format, this optimization replaces field-by-field traversal matching with faster whole-record comparison using memcmp.
In the InnoDB storage engine, the original comparison logic uses the cmp_dtuple_rec_with_match_low function to compare a dtuple_t (in-memory record format) against a rec_t (physical record format on the page) field by field. Before each comparison, rec_get_offsets must be called to obtain the field offsets. After the optimization, memcmp compares the entire record in a single call, which reduces the number of comparison operations and avoids the overhead of frequent rec_get_offsets calls.
In code, the main adjustments are as follows:
- In the row_search_mvcc function, cmp_dtuple_rec is replaced with memcmp.
- In the page_cur_search_with_match function, rec_get_offsets and cmp_dtuple_rec_with_match calls are replaced with the new byte-level comparison method rec_direct_memcmp.
To use memcmp for record comparison, the following conditions must be met:
- All fields can be compared using memcmp (for details, see the dtype_is_memcmp_deterministic function).
- All fields are of fixed length and are defined as non-null.
- All fields are sorted in ascending order.
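The conditions above can be illustrated with a minimal sketch. It is not InnoDB code: FixedLayout, read_be, write_be, compare_field_by_field, and compare_whole_record are hypothetical names, and the sketch assumes every field is a non-null, fixed-length, big-endian unsigned integer, so that byte order matches value order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-row layout: only the byte length of each field is needed,
// because all fields are fixed-length and non-null.
struct FixedLayout {
  std::vector<std::size_t> field_lengths;

  std::size_t record_length() const {
    std::size_t total = 0;
    for (std::size_t len : field_lengths) total += len;
    return total;
  }
};

// Decode a big-endian unsigned integer of up to 8 bytes.
static std::uint64_t read_be(const unsigned char* p, std::size_t len) {
  assert(len <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < len; ++i) v = (v << 8) | p[i];
  return v;
}

// Encode a big-endian unsigned integer (helper for building sample records).
static void write_be(unsigned char* p, std::uint64_t v, std::size_t len) {
  for (std::size_t i = len; i-- > 0;) {
    p[i] = static_cast<unsigned char>(v & 0xFF);
    v >>= 8;
  }
}

// Baseline: locate each field by its offset, decode it, and compare values.
int compare_field_by_field(const FixedLayout& layout,
                           const unsigned char* a, const unsigned char* b) {
  std::size_t offset = 0;
  for (std::size_t len : layout.field_lengths) {
    const std::uint64_t va = read_be(a + offset, len);
    const std::uint64_t vb = read_be(b + offset, len);
    if (va != vb) return va < vb ? -1 : 1;
    offset += len;
  }
  return 0;
}

// Optimized: one memcmp over the whole record, no per-field offsets needed.
int compare_whole_record(const FixedLayout& layout,
                         const unsigned char* a, const unsigned char* b) {
  const int c = std::memcmp(a, b, layout.record_length());
  return (c > 0) - (c < 0);
}
```

Under these assumptions both functions return the same ordering. If any field were nullable, variable-length, sorted in descending order, or used a collation whose byte order differs from its sort order, the whole-record shortcut would no longer be valid.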
In addition, InnoDB by default caches record offsets only for internal tables. This optimization extends the offset cache to all indexes that contain fixed-length fields. Reusing the cached offsets avoids redundant offset calculations and improves the overall performance of B-tree search.
SIMD-based Character Set Processing Optimization
This optimization uses SIMD instructions to vectorize utf8/utf8mb4 character set processing, improving the efficiency of related operations.
Because utf8/utf8mb4 is a variable-length encoding, each character must either have its length calculated or be converted to a fixed-length Unicode format (2 bytes) before subsequent processing. ASCII characters (0 to 127) always occupy a single byte, and their Unicode representation equals the original value (with the higher byte padded with zeros). Runs of ASCII characters can therefore be processed in parallel with SIMD instructions, which improves processing throughput.
In code, the following functions are optimized:
- my_collation_utf8_general_ci_handler: my_strnxfrm_unicode, my_hash_sort_utf8, and my_hash_sort_utf8mb4
- my_charset_utf8_handler: my_numchars_mb and my_charpos_mb
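The ASCII fast path can be sketched as follows. This is an illustration of the idea, not the MySQL implementation: block_is_ascii and count_utf8_chars are hypothetical names, the input is assumed to be valid UTF-8, and the SIMD check uses NEON on AArch64, SSE2 on x86, and a scalar loop elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns true when all 16 bytes starting at p are ASCII (high bit clear).
static inline bool block_is_ascii(const unsigned char* p) {
#if defined(__aarch64__)
  // The largest byte in the vector is below 0x80 only if every byte is ASCII.
  return vmaxvq_u8(vld1q_u8(p)) < 0x80;
#elif defined(__SSE2__)
  // movemask collects the high bit of each byte; zero means all ASCII.
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
  return _mm_movemask_epi8(v) == 0;
#else
  for (int i = 0; i < 16; ++i) {
    if (p[i] & 0x80) return false;
  }
  return true;
#endif
}

// Counts characters in a UTF-8 buffer (assumed valid). Every byte that is not
// a continuation byte (10xxxxxx) starts a new character.
std::size_t count_utf8_chars(const char* s, std::size_t len) {
  assert(s != nullptr || len == 0);
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
  const unsigned char* end = p + len;
  std::size_t chars = 0;

  while (end - p >= 16) {
    if (block_is_ascii(p)) {
      chars += 16;  // fast path: 16 single-byte characters at once
    } else {
      for (int i = 0; i < 16; ++i) chars += (p[i] & 0xC0) != 0x80;
    }
    p += 16;
  }
  for (; p < end; ++p) chars += (*p & 0xC0) != 0x80;  // scalar tail
  return chars;
}
```

Because each byte is classified independently, a multi-byte character that straddles two 16-byte blocks is still counted exactly once. Text that is mostly ASCII spends almost all of its time on the fast path.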
Unaligned Memory Access Optimization
Early Arm architectures (such as ARMv5) do not support unaligned memory access. On those platforms, MySQL therefore reads the bytes behind a pointer one by one and combines them with shift and accumulation operations to obtain an integer. The x86 architecture supports unaligned memory access, so the pointer can be cast directly to the integer type and read in a single access. Kunpeng 920 processors also support unaligned memory access. The x86 optimization policy can therefore be ported to the Arm architecture so that pointers are converted directly to the integer type, improving type conversion efficiency.
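The two strategies can be compared in a short sketch. The function names load_u32_bytewise and load_u32_direct are hypothetical, and the value is assumed to be stored in little-endian byte order. The direct version uses memcpy instead of a raw pointer cast so that the example stays within standard C++ rules; on hardware with unaligned access support, compilers typically lower it to a single load instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-by-byte load of a little-endian 32-bit value. Works at any alignment
// and on any host, at the cost of four loads plus shifts and ORs.
std::uint32_t load_u32_bytewise(const unsigned char* p) {
  assert(p != nullptr);
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Direct load. On a little-endian host that supports unaligned access, this
// gives the same result as the byte-by-byte version with one memory access.
std::uint32_t load_u32_direct(const unsigned char* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

On a little-endian host, both functions return 0x04030201 for the bytes 01 02 03 04, even when the pointer is not 4-byte aligned. On a big-endian host the direct version would need an additional byte swap.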