Differences Between Processors

In addition to architectural differences, pay attention to differences between processors when porting locks. On the x86 architecture, the L3 cache line of most processors is 64 bytes, whereas the L3 cache line of the Kunpeng 920 processor is 128 bytes. Therefore, when designing the data structure of a lock, avoid false sharing, which occurs when variables that are independently modified by multiple threads share the same cache line. Each write by one thread then invalidates that line in the caches of the other cores, so false sharing can significantly degrade performance.

The following uses a Linux kernel optimization as an example: a patch to the IOMMU driver that improves performance on the ARM platform (see https://gitlab.freedesktop.org/drm/msm/commit/14bd9a607f9082e7b5690c27e69072f2aeae0de4). The code before the optimization is as follows:

struct iova_domain {
/*............ */
    struct iova anchor; 
    struct iova_rcache rcaches[IOVA_RANGE_CACHE_MAX_SIZE];  
    iova_flush_cb   flush_cb;   
    iova_entry_dtor entry_dtor; 
    /* Number of TLB flushes that have been started */
    atomic64_t  fq_flush_start_cnt;
    /* Number of TLB flushes that have been finished */
    atomic64_t  fq_flush_finish_cnt;   
    struct timer_list fq_timer;
    /* 1 when timer is active, 0 when not */
    atomic_t fq_timer_on; 
};

In this data structure, the variables accessed by multiple threads are the atomic variables fq_flush_start_cnt, fq_flush_finish_cnt, and fq_timer_on. To prevent false sharing, these variables should not sit on the same cache line. When the cache line is 64 bytes, placing the fq_timer member between fq_flush_finish_cnt and fq_timer_on is enough to achieve this.
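Whether two members share a cache line can be checked from their offsets. The following is a minimal sketch, not kernel code: struct demo_domain, its member names, and the 80-byte size of the timer placeholder are assumptions chosen only for illustration, and the check assumes the structure itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail of struct iova_domain. The member
 * sizes are assumptions for illustration, not the kernel's real sizes. */
struct demo_domain {
    int64_t flush_start_cnt;   /* plays the role of fq_flush_start_cnt */
    int64_t flush_finish_cnt;  /* plays the role of fq_flush_finish_cnt */
    char    timer[80];         /* plays the role of fq_timer; size assumed */
    int32_t timer_on;          /* plays the role of fq_timer_on */
};

/* Index of the cache line that contains byte `offset`, assuming the
 * structure starts on a cache-line boundary. */
static inline size_t cache_line_index(size_t offset, size_t line_size)
{
    return offset / line_size;
}

/* Returns 1 when both member offsets fall on the same cache line, else 0. */
static inline int same_cache_line(size_t off_a, size_t off_b, size_t line_size)
{
    return cache_line_index(off_a, line_size) == cache_line_index(off_b, line_size);
}
```

With these assumed sizes, flush_finish_cnt is at offset 8 and timer_on at offset 96, so same_cache_line(8, 96, 64) returns 0 while same_cache_line(8, 96, 128) returns 1. A layout that separates the two members on a 64-byte cache line can therefore still place them together on a 128-byte cache line.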

However, when the cache line of the processor is 128 bytes, more members must be placed between the atomic variables so that fq_flush_finish_cnt and fq_timer_on fall on different cache lines. The code after the optimization is as follows:

struct iova_domain {
...
    /* Number of TLB flushes that have been started */
    atomic64_t  fq_flush_start_cnt;
    /* Number of TLB flushes that have been finished */
    atomic64_t  fq_flush_finish_cnt;   
    struct iova anchor; 
    struct iova_rcache rcaches[IOVA_RANGE_CACHE_MAX_SIZE];  
    iova_flush_cb   flush_cb;   
    iova_entry_dtor entry_dtor; 
    struct timer_list fq_timer;
    /* 1 when timer is active, 0 when not */
    atomic_t fq_timer_on; 
};
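Reordering members, as the patch does, depends on the sizes of the members in between. A different technique, shown here only as a hedged sketch and not as what the patch does, is to align each hot member explicitly to the cache line size. The names struct demo_counters and DEMO_CACHE_LINE are hypothetical, and the value 128 is an assumption matching the Kunpeng 920 cache line size mentioned above. The sketch uses C11 alignas and static_assert.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes (hypothetical constant for this sketch). */
#define DEMO_CACHE_LINE 128

/* Each hot member starts on its own cache-line boundary, so writers of
 * one member do not invalidate the line that holds the other. */
struct demo_counters {
    alignas(DEMO_CACHE_LINE) int64_t flush_finish_cnt;
    alignas(DEMO_CACHE_LINE) int32_t timer_on;
};

/* Compile-time check: the build fails if a later layout change puts the
 * two members less than one cache line apart. */
static_assert(offsetof(struct demo_counters, timer_on) -
              offsetof(struct demo_counters, flush_finish_cnt) >= DEMO_CACHE_LINE,
              "flush_finish_cnt and timer_on must not share a cache line");
```

The trade-off is memory: with these settings the structure grows to 256 bytes for 12 bytes of payload, whereas reordering reuses existing members as spacing at no extra cost. Inside the Linux kernel, the ____cacheline_aligned_in_smp annotation serves a similar purpose.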
