Rate This Document
Findability
Accuracy
Completeness
Readability

Data Layout Optimization

Principles

In various memory layers, the cache speed is fast, and the cache capacity is relatively small. When the hit rate is high, the data access speed of the processor can be improved. Data layout optimization adjusts the data layout in the memory to increase the cache hit ratio, TLB hit ratio, and cacheline utilization rate. This helps to improve the instruction execution efficiency and reduce time cost.

Common optimization methods include structure layout optimization, data rearrangement, and false sharing optimization (see Optimizing Cacheline).

Modification Method

  • The following describes the data organization modes of the structure array and array structure to optimize the data layout.
    • The structure array is defined as follows:
      //Structure array
      struct Array{
              int x;
              int y; 
      };
      struct Array stArray[N];

      In this case, each group of x and y are stored consecutively. The storage format in the memory is as follows:

    • The array structure is defined as follows:
      //Array structure
      struct Array{
              int x[N];
              int y[N]; 
      };
      struct Array stArray;

      In this case, x and y are stored separately. The storage format in the memory is as follows:

    When operations majorly involve x in a service scenario, the data loaded to the memory and cache by x in the array structure mode is continuous. Compared with the structure array, the cacheline validity and cache hit ratio are improved.

  • Data rearrangement

    The two-dimensional array is used as an example. When array B is added to array A by column, column elements are inconsecutive in the cache and are not in the same cacheline. As a result, the performance is relatively poor. By rearranging array B and adding array B to array A by row, data can be continuously read from the cache. Before the rearrangement, the time consumed is 544,939 μs. After the rearrangement, the time consumed is 235,722 μs.

    Read by column:

    for (int i = 0; i < ARRAYLEN; i++) {
        for (int j = 0; j < ARRAYLEN; j++) {
            arrayC[i][j] = arrayA[i][j] + arrayB[j][i];
        }
    }

    Read by row:

    for (int i = 0; i < ARRAYLEN; i++) {
        for (int j = 0; j < ARRAYLEN; j++) {
            array[i][j] = arrayA[i][j] + arrayB[i][j];
        }
    }