Suggestions for Vectorizing Source Code
Vectorization Suggestion |
Cause of Failure to Vectorize |
How to Modify |
Example |
|---|---|---|---|
Extracting loop control variables |
The loop control variable of the for loop (here the bound data->len) is a structure member. The compiler cannot determine the loop end condition, so the loop is not automatically vectorized. |
Copy the loop control variable into a local variable before the loop. |
Example code: for (i = 0; i < data->len; i++) {
vecC[i] = vecA[i] + vecB[i];
}
Modify it as follows: int len = data->len; // Extract the loop control variable out of the loop.
for (i = 0; i < len; i++) {
vecC[i] = vecA[i] + vecB[i];
}
|
Modifying the loop control condition |
Clang 15 can automatically vectorize a loop whose condition uses <=, while earlier versions cannot. |
Change the loop condition from <= to <, and the loop length from len to len+1. |
Example code: for (i = 0; i <= data->len; i++) {
vecC[i] = vecA[i] + vecB[i];
}
Modify it as follows: // Change the loop condition from <= to <, and the loop length from len to len+1.
for (i = 0; i < data->len + 1; i++) {
vecC[i] = vecA[i] + vecB[i];
}
|
Adding a compilation instruction for automatic vectorization |
After evaluating the benefits of vectorization, the compiler takes a conservative approach and decides not to vectorize the loop automatically. |
Add the pragma directive before the loop to request automatic vectorization. |
Example code: for (i = 0; i < data->len; i++) {
vecC[i] = vecA[i] + vecB[i];
}
Modify it as follows: // Add the pragma compilation instruction to force automatic code vectorization.
#pragma clang loop vectorize(enable)
for (i = 0; i < data->len; i++) {
vecC[i] = vecA[i] + vecB[i];
}
|
Specifying that the memory to which the pointer points is not referenced by other pointers |
The compiler cannot determine whether the memory that the pointer points to is also referenced by other pointers, so it abandons automatic vectorization. |
Add the restrict keyword to the pointer parameter. |
Example code: void func(int *A, struct Data *data)
{
data->a = A[0];
data->b = A[1];
}
Modify it as follows: // Add restrict to the argument <A>.
void func(int *restrict A, struct Data *data)
{
data->a = A[0];
data->b = A[1];
}
|
Keeping the consistent data type and length |
The variable types do not match: b is declared as long, but the array elements are int. The compiler cannot perform automatic vectorization. |
Change the variable type from long to int. |
Example code: void func(int *vec) {
long b = 1;
int i;
for (i = 0; i < 64; i++) {
vec[i] = (b << i);
}
}
Modify it as follows: void func(int *vec) {
// Change the variable type from long to int.
int b = 1;
int i;
for (i = 0; i < 64; i++) {
vec[i] = (b << i);
}
}
|
Splitting the loop |
Every iteration writes to the same l-value (sum), which creates a loop-carried dependency. Therefore, the compiler cannot perform vectorization. |
Split the loop. Store the l-value of each iteration independently, then merge all the l-values after the loop. |
Example code: for( int i = 0; i < 4; i++ ) {
......
sum += a0 + a1 + a2 + a3;
}
Modify it as follows: // Declare an array,
// assign one element in each iteration,
// and accumulate the elements after the loop.
uint32_t sumTmp[4];
for( int i = 0; i < 4; i++ ) {
......
sumTmp[i] = a0 + a1 + a2 + a3;
}
sum = sumTmp[0] + sumTmp[1] + sumTmp[2] + sumTmp[3];
|
Simplifying the code logic in the conditional branch |
The conditional branches contain complex operations, so the compiler cannot vectorize the loop automatically. |
Extract the operation statements out of the conditional branch. |
Example code: for( int i = 0; i < len; i++ ) {
if (flag[i])
vecC[i] = vecA[i] + vecB[i];
else
vecC[i] = vecA[i] - vecB[i];
sum += vecC[i];
}
Modify it as follows: for( int i = 0; i < len; i++ ) {
// Extract all expressions outside the branch.
int ifTrue = vecA[i] + vecB[i];
int ifFalse = vecA[i] - vecB[i];
if (flag[i])
vecC[i] = ifTrue;
else
vecC[i] = ifFalse;
sum += vecC[i];
}
|
Changing the data type to unsigned |
The data types are inconsistent: sum is signed, but the value added to it is unsigned (uint16_t). The compiler cannot perform vectorization. |
Change the data type of sum from signed to unsigned. |
Example code: int sum;
for( int i = 0; i < len; i++ ){
b0 = abs2(a0 + a4) + abs2(a0 - a4);
sum += (uint16_t)b0;
}
return sum;
Modify it as follows: // Change the type of <sum> from signed to unsigned.
unsigned int sum;
for( int i = 0; i < len; i++ ){
b0 = abs2(a0 + a4) + abs2(a0 - a4);
sum += (uint16_t)b0;
}
return sum;
|
Reducing the calculation precision |
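The calculation uses higher precision than necessary (here the double-precision constant 0.5D0). To preserve that precision, the compiler does not perform automatic vectorization. |
Reduce the calculation precision, for example by replacing the double-precision constant 0.5D0 with the single-precision constant 0.5. |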
Example code: DO K = 1,KM
vecC(K)= (vecA(K) + vecB(K + 1))*0.5D0
END DO
Modify it as follows: DO K = 1,KM
vecC(K)= (vecA(K) + vecB(K + 1))*0.5
END DO
|
Splitting the loop |
The loop has many statements. The compiler cannot determine the variable dependencies and does not perform vectorization. |
Split the statements of the loop and add them to multiple loops. |
Example code: DO A = 1,AM
DO K = 1,KM
DO J = 3,JMT
DO I = 3,IMT
V1 (I,J,K,A)= V2 (I,J,K,A) + V3 (I,J,K,A)* D
U1 (I,J,K,A)= U2 (I,J,K,A) + U3 (I,J,K,A)* D
END DO
END DO
END DO
END DO
Modify it as follows: DO A = 1,AM
DO K = 1,KM
DO J = 3,JMT
DO I = 3,IMT
V1 (I,J,K,A)= V2 (I,J,K,A) + V3 (I,J,K,A)* D
END DO
DO I = 3,IMT
U1 (I,J,K,A)= U2 (I,J,K,A) + U3 (I,J,K,A)* D
END DO
END DO
END DO
END DO
|
Reducing function calls in the loop |
The loop contains function calls, so the compiler cannot perform vectorization. |
Move loop-invariant function calls out of the loop and reuse their results inside it. |
Example code: for( int i = 0; i < len; i++ ) {
delta = -0.5 + (2*m+1)/(2.0*n);
vecA[k].dx = delta*length*cos(theta);
vecA[k].dy = delta*length*sin(theta);
k++;
}
Modify it as follows: // Extract math lib call outside the loop.
double cosNum = cos(theta);
double sinNum = sin(theta);
for( int i = 0; i < len; i++ ) {
delta = -0.5 + (2*m+1)/(2.0*n);
vecA[k].dx = delta*length*cosNum;
vecA[k].dy = delta*length*sinNum;
k++;
}
|
Using Fortran keywords |
Fortran array syntax is not used, so the operation is written as an explicit element-by-element loop. |
Use array assignment instead of the loop to implement operations on multiple data records. |
Example code: do i = 1, maxI
type1%array1(i)=array3(type1%array2(i))
enddo
Modify it as follows: type1%array1=array3(type1%array2)
|
Specifying that the memory to which the pointer points is not referenced by other pointers and adding compilation commands |
The compiler cannot determine whether the memory that the pointer points to is also referenced by other pointers, so it abandons automatic vectorization. |
Add the restrict keyword to the pointer variable and add the vectorization pragma before the loop. |
Example code: for (int i=0;i<len;++i) {
a[i] = b[index[i]];
}
Modify it as follows: void func(int *a, int *__restrict__ b,
int *index, int len)
{
#pragma clang loop vectorize(enable)
for (int i=0;i<len;++i) {
a[i] = b[index[i]];
}
}
|
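
Several of the C suggestions above can be applied to the same loop. The following is a minimal sketch, not a definitive implementation, that combines copying the loop bound into a local variable, restrict-qualified pointers, and the Clang vectorization pragma. The layout of `struct Data`, the function name `add_vectors`, the `const` qualifiers, and the `__clang__` guard around the pragma are assumptions made for illustration and do not come from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container that mirrors the data->len pattern in the table. */
struct Data {
    int len;
};

/*
 * Element-wise addition: vecC[i] = vecA[i] + vecB[i].
 * restrict promises the compiler that the three arrays do not overlap.
 */
void add_vectors(int *restrict vecC,
                 const int *restrict vecA,
                 const int *restrict vecB,
                 const struct Data *data)
{
    assert(data != NULL);

    /* Read the structure member once, before the loop. */
    int len = data->len;

    /* Assumption: only Clang understands this pragma, so guard it. */
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
    for (int i = 0; i < len; i++) {
        vecC[i] = vecA[i] + vecB[i];
    }
}
```

Whether the loop is actually vectorized still depends on the compiler version and optimization options. With Clang, compiling at `-O2` or higher with `-Rpass=loop-vectorize` prints a remark for each loop that was vectorized, which is one way to check the effect of each change.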