GCC Vectorization
To use GCC automatic vectorization, you must use GCC 9.3.0 or later for compilation.
Perform the following operations:
- In the header file km.h:
  - Add #pragma omp declare simd simdlen(4) notinbranch before the declaration of each single-precision interface.
  - Add #pragma omp declare simd simdlen(2) notinbranch before the declaration of each double-precision interface.
- Add -lkm -lksvml -lm to the link options to provide the vectorized interfaces, and add -fopenmp-simd -fno-math-errno -O3 to the compilation options. -O3 enables the compiler's vectorization optimization, which is a prerequisite for the vectorized math library to take effect.
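The two simdlen values match the 128-bit Advanced SIMD register width, which holds four float values or two double values. As a minimal sketch of what the pragma asks of the compiler, the following uses a hypothetical user-defined function, scale_shift, in place of a km.h interface. The names scale_shift and apply_scale_shift are assumptions made for illustration and are not part of the library.

```c
#include <stddef.h>

/* Hypothetical stand-in for a double-precision math interface.
 * simdlen(2): the vector variant processes two doubles per call.
 * notinbranch: the vector variant is never called conditionally,
 * so no masked version is needed. */
#pragma omp declare simd simdlen(2) notinbranch
double scale_shift(double x)
{
    return 2.0 * x + 1.0;
}

/* Unit-stride loop over the arrays: the kind of loop in which the
 * compiler may replace scalar calls with the vector variant. */
void apply_scale_shift(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = scale_shift(in[i]);
    }
}
```

When compiled with -fopenmp-simd -O3, GCC may generate vector variants of scale_shift. Without -fopenmp-simd, the pragma is ignored and the code still runs as scalar code with the same results.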
Example 1
km.h:
...
#pragma omp declare simd simdlen(4) notinbranch
float expf(float);
...
test.c:
#include <km.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
    long loop = 1e7;
    int len = 8192;
    float *a = (float*)malloc(sizeof(float) * len);
    float *b = (float*)malloc(sizeof(float) * len);
    float *d = (float*)malloc(sizeof(float) * len);
    /* Initialize the input and output arrays. */
    for (int i = 0; i < len; i++) {
        a[i] = rand() * 7.7680f - 6.3840f;
        b[i] = rand() * 8.7680f - 6.3840f;
        d[i] = 0;
    }
    /* Loop to be vectorized: expf is replaced with the vector interface. */
    for (int j = 0; j < len; j++) {
        d[j] = expf(a[j]);
    }
    return 0;
}
Compile command:
gcc test.c -lkm -lksvml -lm -fopenmp-simd -fno-math-errno -O3
Run the nm command to check the called interface.
nm -D a.out

If an interface with the _ZGVnN4v_ prefix (for example, _ZGVnN4v_expf) is displayed, the vectorized interface is called successfully.
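The prefix follows the AArch64 vector function naming scheme: _ZGV marks a vector variant, n stands for Advanced SIMD, N for an unmasked variant, 4 for the number of lanes, and v for one vector parameter. For the double-precision interfaces declared with simdlen(2), the corresponding prefix is _ZGVnN2v_. The following minimal sketch checks a symbol name taken from the nm output against these two prefixes. The helper name is_vector_math_symbol is hypothetical and is not part of any library.

```c
#include <string.h>

/* Returns 1 if sym starts with the unmasked Advanced SIMD prefix for
 * 4 lanes (single precision) or 2 lanes (double precision), else 0.
 * Hypothetical helper for inspecting symbol names from nm output. */
int is_vector_math_symbol(const char *sym)
{
    static const char f32_prefix[] = "_ZGVnN4v_";
    static const char f64_prefix[] = "_ZGVnN2v_";

    if (strncmp(sym, f32_prefix, sizeof(f32_prefix) - 1) == 0) {
        return 1;
    }
    if (strncmp(sym, f64_prefix, sizeof(f64_prefix) - 1) == 0) {
        return 1;
    }
    return 0;
}
```

For example, is_vector_math_symbol("_ZGVnN4v_expf") returns 1, while the scalar name "expf" returns 0.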
Example 2
If the code contains deeply nested loops, for example, test.c:
#include <km.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
    long loop = 1e7;
    int len = 8192;
    float *a = (float*)malloc(sizeof(float) * len);
    float *b = (float*)malloc(sizeof(float) * len);
    float *d = (float*)malloc(sizeof(float) * len);
    /* Initialize the input and output arrays. */
    for (int i = 0; i < len; i++) {
        a[i] = rand() * 7.7680f - 6.3840f;
        b[i] = rand() * 8.7680f - 6.3840f;
        d[i] = 0;
    }
    /* Outer loop repeats the work; the inner loop is the one to be vectorized. */
    for (int i = 0; i < loop; i++) {
        for (int j = 0; j < len; j++) {
            d[j] = expf(a[j]);
        }
    }
    return 0;
}
Add extra compilation options to prevent the compiler from combining the outer loop and the inner loop. The compile command is as follows:
gcc test.c -lkm -lksvml -lm -fopenmp-simd -fno-math-errno -O3 -fno-tree-loop-ivcanon -fno-loop-interchange
Run the nm command to check the called interface.
nm -D a.out

If an interface with the _ZGVnN4v_ prefix (for example, _ZGVnN4v_expf) is displayed, the vectorized interface is called successfully.