NEON Intrinsics
The behavior of NEON intrinsics depends on the compiler implementation. In Clang, NEON intrinsics behave as described in the official Arm Neon Intrinsics Reference (hereafter the ANIR document).
However, Clang generates the assembly instructions specified in the ANIR document only when the optimization level is higher than -O0.
To obtain the ANIR document, visit https://developer.arm.com/documentation/ihi0073/g.
Usage Example
Take the following test.c as an example:
#include <arm_neon.h>
int32x2_t test_vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b) {
    return vsudot_lane_s32(r, a, b, 0);
}
| Intrinsic | Argument Preparation | Instruction | Result | Supported Architectures |
|---|---|---|---|---|
| int32x2_t vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b, const int lane) | r -> Vd.2S; a -> Vn.8B; b -> Vm.4B; 0 <= lane <= 1 | SUDOT Vd.2S,Vn.8B,Vm.4B[lane] | Vd.2S -> result | A32/A64 |
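To make the table row concrete, the following is a minimal scalar sketch of what the intrinsic computes. It assumes the SUDOT by-element semantics: each group of four signed bytes in a is multiplied by the four unsigned bytes in the selected 32-bit lane of b, and the sum is accumulated into the matching element of r. The helper name sudot_lane_ref is hypothetical and is not part of arm_neon.h. It uses plain arrays instead of NEON vector types so that it builds on any target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vsudot_lane_s32 (illustration only, not an
 * arm_neon.h function). r holds two 32-bit accumulators, a holds eight
 * signed bytes, and b holds eight unsigned bytes viewed as two 4-byte lanes. */
static void sudot_lane_ref(int32_t r[2], const int8_t a[8],
                           const uint8_t b[8], int lane) {
    /* Mirrors the "0 <= lane <= 1" constraint from the table. */
    assert(lane >= 0 && lane <= 1);

    for (int i = 0; i < 2; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            /* signed byte from a times unsigned byte from the chosen lane of b */
            sum += (int32_t)a[4 * i + j] * (int32_t)b[4 * lane + j];
        }
        r[i] += sum;
    }
}
```

For example, with r = {10, -5}, a = {1, -2, 3, -4, 5, 6, 7, 8}, b = {1, 2, 3, 4, 200, 100, 50, 25}, and lane 0, this model leaves r = {0, 65}.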
With -O0, the assembly generated by clang -march=armv8.6-a+i8mm test.c -O0 -S is a sequence of multiple instructions (such as mov, dup, and usdot) rather than the single SUDOT instruction listed in the ANIR document:
test_vsudot_lane_s32:                   // @test_vsudot_lane_s32
// %bb.0:                               // %entry
        sub     sp, sp, #112            // =112
        str     d0, [sp, #72]
        str     d1, [sp, #64]
        str     d2, [sp, #56]
        ldr     d0, [sp, #72]
        str     d0, [sp, #48]
        ldr     d0, [sp, #64]
        str     d0, [sp, #40]
        ldr     d0, [sp, #56]
        str     d0, [sp, #32]
        ldr     d0, [sp, #32]
        str     d0, [sp, #16]
        ldr     d0, [sp, #48]
        ldr     d1, [sp, #16]
                                        // implicit-def: $q3
        mov     v3.16b, v1.16b
        dup     v1.2s, v3.s[0]
        ldr     d2, [sp, #40]
        str     d0, [sp, #104]
        str     d1, [sp, #96]
        str     d2, [sp, #88]
        ldr     d0, [sp, #104]
        ldr     d1, [sp, #96]
        ldr     d2, [sp, #88]
        usdot   v0.2s, v1.8b, v2.8b
        str     d0, [sp, #80]
        ldr     d0, [sp, #80]
        str     d0, [sp, #24]
        ldr     d0, [sp, #24]
        str     d0, [sp, #8]
        ldr     d0, [sp, #8]
        add     sp, sp, #112            // =112
        ret
With -O1, the assembly generated by clang -march=armv8.6-a+i8mm test.c -O1 -S contains the single sudot instruction specified in the ANIR document:
test_vsudot_lane_s32:                   // @test_vsudot_lane_s32
// %bb.0:                               // %entry
                                        // kill: def $d2 killed $d2 def $q2
        sudot   v0.2s, v1.8b, v2.4b[0]
        ret
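The commands above enable the I8MM extension through -march=armv8.6-a+i8mm. The following is a minimal sketch of how a source file might guard the intrinsic so that it still builds when the extension is not enabled. It assumes the ACLE feature macro __ARM_FEATURE_MATMUL_INT8 is defined whenever the I8MM intrinsics are available. The function name accumulate_sudot_lane0 and the macro HAVE_I8MM are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Use the intrinsic only when the compiler reports NEON and I8MM support. */
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_neon.h>
#define HAVE_I8MM 1   /* hypothetical helper macro */
#else
#define HAVE_I8MM 0
#endif

/* Hypothetical wrapper: accumulates the signed-by-unsigned dot products of a
 * with lane 0 of b into the two 32-bit elements of r. */
void accumulate_sudot_lane0(int32_t r[2], const int8_t a[8],
                            const uint8_t b[8]) {
#if HAVE_I8MM
    /* I8MM path: load the arrays into vectors and call the intrinsic. */
    int32x2_t acc = vld1_s32(r);
    acc = vsudot_lane_s32(acc, vld1_s8(a), vld1_u8(b), 0);
    vst1_s32(r, acc);
#else
    /* Portable fallback with the same assumed semantics. */
    for (int i = 0; i < 2; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            sum += (int32_t)a[4 * i + j] * (int32_t)b[j];
        }
        r[i] += sum;
    }
#endif
}
```

As with test.c, the I8MM path is expected to reduce to the single sudot instruction only at an optimization level higher than -O0.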