NEON Intrinsics

The behavior of NEON intrinsics depends on the compiler implementation. In Clang, NEON intrinsics provide the same functions as those described in the official Arm Neon Intrinsics Reference (ANIR document for short).

However, Clang generates the assembly instructions specified in the ANIR document only when the optimization level is higher than -O0, that is, -O1 or above.

To obtain the ANIR document, visit https://developer.arm.com/documentation/ihi0073/g.

Usage Example

Take the following test.c as an example:

#include <arm_neon.h>
int32x2_t test_vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b) {
  return vsudot_lane_s32(r, a, b, 0);
}
Table 1 Description of vsudot_lane_s32 in the ANIR document

Intrinsic: int32x2_t vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b, const int lane)
Argument Preparation: r -> Vd.2S; a -> Vn.8B; b -> Vm.4B; 0 <= lane <= 1
Instruction: SUDOT Vd.2S,Vn.8B,Vm.4B[lane]
Result: Vd.2S -> result
Supported Architectures: A32/A64
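To make the table entry concrete, the following is a minimal scalar sketch of what the SUDOT by-element operation computes, based on the Arm description of a signed-by-unsigned dot product. The helper name sudot_lane_ref is hypothetical and is not part of arm_neon.h. The sketch uses plain arrays instead of vector types so that it compiles on any host, and it assumes the usual wrapping conversion from uint32_t back to int32_t.

```c
#include <stdint.h>

/* Hypothetical scalar model of vsudot_lane_s32(r, a, b, lane).
 * This is an illustration only, not an arm_neon.h function.
 *   r:    two 32-bit accumulators          (Vd.2S)
 *   a:    eight signed 8-bit values        (Vn.8B)
 *   b:    eight unsigned 8-bit values; the group of four bytes
 *         selected by lane is used         (Vm.4B[lane])
 *   lane: 0 or 1
 */
void sudot_lane_ref(int32_t r[2], const int8_t a[8],
                    const uint8_t b[8], int lane)
{
    for (int i = 0; i < 2; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            /* Signed element of a times unsigned element of the selected
             * 4-byte group of b; the same group is reused for both outputs. */
            sum += (int32_t)a[4 * i + j] * (int32_t)b[4 * lane + j];
        }
        /* Accumulate with 32-bit wraparound, avoiding signed overflow. */
        r[i] = (int32_t)((uint32_t)r[i] + (uint32_t)sum);
    }
}
```

With lane set to 0, as in test.c, both output elements are formed from b[0] through b[3], which is why the -O0 listing broadcasts that 32-bit group with dup before the dot product.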

With -O0, the output of the clang -march=armv8.6-a+i8mm test.c -O0 -S command is a sequence of multiple instructions, including mov, dup, and usdot, instead of the single SUDOT instruction listed in Table 1:

test_vsudot_lane_s32:                   // @test_vsudot_lane_s32
// %bb.0:                               // %entry
        sub     sp, sp, #112            // =112
        str     d0, [sp, #72]
        str     d1, [sp, #64]
        str     d2, [sp, #56]
        ldr     d0, [sp, #72]
        str     d0, [sp, #48]
        ldr     d0, [sp, #64]
        str     d0, [sp, #40]
        ldr     d0, [sp, #56]
        str     d0, [sp, #32]
        ldr     d0, [sp, #32]
        str     d0, [sp, #16]
        ldr     d0, [sp, #48]
        ldr     d1, [sp, #16]
                                        // implicit-def: $q3
        mov     v3.16b, v1.16b
        dup     v1.2s, v3.s[0]
        ldr     d2, [sp, #40]
        str     d0, [sp, #104]
        str     d1, [sp, #96]
        str     d2, [sp, #88]
        ldr     d0, [sp, #104]
        ldr     d1, [sp, #96]
        ldr     d2, [sp, #88]
        usdot   v0.2s, v1.8b, v2.8b
        str     d0, [sp, #80]
        ldr     d0, [sp, #80]
        str     d0, [sp, #24]
        ldr     d0, [sp, #24]
        str     d0, [sp, #8]
        ldr     d0, [sp, #8]
        add     sp, sp, #112            // =112
        ret

With -O1, the output of the clang -march=armv8.6-a+i8mm test.c -O1 -S command contains the single sudot instruction specified in the ANIR document:

test_vsudot_lane_s32:                   // @test_vsudot_lane_s32
// %bb.0:                               // %entry
                                        // kill: def $d2 killed $d2 def $q2
        sudot   v0.2s, v1.8b, v2.4b[0]
        ret