Porting Rust SSE Instructions

SSE instructions are the Single Instruction, Multiple Data (SIMD) instructions of the x86 architecture. Their counterparts on Kunpeng are the Neon instructions. A single SSE instruction operates on multiple data elements at the same time, which speeds up execution. SSE/Neon intrinsic functions are functions that the compiler provides to expose these instructions to high-level languages. When compiled, a call to an intrinsic generates the corresponding SSE/Neon instruction sequence, so it does the same job as hand-written SSE/Neon assembly while leaving register allocation to the compiler, which makes development easier.

Porting SSE intrinsic functions in Rust is similar to porting them in C and C++. For details, see the AvxToNeon project. The difference lies in how the intrinsics are called. When porting SSE intrinsic functions, set the target_feature attribute to neon and import the intrinsics with use std::arch::aarch64::*. As with inline assembly, calls to intrinsic functions must be written inside an unsafe block. On Rust toolchains in which the aarch64 intrinsics are still unstable, the +nightly option must also be added during compilation.
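
As a minimal sketch of this call pattern, the following code marks a Neon routine with #[target_feature(enable = "neon")], imports std::arch::aarch64::*, and calls the routine from an unsafe block after a runtime feature check. This is one possible reading of "set the target_feature attribute to neon". The function names splat_add and splat_add_neon are hypothetical, and the scalar fallback is an assumption added so that the snippet also builds on other architectures.

```rust
// Hypothetical Neon routine: broadcasts a and b to 16 lanes and adds them.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn splat_add_neon(a: i8, b: i8) -> [i8; 16] {
    use std::arch::aarch64::*;
    let mut out = [0i8; 16];
    unsafe {
        // Lane-wise addition; signed 8-bit lanes wrap on overflow.
        let sum = vaddq_s8(vdupq_n_s8(a), vdupq_n_s8(b));
        // Copy the 16 lanes out of the vector register.
        vst1q_s8(out.as_mut_ptr(), sum);
    }
    out
}

// Hypothetical safe wrapper: uses Neon when available, scalar code otherwise.
fn splat_add(a: i8, b: i8) -> [i8; 16] {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // The runtime check above makes this call sound.
            return unsafe { splat_add_neon(a, b) };
        }
    }
    // Assumed fallback for other targets; wrapping_add mirrors the Neon behavior.
    [a.wrapping_add(b); 16]
}
```

With this layout, callers use only the safe wrapper, for example splat_add(4, 3), and the unsafe code stays confined to one small function.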

Symptom:

When the source code uses x86 SSE intrinsic functions, compilation on the Kunpeng platform fails with an error such as "unresolved import 'std::arch::x86_64'" or "cannot find function '_mm_set1_epi8' in this scope".

Example:

The following example shows how to port SSE intrinsic functions in Rust to the Kunpeng platform. The code uses a single instruction to add the contents of two vector registers lane by lane.

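// Code implementation on the x86 platform
// _mm_set1_epi8 and _mm_add_epi8 are SSE2 intrinsics, so gate on "sse2".
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse2"))]
fn add()
{
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;
    unsafe {
        // _mm_set1_epi8 takes one value and broadcasts it to all 16 lanes.
        let veca = _mm_set1_epi8(4);
        let vecb = _mm_set1_epi8(3);
        // Lane-wise addition: every lane of result holds 7.
        let result = _mm_add_epi8(veca, vecb);
    }
}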
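// Code implementation on the Kunpeng platform
// Crate-level attribute: it must come first in the crate root and is needed
// only on nightly toolchains where these intrinsics are still unstable.
#![feature(stdsimd)]
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn add()
{
    use std::arch::aarch64::*;
    unsafe {
        // vdupq_n_s8 broadcasts one value to all 16 signed 8-bit lanes.
        let veca = vdupq_n_s8(4);
        let vecb = vdupq_n_s8(3);
        // Lane-wise addition: every lane of result holds 7.
        let result = vaddq_s8(veca, vecb);
    }
}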
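
To make the result of the two versions observable, the sketch below stores the vector back into an array so that the lanes can be inspected. It is only an illustration: the name add_lanes and the scalar fallback for other architectures are assumptions, not part of the original example, and the x86 variant is restricted to x86_64, where SSE2 is part of the baseline.

```rust
// Hypothetical helper, x86_64 variant: adds sixteen 4s to sixteen 3s.
#[cfg(target_arch = "x86_64")]
fn add_lanes() -> [i8; 16] {
    use std::arch::x86_64::*;
    let mut out = [0i8; 16];
    unsafe {
        let veca = _mm_set1_epi8(4);
        let vecb = _mm_set1_epi8(3);
        let sum = _mm_add_epi8(veca, vecb);
        // Unaligned store of the 128-bit register into the array.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

// Hypothetical helper, Kunpeng (aarch64) variant using the Neon equivalents.
#[cfg(target_arch = "aarch64")]
fn add_lanes() -> [i8; 16] {
    use std::arch::aarch64::*;
    let mut out = [0i8; 16];
    unsafe {
        let veca = vdupq_n_s8(4);
        let vecb = vdupq_n_s8(3);
        let sum = vaddq_s8(veca, vecb);
        // Store the 16 signed 8-bit lanes into the array.
        vst1q_s8(out.as_mut_ptr(), sum);
    }
    out
}

// Assumed scalar fallback so the snippet builds on any other target.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn add_lanes() -> [i8; 16] {
    [4i8 + 3; 16]
}
```

Because both variants share one signature, calling code does not change between platforms; only the cfg-selected body differs.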
