Porting Rust SSE Instructions

SSE instructions are the Single Instruction, Multiple Data (SIMD) instructions of the x86 architecture. Their counterparts on Kunpeng are the Neon instructions. A single SIMD instruction operates on multiple data elements at the same time, which speeds up execution. SSE/Neon intrinsic functions are functions that the compiler exposes to high-level languages. At compile time, each intrinsic is translated into the corresponding SSE/Neon instruction sequence, so the code does the same work as hand-written SSE/Neon assembly while register allocation is left to the compiler, which makes development easier.

Porting SSE intrinsic functions in Rust is similar to porting them in C and C++. For details, see the AvxToNeon project. The difference lies in how the intrinsics are called. When porting SSE intrinsic functions, set the target_feature condition to neon and import the intrinsics with use std::arch::aarch64::*. As with inline assembly, code that calls the intrinsic functions must be written in an unsafe scope. On toolchains where the Neon intrinsics are not yet stable, the +nightly option must also be added during compilation. The Neon intrinsics in std::arch::aarch64 have been stable since Rust 1.59.

Symptom:

When the source code contains x86 SSE intrinsic functions, compilation on the Kunpeng platform fails with "unresolved import 'std::arch::x86_64'" or "cannot find function '_mm_set1_epi8' in this scope".
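One way to avoid the unresolved import is to gate each architecture-specific import and function with cfg, so that std::arch::x86_64 is never referenced in an aarch64 build. The following is a minimal sketch under that assumption. The helper name splat_i8 is hypothetical and is used only for illustration.

```rust
// Import only the intrinsics module that matches the build target.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*;

/// Hypothetical helper: broadcast `x` to all 16 lanes and copy them out (x86_64).
#[cfg(target_arch = "x86_64")]
pub fn splat_i8(x: i8) -> [i8; 16] {
    let mut out = [0i8; 16];
    // SSE2 is part of the x86_64 baseline, so these intrinsics are always available.
    unsafe {
        let v = _mm_set1_epi8(x);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
    }
    out
}

/// Hypothetical helper: the same operation written with Neon intrinsics (aarch64).
#[cfg(target_arch = "aarch64")]
pub fn splat_i8(x: i8) -> [i8; 16] {
    let mut out = [0i8; 16];
    // Assumes a target with Neon enabled, as on the standard aarch64 Linux targets.
    unsafe {
        let v = vdupq_n_s8(x);
        vst1q_s8(out.as_mut_ptr(), v);
    }
    out
}

/// Scalar fallback so the crate still builds on other architectures.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn splat_i8(x: i8) -> [i8; 16] {
    [x; 16]
}
```

With this layout, a call such as splat_i8(4) compiles on either platform without changes at the call site, because exactly one definition is selected for each target.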

Example:

The following example shows how to port SSE intrinsic functions in Rust to the Kunpeng platform. The code uses a single instruction to add the data in two vector registers lane by lane.

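// Code implementation on the x86 platform
// _mm_set1_epi8 and _mm_add_epi8 are SSE2 intrinsics, so the function is gated on sse2.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse2"))]
fn add()
{
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;
    unsafe {
        // _mm_set1_epi8 takes a single value and broadcasts it to all 16 lanes.
        let veca = _mm_set1_epi8(4);
        let vecb = _mm_set1_epi8(3);
        // Lane-wise addition: every lane of the result holds 7.
        let result = _mm_add_epi8(veca, vecb);
    }
}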
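// Code implementation on the Kunpeng platform
// On toolchains older than Rust 1.59, add #![feature(stdsimd)] as the first line of the crate root and compile with +nightly.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn add()
{
    use std::arch::aarch64::*;
    unsafe {
        // vdupq_n_s8 broadcasts a single value to all 16 lanes.
        let veca = vdupq_n_s8(4);
        let vecb = vdupq_n_s8(3);
        // Lane-wise addition: every lane of the result holds 7.
        let result = vaddq_s8(veca, vecb);
    }
}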
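To check that a ported function behaves like the original, it can help to read the lanes back into an array and compare them with a scalar reference. The following is a sketch under stated assumptions, not a definitive implementation. The names add_scalar and add_i8x16 are hypothetical, and the inputs are loaded from arrays instead of being broadcast constants.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*;

/// Scalar reference: wrapping addition, one lane at a time.
pub fn add_scalar(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// x86_64 version: unaligned loads, one SSE2 add, unaligned store.
#[cfg(target_arch = "x86_64")]
pub fn add_i8x16(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_add_epi8(va, vb); // wraps on overflow
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// aarch64 version: the same steps written with Neon intrinsics.
#[cfg(target_arch = "aarch64")]
pub fn add_i8x16(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    unsafe {
        let va = vld1q_s8(a.as_ptr());
        let vb = vld1q_s8(b.as_ptr());
        let sum = vaddq_s8(va, vb); // wraps on overflow
        vst1q_s8(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback for other architectures: reuse the scalar reference.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn add_i8x16(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    add_scalar(a, b)
}
```

Both _mm_add_epi8 and vaddq_s8 perform wrapping (non-saturating) addition, so comparing add_i8x16 against add_scalar on the same inputs, including values near the i8 limits, is a quick way to confirm that the port preserves behavior.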