Porting Rust SSE Instructions
SSE instructions are the Single Instruction, Multiple Data (SIMD) instructions of the x86 architecture. Their counterparts on Kunpeng are the Neon instructions. A single SSE instruction operates on multiple data elements at the same time, which speeds up execution. SSE/Neon intrinsic functions are functions that the compiler exposes to high-level languages. The compiler translates each call into the corresponding SSE/Neon instruction sequence, so intrinsics provide the same functionality as hand-written SSE/Neon assembly while leaving register allocation to the compiler, which makes development easier.
Porting SSE intrinsic functions in Rust is similar to porting them in C and C++. For details, see the AvxToNeon project. The difference lies in how the functions are called. When porting SSE intrinsic functions, set the target_feature attribute to neon and import the intrinsics with use std::arch::aarch64::*. As with inline assembly, code that calls the intrinsic functions must be written inside an unsafe scope. On toolchains where the aarch64 intrinsics are still unstable, the +nightly option must also be added during compilation (for example, cargo +nightly build).
Symptom:
When the source code contains x86 SSE intrinsic functions, compilation on the Kunpeng platform fails with an error such as "unresolved import 'std::arch::x86_64'" or "cannot find function '_mm_set1_epi8' in this scope".
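One way to avoid both errors is to reference each architecture's intrinsics only under a matching cfg condition. The following is a minimal sketch rather than code from the original project: splat_i8 is a hypothetical helper name, and the scalar fallback is an added assumption so that the code also builds on other targets. It assumes that SSE2 is available on x86_64 and Neon on aarch64, which holds for the standard targets.

```rust
/// Hypothetical helper: broadcast `value` into all 16 signed 8-bit lanes
/// and return the lanes as a plain array.
#[cfg(target_arch = "x86_64")]
fn splat_i8(value: i8) -> [i8; 16] {
    // The x86 module is only named inside an x86_64-gated item,
    // so this import is never seen when building for aarch64.
    use std::arch::x86_64::*;
    let mut lanes = [0i8; 16];
    // Assumption: SSE2 is enabled (true for standard x86_64 targets).
    // `lanes` is exactly 16 bytes, so the unaligned store stays in bounds.
    unsafe {
        let v = _mm_set1_epi8(value);
        _mm_storeu_si128(lanes.as_mut_ptr() as *mut __m128i, v);
    }
    lanes
}

#[cfg(target_arch = "aarch64")]
fn splat_i8(value: i8) -> [i8; 16] {
    use std::arch::aarch64::*;
    let mut lanes = [0i8; 16];
    // Assumption: Neon is enabled (true for standard aarch64 targets).
    unsafe {
        let v = vdupq_n_s8(value);
        vst1q_s8(lanes.as_mut_ptr(), v);
    }
    lanes
}

// Scalar fallback (an addition for illustration) for any other architecture.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn splat_i8(value: i8) -> [i8; 16] {
    [value; 16]
}
```

Because exactly one of the three definitions is compiled for any given target, callers can use splat_i8 without any cfg conditions of their own.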
Example:
The following example shows how to port SSE intrinsic functions in Rust to the Kunpeng platform. The code uses a single instruction to add the data in two vector registers lane by lane (element-wise).
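// Code implementation on the x86 platform
// _mm_set1_epi8 and _mm_add_epi8 are SSE2 intrinsics, so the gate checks "sse2".
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse2"))]
fn add()
{
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;
    unsafe {
        // _mm_set1_epi8 takes a single value and broadcasts it to all 16 lanes.
        let veca = _mm_set1_epi8(4);
        let vecb = _mm_set1_epi8(3);
        // Add the 16 signed 8-bit lanes with one instruction.
        let result = _mm_add_epi8(veca, vecb);
    }
}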
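// Code implementation on the Kunpeng platform
// The feature gate is needed only on older nightly toolchains where the
// aarch64 intrinsics are unstable. It must appear at the top of the crate root.
#![feature(stdsimd)]
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn add()
{
    use std::arch::aarch64::*;
    unsafe {
        // vdupq_n_s8 broadcasts one value to all 16 signed 8-bit lanes.
        let veca = vdupq_n_s8(4);
        let vecb = vdupq_n_s8(3);
        // Lane-wise addition, the Neon counterpart of _mm_add_epi8.
        let result = vaddq_s8(veca, vecb);
    }
}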
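The two implementations above compute a result but never read it back. The sketch below shows one possible way to check that a port behaves the same on both platforms: it stores the vector result into an array and compares it with a scalar reference. The names add_lanes and add_lanes_scalar are hypothetical, the scalar fallback is an added assumption, and SSE2 (x86_64) or Neon (aarch64) is assumed to be enabled, as on the standard targets. Both _mm_add_epi8 and vaddq_s8 wrap on overflow, so the reference uses wrapping_add.

```rust
/// Scalar reference: lane-wise wrapping addition of 16 signed 8-bit values.
fn add_lanes_scalar(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// x86_64 version: one SSE2 add instruction covers all 16 lanes.
#[cfg(target_arch = "x86_64")]
fn add_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    use std::arch::x86_64::*;
    let mut out = [0i8; 16];
    // All three arrays are 16 bytes long, and the unaligned load/store
    // intrinsics place no alignment requirement on the pointers.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_add_epi8(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// aarch64 (Kunpeng) version: the Neon counterpart of the code above.
#[cfg(target_arch = "aarch64")]
fn add_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    use std::arch::aarch64::*;
    let mut out = [0i8; 16];
    unsafe {
        let va = vld1q_s8(a.as_ptr());
        let vb = vld1q_s8(b.as_ptr());
        let sum = vaddq_s8(va, vb);
        vst1q_s8(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback for other architectures (an addition for illustration).
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn add_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    add_lanes_scalar(a, b)
}
```

For example, add_lanes([4; 16], [3; 16]) yields 7 in every lane, matching the example above, and add_lanes([127; 16], [1; 16]) wraps to -128 in every lane. Comparing against add_lanes_scalar on the same inputs on both platforms is a simple way to catch a mismatched intrinsic, such as accidentally choosing a saturating add during porting.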