Replacing the _mm_extract_ps Function
In this case, _mm_extract_ps is used to extract the 32-bit values in lanes 0, 1, 2, and 3 of results and store them in a, b, c, and d.
For details about _mm_extract_ps, see the Intel Intrinsics Guide.
- Code on x86:
    inline void DoExtractM128(__m128i results, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
    {
        *a = _mm_extract_ps((__v4sf)results, 0);
        *b = _mm_extract_ps((__v4sf)results, 1);
        *c = _mm_extract_ps((__v4sf)results, 2);
        *d = _mm_extract_ps((__v4sf)results, 3);
    }
- Alternative for Kunpeng processors:
    #include <arm_neon.h>

    inline void DoExtractM128(int8x16_t results, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
    {
        *a = vgetq_lane_u32(vreinterpretq_u32_s8(results), 0);
        *b = vgetq_lane_u32(vreinterpretq_u32_s8(results), 1);
        *c = vgetq_lane_u32(vreinterpretq_u32_s8(results), 2);
        *d = vgetq_lane_u32(vreinterpretq_u32_s8(results), 3);
    }
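If the same source file has to build on both x86 and Kunpeng, the two versions above could be selected at compile time. The following is only a minimal sketch under several assumptions: the type alias vec128_t and the helper LoadVec128 are hypothetical names introduced for illustration, the scalar fallback branch is an addition so that the file still compiles without SSE4.1 or NEON, and _mm_castsi128_ps is used here in place of the (__v4sf) cast from the x86 code above.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>

typedef int8x16_t vec128_t; /* hypothetical alias, not part of the original code */

/* Hypothetical helper: load four 32-bit values into a 128-bit vector. */
static inline vec128_t LoadVec128(const uint32_t src[4])
{
    return vreinterpretq_s8_u32(vld1q_u32(src));
}

static inline void DoExtractM128(vec128_t results, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    /* Reinterpret once, then read each 32-bit lane. */
    uint32x4_t lanes = vreinterpretq_u32_s8(results);
    *a = vgetq_lane_u32(lanes, 0);
    *b = vgetq_lane_u32(lanes, 1);
    *c = vgetq_lane_u32(lanes, 2);
    *d = vgetq_lane_u32(lanes, 3);
}

#elif defined(__SSE4_1__)
#include <smmintrin.h>

typedef __m128i vec128_t; /* hypothetical alias, not part of the original code */

static inline vec128_t LoadVec128(const uint32_t src[4])
{
    return _mm_loadu_si128((const __m128i *)src);
}

static inline void DoExtractM128(vec128_t results, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    /* _mm_extract_ps returns the raw 32 bits of the selected lane as an int. */
    __m128 lanes = _mm_castsi128_ps(results);
    *a = (uint32_t)_mm_extract_ps(lanes, 0);
    *b = (uint32_t)_mm_extract_ps(lanes, 1);
    *c = (uint32_t)_mm_extract_ps(lanes, 2);
    *d = (uint32_t)_mm_extract_ps(lanes, 3);
}

#else

/* Assumed scalar fallback so the sketch builds without SIMD support. */
typedef struct { uint32_t lane[4]; } vec128_t;

static inline vec128_t LoadVec128(const uint32_t src[4])
{
    vec128_t v;
    memcpy(v.lane, src, sizeof v.lane);
    return v;
}

static inline void DoExtractM128(vec128_t results, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a = results.lane[0];
    *b = results.lane[1];
    *c = results.lane[2];
    *d = results.lane[3];
}

#endif
```

With this arrangement, a caller would fill a four-element uint32_t array, pass LoadVec128(array) to DoExtractM128, and read the four lanes back through a, b, c, and d in the same order on either architecture.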