Replacing the x86 pshufb Instruction
Symptom
Error: unknown mnemonic 'pshufb' -- 'pshufb'
Cause
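pshufb (Packed Shuffle Bytes) is an SSSE3 instruction that rearranges the bytes of the first operand according to the control mask in the second operand. For each of the 16 mask bytes, if bit 7 is set, the corresponding result byte is set to 0; otherwise, the low 4 bits select which byte of the first operand is copied to that position. Because pshufb is an x86 instruction, it cannot be assembled for the Kunpeng (AArch64) platform. Syntax on the x86 platform: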
pshufb xmm1, xmm2/m128
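The byte-selection rule described above can be modeled in portable C. The following sketch computes what pshufb (and its intrinsic _mm_shuffle_epi8) produces for one 16-byte vector. It is meant for reasoning about and testing a port, not as a replacement for the vector code; the name pshufb_ref is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model of pshufb on one 16-byte vector.
 * For each lane i of the control mask:
 *   - if bit 7 of mask[i] is set, the result byte is 0;
 *   - otherwise the result byte is src[mask[i] & 0x0F]. */
static void pshufb_ref(uint8_t dst[16], const uint8_t src[16],
                       const uint8_t mask[16])
{
    uint8_t tmp[16]; /* temporary buffer so dst may alias src */
    for (int i = 0; i < 16; i++) {
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

For example, a mask of {15, 14, ..., 0} reverses the byte order of src, a mask byte of 0x80 clears that output byte, and a mask byte of 0x13 selects src[3] because only the low 4 bits are used.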
Procedure
The intrinsic corresponding to the pshufb instruction is _mm_shuffle_epi8 (an SSSE3 intrinsic declared in tmmintrin.h). Replace pshufb on the Kunpeng platform as follows:
- Replace the pshufb inline assembly with the intrinsic. Example code on the x86 platform:
  __asm__("pshufb %1, %0" : "+x" (mmdesc) : "xm" (shuf_mask));
  Replace it with a call to the intrinsic, assigning the result back to mmdesc (the "+x" constraint means the assembly updates mmdesc in place):
  mmdesc = _mm_shuffle_epi8(mmdesc, shuf_mask);
- Implement _mm_shuffle_epi8 for the Kunpeng platform. GCC for AArch64 does not provide x86 intrinsics, so the function must be implemented with NEON intrinsics:
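  /* FORCE_INLINE, __m128i and the vreinterpretq_*_m128i helpers are assumed to
     come from an SSE-to-NEON compatibility header (for example, sse2neon);
     the NEON intrinsics come from arm_neon.h. */
  FORCE_INLINE __m128i _mm_shuffle_epi8(__m128i a, __m128i b)
  {
      uint8x16_t tbl = vreinterpretq_u8_m128i(a); /* bytes to shuffle */
      uint8x16_t idx = vreinterpretq_u8_m128i(b); /* control mask */
      /* Keep bit 7 and the low 4 bits of each control byte. A byte with bit 7
         set stays >= 0x80, which is out of range for TBL, so TBL writes 0,
         matching the pshufb zeroing rule. */
      uint8x16_t idx_masked = vandq_u8(idx, vdupq_n_u8(0x8F));
      /* vqtbl1q_u8 looks up each index in the 16-byte table. */
      return vreinterpretq_m128i_u8(vqtbl1q_u8(tbl, idx_masked));
  }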
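The 0x8F mask in the NEON implementation is the key design choice. TBL (vqtbl1q_u8) returns 0 for any index of 16 or more, whereas pshufb uses only the low 4 bits of a control byte unless bit 7 is set. The following scalar C sketch models one lane of each rule (the helper names are hypothetical, and no NEON code is called) and checks that masking with 0x8F makes the two rules agree for all 256 possible control byte values.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqtbl1q_u8:
 * an index of 16 or more yields 0. */
static uint8_t tbl1_lane(const uint8_t table[16], uint8_t idx)
{
    return idx < 16 ? table[idx] : 0;
}

/* Hypothetical scalar model of one lane of pshufb:
 * bit 7 set yields 0, otherwise the low 4 bits select the byte. */
static uint8_t pshufb_lane(const uint8_t src[16], uint8_t m)
{
    return (m & 0x80) ? 0 : src[m & 0x0F];
}

/* Returns 1 if TBL(table, m & 0x8F) equals pshufb(table, m) for every
 * control byte m in 0..255, and 0 otherwise. */
static int mask_0x8f_is_equivalent(const uint8_t table[16])
{
    for (int m = 0; m < 256; m++) {
        uint8_t masked = (uint8_t)(m & 0x8F);
        if (tbl1_lane(table, masked) != pshufb_lane(table, (uint8_t)m)) {
            return 0;
        }
    }
    return 1;
}
```

Without the mask, a control byte such as 0x10 selects table[0] under pshufb but 0 under TBL, which is why the AND with 0x8F cannot be dropped.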