Replacing the x86 pshufb Instruction

Symptom

Error: unknown mnemonic 'pshufb' -- 'pshufb'

Cause

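pshufb (Packed Shuffle Bytes) shuffles the bytes of the first operand in place according to the control mask in the second operand. For each byte of the control mask, if bit 7 is set, the corresponding result byte is set to 0; otherwise, the lower 4 bits select the byte of the first operand that is copied to that position. It is an x86 instruction (introduced with SSSE3) and cannot be used on the Kunpeng platform. Usage of pshufb on the x86 platform: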

pshufb xmm1, xmm2/m128
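
To make the per-byte behavior concrete, the following is a minimal scalar sketch of what pshufb computes. The helper name pshufb_ref is hypothetical and exists only for illustration; it is not part of any library, and it models the 128-bit form shown above.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): scalar model of
 * "pshufb xmm1, xmm2/m128". dst plays the role of xmm1 and is
 * overwritten; ctrl plays the role of xmm2/m128. */
static void pshufb_ref(uint8_t dst[16], const uint8_t ctrl[16])
{
    uint8_t src[16];

    /* Snapshot the input: dst is both the source and the destination. */
    for (int i = 0; i < 16; i++)
        src[i] = dst[i];

    for (int i = 0; i < 16; i++) {
        if (ctrl[i] & 0x80)
            dst[i] = 0;                    /* bit 7 set: zero the result byte */
        else
            dst[i] = src[ctrl[i] & 0x0F];  /* low 4 bits select a source byte */
    }
}
```

A model like this can serve as a reference when checking a ported implementation byte by byte.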

Procedure

The SSE intrinsic function corresponding to the pshufb instruction is _mm_shuffle_epi8. Replace pshufb on the Kunpeng platform as follows:

  1. Replace the pshufb inline assembly with the equivalent SSE intrinsic function.
    Example code on the x86 platform:
    __asm__("pshufb %1, %0" : "+x" (mmdesc) : "xm" (shuf_mask));
    Replace it with a call to the SSE intrinsic function. The assembly updates mmdesc in place, so assign the return value back to mmdesc:
    mmdesc = _mm_shuffle_epi8(mmdesc, shuf_mask);
  2. Port the SSE intrinsic function _mm_shuffle_epi8. GCC does not provide an implementation of this function for the Kunpeng platform, so you need to implement it yourself, for example with NEON intrinsics:
    /* FORCE_INLINE, __m128i, and the vreinterpretq_* helpers are assumed to be defined elsewhere in the porting header. */
    FORCE_INLINE __m128i _mm_shuffle_epi8(__m128i a, __m128i b)
    {
        uint8x16_t tbl = vreinterpretq_u8_m128i(a);   /* bytes to be shuffled */
        uint8x16_t idx = vreinterpretq_u8_m128i(b);   /* control mask */
        /* Keep bit 7 (zeroing flag) and bits 3:0 (source index) of each control byte. */
        uint8x16_t idx_masked = vandq_u8(idx, vdupq_n_u8(0x8F));
        /* vqtbl1q_u8 returns 0 for any index >= 16, which matches pshufb when bit 7 is set. */
        return vreinterpretq_m128i_u8(vqtbl1q_u8(tbl, idx_masked));
    }
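
The 0x8F mask is what makes the NEON table lookup behave like pshufb: without it, a control byte such as 0x13 would be treated as out of range and produce 0, whereas pshufb ignores bits 6:4 and selects byte 3. The sketch below checks this reasoning in portable scalar C for a single byte lane. All function names are hypothetical, and tbl_lane is only a model of the out-of-range behavior of vqtbl1q_u8, not the intrinsic itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of a vqtbl1q_u8 lookup:
 * an index of 16 or more yields 0. */
static uint8_t tbl_lane(const uint8_t table[16], uint8_t index)
{
    return index < 16 ? table[index] : 0;
}

/* One lane of the ported function: mask with 0x8F, then look up. */
static uint8_t shuffle_lane_neon_style(const uint8_t table[16], uint8_t ctrl)
{
    return tbl_lane(table, ctrl & 0x8F);
}

/* One lane of pshufb as defined on x86. */
static uint8_t shuffle_lane_x86_style(const uint8_t table[16], uint8_t ctrl)
{
    return (ctrl & 0x80) ? 0 : table[ctrl & 0x0F];
}

/* Returns 1 if both models agree for all 256 control-byte values. */
static int models_agree(void)
{
    uint8_t table[16];

    /* Nonzero table entries, so an unexpected zero is detectable. */
    for (int i = 0; i < 16; i++)
        table[i] = (uint8_t)(i + 1);

    for (int c = 0; c < 256; c++) {
        if (shuffle_lane_neon_style(table, (uint8_t)c) !=
            shuffle_lane_x86_style(table, (uint8_t)c))
            return 0;
    }
    return 1;
}
```

Calling models_agree() from a test program covers every possible control byte, which is a cheap way to gain confidence in the masking step before testing the NEON version on Kunpeng hardware.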