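Compilation error: unknown mnemonic 'movdqu' -- 'movdqu'.
movdqu is an x86 (SSE2) instruction and is not available on Kunpeng (ARM) processors. It copies 128 bits of data, with no alignment requirement, from register to register or between a register and a memory address.
On x86, movdqu has two forms: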
MOVDQU xmm1, xmm2/m128
MOVDQU xmm2/m128, xmm1
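On Kunpeng, the NEON ld1 and st1 instructions can be used in its place. Their usage is as follows:
ld1 instruction: Load multiple 1-element structures to one, two, three or four registers.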
LD1 { Vt.T }, [Xn|SP]
See section 9.98 of the instruction set manual, available at: http://infocenter.arm.com/help/topic/com.arm.doc.dui0802a/DUI0802A_armasm_reference_guide.pdf
st1 instruction: Store multiple 1-element structures from one, two, three or four registers.
ST1 { Vt.T }, [Xn|SP]
See section 9.202 of the instruction set manual, available at: http://infocenter.arm.com/help/topic/com.arm.doc.dui0802a/DUI0802A_armasm_reference_guide.pdf
A simple example follows:
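/* x86 version */
void add_x86_asm(int* result, int* a, int* b, int len)
{
    /* Bitwise AND of a and b, 4 ints (16 bytes) per iteration.
       len is assumed to be a positive multiple of 4. */
    __asm__ volatile(
        "1:                             \n\t"
        "movdqu (%[a]), %%xmm0          \n\t"  /* unaligned 128-bit load  */
        "movdqu (%[b]), %%xmm1          \n\t"
        "pand   %%xmm0, %%xmm1          \n\t"
        "movdqu %%xmm1, (%[result])     \n\t"  /* unaligned 128-bit store */
        "add    $16, %[a]               \n\t"
        "add    $16, %[b]               \n\t"
        "add    $16, %[result]          \n\t"
        "sub    $4, %[len]              \n\t"
        "jg     1b                      \n\t"
        /* every operand is modified inside the loop, so all are "+r" */
        : [result]"+r"(result), [a]"+r"(a), [b]"+r"(b), [len]"+r"(len)
        :
        : "memory", "cc", "xmm0", "xmm1"
    );
}

/* Kunpeng version */
void and_neon_asm(int* result, int* a, int* b, int len)
{
    /* Same operation; ld1/st1 replace movdqu.
       len is assumed to be a positive multiple of 4. */
    __asm__ volatile(
        "1:                                 \n\t"
        "ld1  {v0.16b}, [%[a]], #16         \n\t"  /* load 16 bytes, then a += 16      */
        "ld1  {v1.16b}, [%[b]], #16         \n\t"
        "and  v0.16b, v0.16b, v1.16b        \n\t"
        "subs %w[len], %w[len], #4          \n\t"  /* len is a 32-bit int: use w form  */
        "st1  {v0.16b}, [%[result]], #16    \n\t"  /* store 16 bytes, then result += 16 */
        "bgt  1b                            \n\t"
        /* post-indexed addressing and subs modify every operand: all "+r" */
        : [result]"+r"(result), [a]"+r"(a), [b]"+r"(b), [len]"+r"(len)
        :
        : "memory", "cc", "v0", "v1"
    );
}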
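The same movdqu-to-ld1/st1 replacement can also be written with compiler intrinsics instead of inline assembly, which keeps one source file buildable on both x86 and Kunpeng. The sketch below is an alternative to the inline-assembly example above, not part of it. The names vec128, load128, store128, and128 and and_portable are hypothetical helpers introduced for illustration. The intrinsics are the standard ones from <emmintrin.h> (_mm_loadu_si128, _mm_storeu_si128, _mm_and_si128) and <arm_neon.h> (vld1q_u8, vst1q_u8, vandq_u8). Compilers typically lower these to movdqu and to ld1/st1 or an equivalent load/store, but the exact instruction chosen is up to the compiler.

```c
#include <stdint.h>

#if defined(__aarch64__)
/* Kunpeng / AArch64: NEON intrinsics */
#include <arm_neon.h>
typedef uint8x16_t vec128;
static inline vec128 load128(const void *p) { return vld1q_u8((const uint8_t *)p); }
static inline void store128(void *p, vec128 v) { vst1q_u8((uint8_t *)p, v); }
static inline vec128 and128(vec128 x, vec128 y) { return vandq_u8(x, y); }
#elif defined(__SSE2__)
/* x86 with SSE2: unaligned load/store intrinsics */
#include <emmintrin.h>
typedef __m128i vec128;
static inline vec128 load128(const void *p) { return _mm_loadu_si128((const __m128i *)p); }
static inline void store128(void *p, vec128 v) { _mm_storeu_si128((__m128i *)p, v); }
static inline vec128 and128(vec128 x, vec128 y) { return _mm_and_si128(x, y); }
#else
/* Any other target: plain C fallback with the same interface */
#include <string.h>
typedef struct { uint8_t b[16]; } vec128;
static inline vec128 load128(const void *p) { vec128 v; memcpy(v.b, p, 16); return v; }
static inline void store128(void *p, vec128 v) { memcpy(p, v.b, 16); }
static inline vec128 and128(vec128 x, vec128 y)
{
    for (int i = 0; i < 16; i++)
        x.b[i] &= y.b[i];
    return x;
}
#endif

/* Hypothetical helper: result[i] = a[i] & b[i] for i in [0, len). */
void and_portable(int *result, const int *a, const int *b, int len)
{
    int i = 0;
    /* main loop: 4 ints (16 bytes) per iteration, no alignment requirement */
    for (; i + 4 <= len; i += 4)
        store128(result + i, and128(load128(a + i), load128(b + i)));
    /* scalar tail for lengths that are not a multiple of 4 */
    for (; i < len; i++)
        result[i] = a[i] & b[i];
}
```

Unlike the inline-assembly versions, this sketch handles a len that is not a multiple of 4 through the scalar tail loop, and it leaves register allocation and clobber tracking to the compiler.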