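Identifies certain poorly performing ldp/stp instructions and splits each of them into two ldr or str instructions.
Use the -fsplit-ldp-stp option to enable this optimization.
Note: this optimization requires optimization level -O1 or higher.
The test case is as follows: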
int __RTL (startwith ("split_complex_instructions"))
simple_ldp_after_store ()
{
(function "simple_ldp_after_store"
  (insn-chain
    (block 2
      (edge-from entry (flags "FALLTHRU"))
      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
      (cinsn 228 (set (reg/i:DI sp)
                      (reg/i:DI x0)))
      (cinsn 238 (set (reg/i:DI x1)
                      (reg/i:DI x0)))

      (cinsn 101 (set (mem/c:DI
                        (plus:DI (reg/f:DI sp)
                          (const_int 8)) [1 S4 A32]) (reg:DI x0)))
      (cinsn 10 (parallel [
        (set (reg:DI x29)
          (mem:DI (plus:DI (reg/f:DI sp) (const_int 8)) [1 S4 A32]))
        (set (reg:DI x30)
          (mem:DI (plus:DI (reg/f:DI sp)
                    (const_int 16)) [1 S4 A32]))]))

      (cinsn 102 (set (mem/c:DI (plus:DI (reg/f:DI x1)
                          (const_int -16)) [1 S4 A32])
                      (reg:DI x0)))
      (cinsn 11 (parallel [
        (set (reg:DI x3)
          (mem:DI (plus:DI (reg/f:DI x1) (const_int -16)) [1 S4 A32]))
        (set (reg:DI x4)
          (mem:DI (plus:DI (reg/f:DI x1) (const_int -8)) [1 S4 A32]))
      ]))

      (cinsn 103 (set (mem/c:DI (reg/f:DI x1) [1 S4 A32])
                      (reg:DI x0)))
      (cinsn 12 (parallel [
        (set (reg:DI x5) (mem:DI (reg/f:DI x1) [1 S4 A32]))
        (set (reg:DI x6) (mem:DI (plus:DI (reg/f:DI x1)
                                   (const_int 8)) [1 S4 A32]))
      ]))

      (cinsn 13 (use (reg/i:DI sp)))
      (cinsn 14 (use (reg/i:DI cc)))
      (cinsn 15 (use (reg/i:DI x29)))
      (cinsn 16 (use (reg/i:DI x30)))
      (cinsn 17 (use (reg/i:DI x0)))
      (cinsn 18 (use (reg/i:DI x3)))
      (cinsn 19 (use (reg/i:DI x4)))
      (cinsn 20 (use (reg/i:DI x5)))
      (cinsn 21 (use (reg/i:DI x6)))
      (edge-to exit (flags "FALLTHRU"))
    ) ;; block 2
  ) ;; insn-chain
) ;; function "simple_ldp_after_store"
}
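Test command:
gcc -O1 -fsplit-ldp-stp -S test.c -o test.s
Compared with the output produced when the option is disabled, the assembly generated with the option enabled contains no ldp instructions; each one is split into two ldr instructions.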
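
To make the shape of the transformation concrete, the following Python sketch models it on assembly text. It is a toy illustration only, not the GCC pass: the function name split_pair and the regular expression are hypothetical, it handles only the plain base-plus-immediate addressing form, and it splits every pair it sees, whereas the compiler splits only the pairs it identifies as performing poorly.

```python
import re

# Toy model (assumption): matches only "ldp/stp rA, rB, [base]" or "[base, #imm]".
# Pre/post-indexed forms and SIMD registers are deliberately not handled.
_PAIR = re.compile(
    r"^\s*(ldp|stp)\s+([xw]\d+),\s*([xw]\d+),\s*\[(\w+)(?:,\s*#(-?\d+))?\]\s*$"
)


def _addr(base, offset):
    """Format a base+offset address, omitting a zero offset."""
    return f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"


def split_pair(insn):
    """Rewrite one ldp/stp line as two ldr/str lines; return other lines unchanged."""
    m = _PAIR.match(insn)
    if m is None:
        return [insn]
    op, r1, r2, base, imm = m.groups()
    single = "ldr" if op == "ldp" else "str"
    size = 8 if r1.startswith("x") else 4  # x registers are 64-bit, w registers 32-bit
    offset = int(imm) if imm else 0
    first = f"{single} {r1}, {_addr(base, offset)}"
    second = f"{single} {r2}, {_addr(base, offset + size)}"
    # If the first load would overwrite the base register, do the other load first.
    if op == "ldp" and r1 == base:
        return [second, first]
    return [first, second]


if __name__ == "__main__":
    # The three register pairs and offsets mirror insns 10, 11 and 12 of the test case.
    for line in ("ldp x29, x30, [sp, #8]", "ldp x3, x4, [x1, #-16]", "ldp x5, x6, [x1]"):
        print(line, "=>", split_pair(line))
```

The second access is placed one register width (8 bytes for x registers) after the first, which matches the offsets of the two memory operands in each parallel of the test case.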