Porting Go Assembly Instructions

Generally, Go code can be compiled and run directly once the build environment is set up on the Kunpeng platform. However, Go assembly files (*.s files) are architecture-specific and must be ported manually.

Symptom:

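When the source code contains only x86 Go assembly (for example, a function declared in Go whose body exists only in an _amd64.s file), the error "missing function body" is reported during compilation on the Kunpeng platform.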

Go assembly uses both hardware (general-purpose) registers and pseudo registers. The following describes the pseudo registers.

  • Pseudo register

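    The Go assembler defines the same four pseudo registers (FP, PC, SB, and SP) on every architecture, including x86 and Kunpeng. They are virtual registers maintained by the toolchain rather than hardware registers.

    • FP (frame pointer): used to reference function parameters and return values by name and offset, for example a+0(FP).
    • SP (stack pointer): used to reference frame-local variables. It points to the highest address of the local stack frame, so local variables are referenced with negative offsets, for example x-8(SP). On Kunpeng, the hardware stack pointer is written as RSP.
    • SB (static base): the base address of global symbols. Global variables and functions are referenced as offsets from it, for example ·Add(SB).
    • PC (program counter): holds the address of the next instruction to be executed. It is used for jumps and branches.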
  • Assembly code porting

    The Go compiler outputs abstract assembly code (pseudo-assembly) that does not correspond exactly to any real hardware architecture. From this pseudo-assembly, the Go assembler generates machine code for the target hardware. The advantage of pseudo-assembly is that it makes Go easier to port to a new architecture. However, even in pseudo-assembly, there are some differences between the x86 and Kunpeng (arm64) instructions. The following lists some common differences.

    • The length of the data to be transferred is determined by the suffix of the MOV instruction. The suffix used for a given data length differs between platforms. Table 1 lists the differences between the x86 and Kunpeng instructions.
      Table 1 Comparison of x86 and Kunpeng instructions for transferring data of the same length

      Operand Length    x86     Kunpeng
      8 bytes           MOVQ    MOVD
      4 bytes           MOVL    MOVW
      2 bytes           MOVW    MOVH
      1 byte            MOVB    MOVB

    • On the Kunpeng platform, the .W suffix selects pre-indexed addressing: MOVD.W R30, -24(RSP) first subtracts 24 from RSP and then stores R30 at the new address, which grows the stack. This corresponds to SUBQ with a stack offset on x86. Similarly, the .P suffix selects post-indexed addressing: MOVD.P 24(RSP), R30 loads R30 from the address in RSP and then adds 24 to RSP, which shrinks the stack. This corresponds to ADDQ with a stack offset on x86. MOVD.W and MOVD.P normally appear in pairs in the assembly code.

      Example code on the x86 platform:

      SUBQ    $24, SP       // Move the stack pointer down by 24 bytes.
      ...                   // Function body omitted.
      ADDQ    $24, SP       // Move the stack pointer up by 24 bytes.

      Example code on the Kunpeng platform:

      MOVD.W  R30, -24(RSP)       // Move the stack pointer down by 24 bytes and store R30 at the new top.
      ...                         // Function body omitted.
      MOVD.P  24(RSP), R30        // Load R30 from the stack top, then move the stack pointer up by 24 bytes.
    • Common arithmetic and comparison instructions:

      Operation         x86     Kunpeng
      Addition          ADDQ    ADD
      Subtraction       SUBQ    SUB
      Multiplication    IMULQ   MUL
      Comparison        CMPQ    CMP

      For more instructions, see the following link:

      https://www.slideshare.net/linaroorg/optimizing-golang-for-high-performance-with-arm64-assembly-sfo17314
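
The mnemonic differences listed above can be captured in a small lookup table as a first-pass porting aid. The following is a minimal sketch in Go, not an official tool: the names x86ToArm64 and TranslateMnemonic are hypothetical, and the table contains only the instructions named in this section. Each mnemonic is looked up exactly once, so MOVL becomes MOVW without that result being translated again to MOVH.

```go
package main

import (
	"fmt"
	"strings"
)

// x86ToArm64 maps the x86 mnemonics listed in this section to their
// Kunpeng (arm64) counterparts. Hypothetical helper for illustration only.
var x86ToArm64 = map[string]string{
	// Data transfer, by operand length (8, 4, 2, 1 bytes).
	"MOVQ": "MOVD", "MOVL": "MOVW", "MOVW": "MOVH", "MOVB": "MOVB",
	// Arithmetic and comparison.
	"ADDQ": "ADD", "SUBQ": "SUB", "IMULQ": "MUL", "CMPQ": "CMP",
}

// TranslateMnemonic returns the arm64 mnemonic for an x86 mnemonic and
// reports whether the mnemonic is present in the table.
func TranslateMnemonic(x86 string) (string, bool) {
	m, ok := x86ToArm64[strings.ToUpper(strings.TrimSpace(x86))]
	return m, ok
}

func main() {
	for _, op := range []string{"MOVQ", "MOVL", "MOVW", "IMULQ", "LEAQ"} {
		if arm, ok := TranslateMnemonic(op); ok {
			fmt.Printf("%-6s -> %s\n", op, arm)
		} else {
			fmt.Printf("%-6s -> (not in table, port by hand)\n", op)
		}
	}
}
```

A lookup like this only renames instructions. Register names, operand order, and stack handling still have to be ported by hand.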

Example:

The following example adds two integers in Go assembly to show how the implementation differs between platforms.

Source code function:

func Add(a, b int64) int64 {
    return a+b
}
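
Both assembly implementations in this example address the parameters as a+0(FP) and b+8(FP), address the result as ret+16(FP), and declare an argument size of 24 bytes. As a rough cross-check of those numbers, the sketch below models the three int64 values as consecutive fields of a struct and prints their offsets. The addFrame type and the ArgLayout function are hypothetical names used only for illustration. They are not part of the Go toolchain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame is a hypothetical model of how the parameters and result of
// func Add(a, b int64) int64 are laid out for the assembly implementation.
type addFrame struct {
	a   int64 // referenced as a+0(FP)
	b   int64 // referenced as b+8(FP)
	ret int64 // referenced as ret+16(FP)
}

// ArgLayout returns the byte offsets of a, b, and ret in the model,
// followed by the total size in bytes.
func ArgLayout() (offA, offB, offRet, total uintptr) {
	var f addFrame
	return unsafe.Offsetof(f.a), unsafe.Offsetof(f.b), unsafe.Offsetof(f.ret), unsafe.Sizeof(f)
}

func main() {
	a, b, ret, total := ArgLayout()
	fmt.Printf("a+%d(FP) b+%d(FP) ret+%d(FP), argument size %d bytes\n", a, b, ret, total)
}
```

Because an int64 is 8 bytes on both x86 and Kunpeng, the offsets and the argument size are the same on both platforms. Only the instructions and register names need to change.
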
  • Go assembly implementation on the x86 platform:

    Create an add.go file that contains only the function declaration (without a function body), and create an add_amd64.s file in the same directory as add.go. The implementation is as follows:

    TEXT    ·Add+0(SB), $0-24
    MOVQ    a+0(FP), BX      // Load parameter a into the BX register.
    MOVQ    b+8(FP), CX      // Load parameter b into the CX register (BP is avoided because Go uses it as the frame pointer on amd64).
    ADDQ    CX, BX           // Add CX to BX and store the sum in BX.
    MOVQ    BX, ret+16(FP)   // Store BX at ret+16(FP), the memory location of the return value.
    RET                      // Return to the caller.

    TEXT marks the start of a function definition. ·Add is in the format of {package}·{function}. If the function belongs to the current package, {package} can be omitted. The middle dot (·) preceding Add separates the package name from the function name. In $0-24, 0 is the size of the function's stack frame, and 24 is the total size of the parameters and return value: two int64 parameters and one int64 return value, 8 bytes each, 24 bytes in total.

  • Go assembly implementation on the Kunpeng platform:

    Create an add.go file that contains only the function declaration (without a function body), and create an add_arm64.s file in the same directory as add.go. The implementation is as follows:

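    TEXT    ·Add+0(SB), $0-24  // Same meaning as on x86.
    MOVD    a+0(FP), R0        // Load parameter a into the R0 register.
    MOVD    b+8(FP), R1        // Load parameter b into the R1 register.
    ADD     R1, R0, R0         // Add R1 to R0 and store the sum in R0.
    MOVD    R0, ret+16(FP)     // Store R0 at ret+16(FP), the memory location of the return value.
    RET                        // Return to the caller.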

    The comparison shows that even simple Go assembly differs between platforms in its MOV instructions, ADD instructions, and register names.

    In practice, files that contain Go assembly usually include a large number of complex instructions, and porting them requires a solid understanding of Go assembly and of the differences between the architectures. When time is limited and converting the assembly is impractical, you can instead reimplement the function in ordinary (non-assembly) Go code, provided that you understand the logic of the assembly code.