Development of the SIMD Instruction Set in the ARM Architecture
ARM first introduced SIMD instructions in the Armv6 architecture. These instructions have no dedicated vector registers. They operate on the 32-bit general-purpose registers of ARM, treating each register as four 8-bit integers or two 16-bit integers that are processed in parallel. In the Armv7-A architecture, ARM extended its SIMD support and named the new instruction set NEON. NEON in Armv7-A provides 32 64-bit vector registers and supports single-precision floating-point operations. In the Armv8-A architecture, NEON was developed further: each of the 32 vector registers is widened to 128 bits, and the instruction set supports 8-bit, 16-bit, 32-bit, and 64-bit integers as well as single-precision and double-precision floating-point operations.
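To make the Armv6 packing idea concrete, the following is a minimal sketch in portable C that emulates four parallel 8-bit additions inside one 32-bit word. It is not ARM code: the function name add4x8 is a hypothetical helper, and the bit-masking technique only imitates the per-byte wraparound behavior that a packed 8-bit add provides in hardware.

```c
#include <stdint.h>

/* Hypothetical helper: adds four unsigned 8-bit lanes packed in a and b.
 * Each lane wraps around modulo 256, and no carry crosses into the
 * neighboring lane. */
uint32_t add4x8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit of each lane
     * absorbs the carry so it cannot spill into the next lane. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Restore the top bit of every lane: it is the XOR of the two input
     * top bits and the carry that arrived from the low 7 bits. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

For example, add4x8(0x01FF7F10, 0x01010110) yields 0x02008020: the lane holding 0xFF wraps to 0x00 without disturbing the lane above it.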
Although each NEON vector register is 128 bits wide, its lower 32 bits can be accessed as an Sn register and its lower 64 bits as a Dn register. Figure 1 shows how these narrower views map onto the full register.
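The relationship between the register views can also be modeled in plain C. The sketch below is an illustration only, assuming the Armv8-A layout in which the narrower views occupy the least significant bits of the 128-bit register. The type vreg128 and the functions view_d and view_s are hypothetical names, not part of any ARM header.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register, stored as two
 * 64-bit halves so the code does not depend on host byte order. */
typedef struct {
    uint64_t lo;  /* bits 0..63   */
    uint64_t hi;  /* bits 64..127 */
} vreg128;

/* Dn view: the lower 64 bits of the register. */
uint64_t view_d(vreg128 v)
{
    return v.lo;
}

/* Sn view: the lower 32 bits of the register. */
uint32_t view_s(vreg128 v)
{
    return (uint32_t)(v.lo & 0xFFFFFFFFu);
}
```

With lo set to 0x1122334455667788, view_d returns the whole low half and view_s returns 0x55667788, which mirrors how the narrower register names alias the low part of the same storage.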
