Introduction to SVE/SVE2

Scalable Vector Extension (SVE) is the next-generation single instruction, multiple data (SIMD) instruction set that Arm introduced after Neon. SVE supports a vector-length agnostic (VLA) programming style: code is written without assuming a particular vector length and adapts at run time to the length implemented by the hardware. Neon vectors are 64 or 128 bits long. An SVE vector length is a power of two between 128 and 2048 bits (128, 256, 512, 1024, or 2048 bits). SVE was designed with high-performance computing (HPC) in mind and adds features such as a range of reduction operations, gather loads and scatter stores, and per-lane predication, so more loops can be vectorized with SVE than with Neon. Enabling SVE therefore exposes further opportunities for parallelism in application code. SVE2 is a superset of SVE. It adds more integer operations, improving support for machine vision, multimedia, and database workloads, and it provides vector-length agnostic (VLA) versions of most digital signal processing (DSP) and Neon media-processing instructions.
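To illustrate the vector-length agnostic style, the following is a minimal scalar C model. It is not real SVE code: it uses no SVE intrinsics and runs on any host. The names `sve_vl_is_valid` and `saxpy_vla_model` are hypothetical, and the `vl_bits` parameter stands in for the hardware vector length, which real SVE code reads at run time instead of hard-coding. Code for actual SVE hardware would typically use the ACLE intrinsics from `arm_sve.h` (for example `svcntw`, `svwhilelt_b32`, `svld1`, and `svst1`), but that is outside this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if vl_bits is a vector length allowed by the rule above:
 * a power of two between 128 and 2048 bits. */
static int sve_vl_is_valid(unsigned vl_bits) {
    return vl_bits >= 128 && vl_bits <= 2048 &&
           (vl_bits & (vl_bits - 1)) == 0;
}

/* Scalar model of a vector-length agnostic loop computing
 * y[i] += a * x[i] on 32-bit floats. vl_bits plays the role of the
 * hardware vector length; the loop never assumes a specific value. */
static void saxpy_vla_model(unsigned vl_bits, size_t n, float a,
                            const float *x, float *y) {
    assert(sve_vl_is_valid(vl_bits));
    size_t lanes = vl_bits / 32;            /* 32-bit elements per vector */

    for (size_t i = 0; i < n; i += lanes) {
        /* Lane j is active only while i + j < n, mimicking a WHILELT-style
         * predicate, so the last partial vector needs no scalar tail loop. */
        for (size_t j = 0; j < lanes; j++) {
            if (i + j < n) {
                y[i + j] += a * x[i + j];
            }
        }
    }
}
```

Calling `saxpy_vla_model` with 128 or 512 as `vl_bits` produces identical results. This is the property that lets one SVE binary run unchanged on hardware with different vector lengths.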

SVE has 32 vector registers, Z0–Z31. Each vector register is as wide as the implemented vector length, which is a power of two between 128 and 2048 bits. The lower 128 bits of each Z register overlap the corresponding SIMD and floating-point register (V0–V31), as shown in Figure 1. The V registers are the registers used by the Neon instruction set, so SVE and Neon instructions can operate on the same data. A register written by an SVE instruction can be read by a Neon instruction, and vice versa. When a Neon or floating-point instruction writes to a SIMD and floating-point register, the bits of the corresponding SVE vector register above bit 127 are set to zero. The data in a vector register is treated as a number of elements, each 8, 16, 32, or 64 bits wide.
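The register overlap can be modelled in a few lines of portable C. This is a hypothetical sketch, not an Arm API. The names `zreg_model`, `neon_write_v`, and `neon_read_v` are invented, and a register is treated as a byte array with byte 0 as the least significant byte. The model shows that a Neon read sees the low 128 bits of the Z register, and that a Neon write zeroes everything above bit 127.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SVE_MAX_VL_BYTES 256   /* 2048-bit maximum vector length */

/* Hypothetical model of one Z register. Only the first vl_bytes bytes
 * exist on a given implementation; byte 0 is the least significant. */
typedef struct {
    unsigned vl_bytes;
    uint8_t bytes[SVE_MAX_VL_BYTES];
} zreg_model;

/* Neon/FP write to the overlapping V register: set the low 128 bits
 * and zero every implemented bit of Z above bit 127. */
static void neon_write_v(zreg_model *z, const uint8_t v[16]) {
    assert(z->vl_bytes >= 16 && z->vl_bytes <= SVE_MAX_VL_BYTES);
    memcpy(z->bytes, v, 16);
    memset(z->bytes + 16, 0, z->vl_bytes - 16);
}

/* Neon/FP read of the V register: returns the low 128 bits of Z. */
static void neon_read_v(const zreg_model *z, uint8_t v[16]) {
    memcpy(v, z->bytes, 16);
}
```

For example, on a 512-bit implementation (`vl_bytes = 64`), data left in Z by an SVE instruction is visible to `neon_read_v`. After a `neon_write_v`, bytes 16 to 63 read as zero.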

Figure 1 SVE vector registers and SIMD vector/floating-point registers

SVE introduces predicate registers, which determine which elements of an associated vector register are active. There are 16 predicate registers, P0–P15. A predicate register is one-eighth the length of a vector register: if the vector length is n x 128 bits, each predicate register is n x 16 bits, and each predicate bit corresponds to one byte of a vector register. A predicate element is therefore 1, 2, 4, or 8 bits wide, matching vector elements of 8, 16, 32, or 64 bits. Although a predicate element can be up to 8 bits wide, only its least significant bit (bit[0]) determines whether the corresponding vector element is active: bit[0]=1 means active, and bit[0]=0 means inactive.

Because of this layout, a fully initialized predicate register can control operations on vector registers with different element sizes, as shown in Figure 2. A predicate register (p0) can control a vector register (z0) holding packed 8-bit elements. It can also control a vector register (z1) holding packed 64-bit elements. In that case, all bits of each 8-bit predicate element except the least significant bit are ignored. Likewise, when a loop contains both 64-bit and 32-bit operations, p0 can control both a vector register (z1) with packed 64-bit elements and a vector register (z2) with unpacked 32-bit elements, where each 32-bit element occupies the lower half of a 64-bit container.
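This layout can be illustrated with a hypothetical C model. The names `preg_model`, `pred_set_all`, and `pred_elem_active` are invented and are not ACLE intrinsics. `pred_set_all` imitates the effect of a PTRUE-style initialization for a given element size. `pred_elem_active` tests only the least significant bit of a predicate element. For unpacked data, `esize` is the container size, so 32-bit elements held in 64-bit containers use `esize = 8`. This is why one predicate can govern both z1 and z2 in the loop described above.

```c
#include <assert.h>
#include <string.h>

#define SVE_MAX_PL_BYTES 32   /* 2048-bit vector -> 256-bit predicate */

/* Hypothetical model of a predicate register: bit k governs byte k of a
 * vector, so a vl_bits-bit vector needs vl_bits / 8 predicate bits. */
typedef struct {
    unsigned vl_bits;
    unsigned char bits[SVE_MAX_PL_BYTES];
} preg_model;

/* Read predicate bit k (the bit for vector byte k). */
static int pred_bit(const preg_model *p, unsigned k) {
    return (p->bits[k / 8] >> (k % 8)) & 1;
}

/* PTRUE-style initialization for esize-byte elements: set the lowest bit
 * of each esize-bit predicate element and clear all other bits. */
static void pred_set_all(preg_model *p, unsigned vl_bits, unsigned esize) {
    memset(p->bits, 0, sizeof p->bits);
    p->vl_bits = vl_bits;
    for (unsigned k = 0; k < vl_bits / 8; k += esize) {
        p->bits[k / 8] |= (unsigned char)(1u << (k % 8));
    }
}

/* Is element i active when elements (or their containers) are esize bytes?
 * Only the least significant bit of the predicate element is tested. */
static int pred_elem_active(const preg_model *p, unsigned esize, unsigned i) {
    assert(esize == 1 || esize == 2 || esize == 4 || esize == 8);
    assert((i + 1) * esize <= p->vl_bits / 8);
    return pred_bit(p, i * esize);
}
```

With a predicate initialized for 8-bit elements, every 8-bit and every 64-bit element is active. With a predicate initialized for 64-bit elements, all 64-bit elements are active, and so are unpacked 32-bit elements in 64-bit containers. However, the odd-numbered elements of a packed 32-bit vector would read as inactive.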

Figure 2 Example of using predicate registers

For details about the SVE instruction set, see SVE-SVE2-programming-examples.