Performing Configuration Before Installation
The optimization feature is included in the patch file. Once the patch is applied to the source code, the feature is ready to use.
Using SVE to Accelerate Vector Computing
Prerequisites
The CPU supports the Scalable Vector Extension (SVE) instruction set.
Check Method
Run the following command to check whether the CPU supports SVE:
lscpu
If the Flags line in the command output contains the sve flag, the CPU supports SVE. The following is an example of the command output:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 320
On-line CPU(s) list: 0-319
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
BIOS Model name: Kunpeng 920 7285Z
Model: 0
Thread(s) per core: 2
Core(s) per socket: 80
Socket(s): 2
Stepping: 0x0
Frequency boost: disabled
CPU max MHz: 3000.0000
CPU min MHz: 400.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint svei8mm svef32mm svef64mm svebf16 i8mm bf16 dgh rng ecv
Caches (sum of all):
  L1d: 10 MiB (160 instances)
  L1i: 10 MiB (160 instances)
  L2: 200 MiB (160 instances)
  L3: 280 MiB (4 instances)
NUMA:
  NUMA node(s): 4
  NUMA node0 CPU(s): 0-79
  NUMA node1 CPU(s): 80-159
  NUMA node2 CPU(s): 160-239
  NUMA node3 CPU(s): 240-319
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Not affected
  Spec rstack overflow: Not affected
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; __user pointer sanitization
  Spectre v2: Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
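The same check can be done programmatically. The sketch below assumes Linux on AArch64, where the flags reported by lscpu also appear on the "Features" lines of /proc/cpuinfo. The function names has_cpu_flag and cpu_has_sve are hypothetical. Note that the match must be on whole tokens: flags such as svei8mm and svebf16 contain "sve" as a substring but are separate features.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the whitespace-separated flag list
// (for example, the value of the lscpu "Flags:" line) contains `name` as a
// whole token. "svei8mm" or "svebf16" alone do NOT count as "sve".
bool has_cpu_flag(const std::string& flags, const std::string& name) {
    std::istringstream in(flags);
    std::string tok;
    while (in >> tok) {
        if (tok == name) return true;
    }
    return false;
}

// Hypothetical helper (assumes Linux on AArch64): scans /proc/cpuinfo for a
// "Features" line and checks it for the "sve" token. Returns false if the
// file cannot be read or no such line exists (e.g., on x86, which uses "flags").
bool cpu_has_sve() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("Features", 0) == 0) {  // line starts with "Features"
            std::string::size_type colon = line.find(':');
            if (colon != std::string::npos) {
                return has_cpu_flag(line.substr(colon + 1), "sve");
            }
        }
    }
    return false;
}
```

For example, has_cpu_flag applied to the Flags value in the output above returns true for "sve", while a list containing only "svei8mm svebf16" returns false.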
Compilation Option
During GCC compilation, the -march option specifies the Arm architecture version and any instruction set extensions. In this patch, SVE is enabled with the following compilation options:
-march=armv8-a+sve -msve-vector-bits=256
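-march=armv8-a+sve allows the compiler to generate SVE instructions, and -msve-vector-bits=256 sets the SVE vector length to 256 bits. With a fixed length such as 256, GCC generates vector-length-specific code, which assumes 256-bit SVE registers and is not portable to hardware with a different SVE vector length.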
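To illustrate what these options enable, the following is a minimal sketch of a vector add written with the Arm C Language Extensions (ACLE) SVE intrinsics from arm_sve.h. It is not taken from the patch; the function name vec_add is hypothetical. The SVE path is compiled only when __ARM_FEATURE_SVE is defined (as it is under -march=armv8-a+sve), and a scalar loop is used otherwise, so the sketch also builds on other targets.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical example: c[i] = a[i] + b[i] for i in [0, n).
void vec_add(const float* a, const float* b, float* c, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
    // svcntw() is the number of 32-bit lanes per SVE vector
    // (8 when built with -msve-vector-bits=256).
    for (std::size_t i = 0; i < n; i += svcntw()) {
        // Predicate is active only for lanes with index < n, so the final
        // partial vector needs no separate scalar tail loop.
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);  // predicated SVE load
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));  // predicated store
    }
#else
    // Scalar fallback when SVE is not enabled at compile time.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

A build along the lines of g++ -O2 -march=armv8-a+sve -msve-vector-bits=256 takes the SVE path; because of the fixed vector length, the resulting binary should only be run on CPUs whose SVE vectors are 256 bits wide.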
Using Prefetch (PF) to Accelerate Data Processing
In scenarios involving frequent loop iterations, parallel computing, or high cache miss rates, prefetch (PF) can be used to accelerate data processing and improve system performance.
Hardware Prefetch
Hardware prefetch tracks changes in instruction and data addresses and loads the predicted instructions and data into the cache in advance. You are advised to enable the prefetch function in the BIOS.
- Restart the server and enter the BIOS.
- In the BIOS, choose Advanced > MISC Config and press Enter.
- Set CPU Prefetching Configuration to Enabled, and press F10.
Software Prefetch
In the Arm architecture, the Prefetch Memory (PRFM) instruction can be used to prefetch data. A prefetch instruction loads data into the cache ahead of time; the processor does not use the data immediately. The prefetched data is usually stored in the L1 data cache. If the L1 cache is full, the data may be stored in the L2 cache or the L3 cache (if present) instead. The prefetching effect can be further optimized by adjusting the prefetch step, that is, how far ahead of the current access the data is prefetched.
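The effect of the prefetch step can be sketched with the GCC/Clang builtin __builtin_prefetch, used here as a portable stand-in for a hand-written PRFM instruction (on AArch64, GCC typically lowers a read prefetch with locality 3 to prfm PLDL1KEEP). The function gather_sum and its dist parameter are hypothetical. Indirect accesses like data[idx[k]] are a typical case that hardware prefetchers predict poorly, so each iteration prefetches the element needed dist iterations later; a suitable dist depends on memory latency and the work done per iteration and should be found by measurement.

```cpp
#include <cstddef>

// Hypothetical example: sums data[idx[k]] for k in [0, n), prefetching the
// element that will be needed `dist` iterations ahead (the prefetch step).
double gather_sum(const double* data, const int* idx, std::size_t n,
                  std::size_t dist) {
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        if (k + dist < n) {
            // rw = 0: prefetch for reading; locality = 3: keep in all cache levels.
            __builtin_prefetch(&data[idx[k + dist]], 0, 3);
        }
        sum += data[idx[k]];
    }
    return sum;
}
```

The bounds check keeps idx[k + dist] inside the index array; prefetching never changes the result, only (potentially) the time taken to compute it.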
The following macro implements the prefetch instruction in C++ using GCC inline assembly:
#define PLDL1KEEP_OFF(ptr, off) __asm__ volatile("prfm PLDL1KEEP, [%0, #(%1)]"::"r"(ptr), "i"(off):)
In this feature, a prefetch instruction is inserted before the SVE load instructions so that the two optimization methods are combined.
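A minimal sketch of this combination follows. It is not the patch's actual code: the function sum_with_prefetch and the constant PF_DIST_BYTES (256 bytes, a hypothetical starting value) are assumptions. It reuses the PLDL1KEEP_OFF macro shown above on AArch64 and issues it immediately before each predicated SVE load. On non-AArch64 targets the macro is replaced, purely for portability, by __builtin_prefetch, and without SVE a scalar loop is used.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical prefetch distance in bytes; must be a compile-time constant
// because the AArch64 macro passes it through an "i" (immediate) constraint.
#define PF_DIST_BYTES 256

#if defined(__aarch64__)
// PRFM wrapper from the text: prefetch [ptr + off] into L1 with the KEEP policy.
#define PLDL1KEEP_OFF(ptr, off) \
    __asm__ volatile("prfm PLDL1KEEP, [%0, #(%1)]" ::"r"(ptr), "i"(off))
#else
// Portability fallback (assumption): GCC/Clang builtin. The address is formed
// with integer arithmetic so no out-of-bounds pointer is ever created.
#define PLDL1KEEP_OFF(ptr, off)                                          \
    __builtin_prefetch(reinterpret_cast<const void*>(                     \
                           reinterpret_cast<std::uintptr_t>(ptr) + (off)), \
                       0, 3)
#endif

// Hypothetical example: returns the sum of a[0..n).
float sum_with_prefetch(const float* a, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (std::size_t i = 0; i < n; i += svcntw()) {
        PLDL1KEEP_OFF(a + i, PF_DIST_BYTES);      // prefetch before the SVE load
        svbool_t pg = svwhilelt_b32_u64(i, n);    // mask off lanes past n
        // Merging add: inactive lanes keep their previous accumulator value.
        acc = svadd_f32_m(pg, acc, svld1_f32(pg, a + i));
    }
    return svaddv_f32(svptrue_b32(), acc);        // horizontal reduction
#else
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        PLDL1KEEP_OFF(a + i, PF_DIST_BYTES);      // prefetch before the load
        acc += a[i];
    }
    return acc;
#endif
}
```

Prefetch instructions do not fault on addresses past the end of the array, so the sketch issues them unconditionally; the benefit of PF_DIST_BYTES, like the prefetch step discussed above, should be confirmed by measurement on the target workload.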
