Optimization Description

This section describes the optimization of Faiss for the Kunpeng platform, including dimension-interleaved lookup and accumulation, vectorization, and vector filtering and compression.

Dimension-interleaved Lookup and Accumulation

The lookup table (LUT) accumulation operator is a critical hotspot in both inverted index and exhaustive scan, and it often becomes the computational bottleneck. Accumulating distances at a wider bit width than the looked-up values requires additional registers, which reduces the degree of instruction unrolling and introduces redundant computational overhead.

To address this, the in-memory data layout is reordered to fully utilize the 256-bit wide registers. This approach minimizes temporary register overhead, increases the degree of instruction unrolling, and eliminates the redundant bit-width extension. Register usage drops by 16 registers, which improves pipeline utilization and lowers computational latency.
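The layout change can be illustrated with a portable scalar sketch. It is not the Kunpeng implementation: the block size of 32 vectors (32 8-bit lanes filling one 256-bit register), the 16-entry LUTs, and the function names `accumulate_vector_major`, `interleave`, and `accumulate_interleaved` are all assumptions made for this example. The sketch only shows that storing codes dimension by dimension within a block gives each lookup step a contiguous row of codes while producing the same distances; it does not reproduce the register savings or the removal of bit-width extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed constants for this sketch only.
constexpr std::size_t kBlock = 32;    // 32 x 8-bit lanes = 256 bits
constexpr std::size_t kLutSize = 16;  // 4-bit codes index 16-entry LUTs

// Baseline: codes are vector-major, codes[v * M + m]; lut[m * kLutSize + code].
std::vector<uint16_t> accumulate_vector_major(const std::vector<uint8_t>& codes,
                                              const std::vector<uint8_t>& lut,
                                              std::size_t n, std::size_t M) {
    assert(codes.size() == n * M && lut.size() == M * kLutSize);
    std::vector<uint16_t> dist(n, 0);
    for (std::size_t v = 0; v < n; ++v)
        for (std::size_t m = 0; m < M; ++m)
            dist[v] += lut[m * kLutSize + codes[v * M + m]];
    return dist;
}

// Reorder codes so that, inside each block of kBlock vectors, the codes of
// one dimension (sub-quantizer) m are contiguous:
// out[(b * M + m) * kBlock + lane] = codes[(b * kBlock + lane) * M + m].
std::vector<uint8_t> interleave(const std::vector<uint8_t>& codes,
                                std::size_t n, std::size_t M) {
    assert(n % kBlock == 0 && codes.size() == n * M);
    std::vector<uint8_t> out(codes.size());
    for (std::size_t b = 0; b < n / kBlock; ++b)
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t lane = 0; lane < kBlock; ++lane)
                out[(b * M + m) * kBlock + lane] =
                    codes[(b * kBlock + lane) * M + m];
    return out;
}

// Accumulate over the interleaved layout: each inner loop reads one
// contiguous row of kBlock codes, the shape a 256-bit load would consume.
std::vector<uint16_t> accumulate_interleaved(const std::vector<uint8_t>& inter,
                                             const std::vector<uint8_t>& lut,
                                             std::size_t n, std::size_t M) {
    assert(n % kBlock == 0 && inter.size() == n * M);
    std::vector<uint16_t> dist(n, 0);
    for (std::size_t b = 0; b < n / kBlock; ++b) {
        uint16_t* acc = &dist[b * kBlock];
        for (std::size_t m = 0; m < M; ++m) {
            const uint8_t* row = &inter[(b * M + m) * kBlock];
            const uint8_t* table = &lut[m * kLutSize];
            for (std::size_t lane = 0; lane < kBlock; ++lane)
                acc[lane] += table[row[lane]];
        }
    }
    return dist;
}
```

In a vectorized version, the innermost loop over `lane` would map to a single table-lookup instruction followed by an add, which is where the reordered layout would pay off.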

Vector Filtering and Compression

The filtering and compression process involves numerous intermediate steps when calculating bitmaps, which creates a bottleneck. A large portion of the intermediate data is invalid, so the average utilization of the register bit width is low.

This optimization leverages Scalable Vector Extension (SVE) predicates and the 256-bit register width to bypass the redundant intermediate steps.
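The idea of skipping the intermediate bitmap can be sketched in portable scalar C++. This is an illustration under stated assumptions, not the Kunpeng code: the filter condition (distance below a threshold), the lane count of 16 (16-bit distances in a 256-bit register), and the function names `filter_via_bitmap` and `filter_compact` are hypothetical. The second function emulates a predicate-driven compaction, where active lanes are packed to the front of the output in the same pass that evaluates the condition.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-step baseline: materialize a mask first, then scan it to collect ids.
std::vector<uint32_t> filter_via_bitmap(const std::vector<uint16_t>& dist,
                                        uint16_t threshold) {
    std::vector<uint8_t> mask(dist.size());
    for (std::size_t i = 0; i < dist.size(); ++i)
        mask[i] = dist[i] < threshold;  // intermediate data, mostly zeros
    std::vector<uint32_t> ids;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) ids.push_back(static_cast<uint32_t>(i));
    return ids;
}

// Single pass: evaluate the predicate per lane and compact immediately,
// without storing a mask. Processes kLanes elements per block (assumed
// 16 x 16-bit lanes = 256 bits).
std::vector<uint32_t> filter_compact(const std::vector<uint16_t>& dist,
                                     uint16_t threshold) {
    constexpr std::size_t kLanes = 16;
    std::vector<uint32_t> ids(dist.size());
    std::size_t out = 0;
    for (std::size_t base = 0; base < dist.size(); base += kLanes) {
        const std::size_t end = std::min(base + kLanes, dist.size());
        for (std::size_t i = base; i < end; ++i) {
            ids[out] = static_cast<uint32_t>(i);  // safe: out <= i always
            out += dist[i] < threshold;           // keep only active lanes
        }
    }
    ids.resize(out);
    return ids;
}
```

Both functions return the same ids in ascending order. On SVE hardware, the inner loop of `filter_compact` would correspond to a predicate-generating compare followed by a compaction of the active lanes, so no per-element mask needs to be written to memory.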