Rate This Document
Findability
Accuracy
Completeness
Readability

Hyperscan

Hyperscan is an open source high-performance regular expression matching library developed by Intel. It is optimized using x86 vector instructions. Huawei has adapted and optimized Hyperscan for the Arm instruction set on the Kunpeng platform and innovated algorithms (for example, a hybrid model with a short-rule bypass) to address the weaknesses of rulesets used in actual data distribution services.

The Hyperscan operates in two main phases: compilation and runtime. During the compilation phase, one or more rules are compiled into a read-only database. The runtime phase does not require loading the rules; instead, it uses this pre-compiled database for pattern matching.

Hyperscan is the fastest in ruleset matching scenarios with less than 100,000 rules. It is widely used in scenarios such as application classification, IDP, and web application firewall (WAF). In typical open source solutions such as Suricata, the detection engine is a top computing hotspot. Optimizing Hyperscan and integrating it into such solutions can effectively improve the end-to-end performance of data distribution services.

Key technologies of Hyperscan:

efficient pre-filtering algorithm, multi-pattern matching algorithm, and automata algorithm.

Application scenarios of Hyperscan:

Network security data distribution, fine-grained data distribution for carriers, data distribution for public security and technical investigation, IDP, WAF, and foundation model application firewalls.

Short-Rule Bypass Technology

A rule set containing short rules can cause performance bottlenecks in Hyperscan rule matching. The rule sets deployed on the customer's live network usually have such bottlenecks, either from manually written short rules or short strings in regular rules decomposed into short rules by the graph partitioning algorithm. After short rules enter the multi-pattern matching, their high matching rate frequently interrupts the frontend fast instruction pipeline, triggering relatively slow backend exact verification. The hybrid processing of short and common rules introduces redundant operations, resulting in performance bottlenecks.

The optimized Hyperscan provides a hybrid model with a short-rule bypass, which significantly improves the matching performance for rule sets containing short rules. The model separates short rules from common rules and uses a few vector instructions to implement a high-speed bypass algorithm. This eliminates redundant operations and maintains the instruction pipeline parallel efficiency of the main algorithm, greatly improving the overall matching performance.

Short rules lead to poor performance in Suricata due to excessive vector calculations and exact checks during matching. Research has shown that simply removing eight single-byte rules from one subset can improve overall matching performance by 50%. To address this, the latest version introduces the hybrid model with a short-rule bypass. The model excludes short rules from the main algorithm using an efficient bypass algorithm enhanced by TBL lookup table instructions. While this approach adds 20% more overhead for short rule matching, it ultimately boosts overall performance by over 30%.

The high-speed short-rule bypass algorithm shuffle-and plays a key role in efficiently processing short rules in a rule set.

The following describes how the shuffle-and algorithm works. Assume that a rule set contains eight 1-byte rules expressed as {A, B, C, D, c, d, e, f}. Each rule is assigned a group, and the group ID of each rule is represented by a bit. The group IDs of the rules are {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80}. The entire ASCII table is expressed as a 16 x 16 grid, where characters with the same high 4-bit value are placed in the same row, and characters with the same low 4-bit value are placed in the same column. Then, enter the group ID of each rule in the corresponding grid. See the following figure.

Perform a bitwise OR operation on the values in each row of the table to obtain the vector shown in the second column of the preceding figure, which is recorded as maskHigh. Perform a bitwise OR operation on the values in each column of the table to obtain the vector shown in the second row of the preceding figure, which is recorded as maskLow. The following figure shows maskLow and maskHigh.

After the preceding preparations are complete, maskLow and maskHigh are used as source vectors for the shuffling operation, and the data to be matched is used as control vectors.

Assume that a 16-byte data block in the data to be matched is AgggDd3366666666. You can obtain the matching results of all positions in five instructions. First, express the lower four bits and upper four bits of each byte in the data block as two control vectors. This requires an AND operation to erase the upper four bits of each byte and a shift operation to move the upper four bits of each byte to the lower four bits. Then, you can obtain two vectors inputLow and inputHigh.

Then, two shuffling operations are performed to obtain the results shufLow and shufHigh as follows:

Perform an AND operation on shufLow and shufHigh to obtain the matching result of each position.

As shown in the preceding figure, offsets 0, 4, and 5 match group 0x01 (A), group 0x08 (D), and group 0x20 (d), respectively.

False-Positive Blocking Technology

The rule sets deployed on the customer's live network sometimes contain a few rules with special fragments, causing excessive false positives in multi-pattern matching. This triggers a large number of interpreter calls and inefficient long-rule verification, but yields zero true matches. These unnecessary interpreter calls become computing hotspots, undermining the pre-filtering capability of multi-pattern matching.

In a real customer traffic environment, a few rules with bad string fragments in the core rule set generated over 5 million false hits in multi-pattern matching. Interpreter calls and complete long-rule verification became the computing hotspot, with no actual matches found. However, after these rules were removed, the hotspot was eliminated and the matching performance was boosted by 20 times.

Figure 1 False-positive blocking

The optimized Hyperscan provides a false-positive blocking model, which greatly improves the matching performance for rule sets that contain bad string fragments.

The false-positive blocking model controls the behavior of the graph partitioning algorithm during the compilation phase. It checks bad fragments on candidate partitions in the NFA graph to intercept the partitions belonging to bad fragments. This prevents bad fragments from entering multi-pattern matching, thereby reducing false hits and filtering out meaningless verifications. In addition, it leverages vector instructions to optimize inefficient long-rule verification algorithms within the interpreter during the runtime phase. Through the preceding two measures, it significantly improves the overall performance.