@victorjulien
Member

Replaces #11617

The bundled compare.py can compare results from two branches, or compare the spm algorithms within a single result CSV file.

Here is the result of the new mm vs bm in
image
It generally performs a lot better, but it seems to have a slightly higher startup cost, which shows in the tests that take the shortest time.

Deduplicate counter registration.
Rename to match coding style. Update callers.
Systems with SSE 4.1 as the highest SSE version are getting pretty
rare, so it's hard to test.
AVX2 implementation that compares 32 bytes at a time.

Rearrange code to make parts reusable.

Fall back to smaller SIMD for remaining buffer.

When (remaining) buffer is smaller than 32 bytes fall back to other
SIMD implementations that deal with 16 bytes of data per iteration.
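The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the name `BufEqualSketch` is hypothetical, the 16-byte step uses plain SSE2 intrinsics rather than the SSE 4.x paths mentioned in the commits, and the return convention (0 for equal, 1 for different) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns 0 if both buffers hold the same n bytes,
 * 1 otherwise. It does not produce an ordering like memcmp(). */
static int BufEqualSketch(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t off = 0;

#if defined(__AVX2__)
    /* Main loop: compare 32 bytes per iteration. */
    for (; n - off >= 32; off += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + off));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + off));
        __m256i eq = _mm256_cmpeq_epi8(va, vb);
        /* All 32 mask bits are set only if every byte matched. */
        if ((uint32_t)_mm256_movemask_epi8(eq) != 0xFFFFFFFFu)
            return 1;
    }
#endif
#if defined(__SSE2__)
    /* Remaining buffer is smaller than 32 bytes: 16 bytes per iteration. */
    for (; n - off >= 16; off += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + off));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + off));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            return 1;
    }
#endif
    /* Scalar tail for whatever is left (and for builds without SIMD). */
    for (; off < n; off++) {
        if (a[off] != b[off])
            return 1;
    }
    return 0;
}
```

Built without `-mavx2`, the 32-byte loop is compiled out and the same function degrades to the 16-byte and scalar paths, which keeps the sketch portable.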

Add 16/32/64 byte implementations using AVX512.
Implement for AVX512, AVX2 and SSE42.
Wrapper around `memmem`.

The case sensitive search is implemented by directly calling `memmem`.

As there is no case insensitive variant available, a wrapper around
memmem is created that takes a sliding window approach:

1. take a slice of the haystack
2. convert it to lowercase
3. search it using memmem
4. move window forward
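
The four steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the wrapper from this PR: `MemmemNocaseSketch` and `WINDOW_SIZE` are hypothetical names, the needle is assumed to be lowercase already, needles longer than the window are simply rejected, and lowercasing uses `tolower()` in the default C locale. Consecutive windows overlap by `nlen - 1` bytes so that a match straddling two slices is still found.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes memmem() on glibc */
#endif
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 256 /* hypothetical slice size, not taken from the PR */

/* Case insensitive search. `needle_lc` must already be lowercase.
 * Returns a pointer into `haystack`, or NULL if there is no match. */
static const uint8_t *MemmemNocaseSketch(const uint8_t *haystack, size_t hlen,
        const uint8_t *needle_lc, size_t nlen)
{
    assert(haystack != NULL && needle_lc != NULL);
    uint8_t window[WINDOW_SIZE];

    if (nlen == 0)
        return haystack;
    if (nlen > hlen || nlen > WINDOW_SIZE)
        return NULL; /* long needles would need a separate fallback */

    size_t offset = 0;
    while (hlen - offset >= nlen) {
        /* 1. take a slice of the haystack */
        size_t slice = hlen - offset;
        if (slice > WINDOW_SIZE)
            slice = WINDOW_SIZE;

        /* 2. convert it to lowercase */
        for (size_t i = 0; i < slice; i++)
            window[i] = (uint8_t)tolower(haystack[offset + i]);

        /* 3. search it using memmem */
        const uint8_t *found = memmem(window, slice, needle_lc, nlen);
        if (found != NULL)
            return haystack + offset + (size_t)(found - window);

        /* 4. move the window forward, keeping nlen - 1 bytes of overlap */
        if (offset + slice == hlen)
            break; /* that was the last slice */
        offset += slice - (nlen - 1);
    }
    return NULL;
}
```

The copy into `window` is the extra cost compared to the case sensitive path, which calls `memmem` on the haystack directly.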
Tool to benchmark detection engine content inspection, which is the
inspection of individual groups of content, etc. matches for a buffer.

Also add a set of basic tests for the various single pattern matching
implementations.

Output is in CSV: written to files for the rule based tests, and to
stdout for the spm tests.
To show differences between 2 result files or between spm algos
in a single result file.
@codecov

codecov bot commented Nov 8, 2025

Codecov Report

❌ Patch coverage is 67.49226% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.12%. Comparing base (6bd3605) to head (1b9c77e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14295      +/-   ##
==========================================
- Coverage   84.17%   84.12%   -0.06%     
==========================================
  Files        1012     1013       +1     
  Lines      261868   262201     +333     
==========================================
+ Hits       220421   220570     +149     
- Misses      41447    41631     +184     
| Flag | Coverage Δ | |
|---|---|---|
| fuzzcorpus | 63.18% <53.40%> (-0.14%) | ⬇️ |
| livemode | 18.71% <36.05%> (-0.07%) | ⬇️ |
| pcap | 44.55% <53.03%> (-0.10%) | ⬇️ |
| suricata-verify | 64.86% <54.54%> (-0.06%) | ⬇️ |
| unittests | 59.22% <68.81%> (-0.01%) | ⬇️ |

@suricata-qa

Information: QA ran without warnings.

Pipeline = 28416

@suricata-qa

WARNING:

| field | baseline | test | % |
|---|---|---|---|
| SURI_TLPR1_stats_chk | | | |
| .uptime | 654 | 634 | 96.94% |

Pipeline = 28417

}

#if defined(__AVX2__)
static inline void MemcmpyToLowerAVX2(uint8_t *dst, const uint8_t *src, size_t n);
Member

Memcpy?

@victorjulien victorjulien marked this pull request as draft November 10, 2025 10:44
@victorjulien
Member Author

Going to split this up as not all parts are as useful. Esp the memcmp stuff is not always faster than the libc implementation, sometimes a lot slower. So that will need some more research.

@inashivb
Member

> Esp the memcmp stuff is not always faster than the libc implementation, sometimes a lot slower.

For smaller data set?

@victorjulien
Member Author

The opposite actually. I can get better perf with large data (~9k) with avx512+loop unrolls, but between the various systems I have the results are inconsistent. With small data I think the start up cost of my code is somewhat better. But also, with small data it all matters somewhat less :)

Will focus on getting the mm and bench tool merged first, then we can also reason about further changes with the bench results.
