MM SPM, bench tool and general Simd/v8 #14295
Deduplicate counter registration.
Rename to match coding style. Update callers.
Systems with SSE 4.1 as the highest SSE version are getting pretty rare, so it's hard to test.
AVX2 implementation that compares 32 bytes at a time. Rearrange code to make parts reusable. When the (remaining) buffer is smaller than 32 bytes, fall back to the other SIMD implementations, which handle 16 bytes of data per iteration. Add 16/32/64 byte implementations using AVX512.
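As a rough illustration of the "32 bytes at a time, then fall back" structure described above, here is a minimal sketch. It is not the code from this PR: the name `MemcmpSketch` and the return convention (0 for equal, 1 for different) are assumptions made for the example, and the 16-byte step uses SSE2 simply because it is available on every x86-64 build.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical helper, for illustration only.
 * Returns 0 when the first n bytes of a and b are equal, 1 otherwise. */
static inline int MemcmpSketch(const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

#if defined(__AVX2__)
    /* Main loop: compare 32 bytes per iteration. */
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i eq = _mm256_cmpeq_epi8(va, vb);
        /* One mask bit per byte; all 32 bits set means all bytes match. */
        if ((uint32_t)_mm256_movemask_epi8(eq) != 0xFFFFFFFFu)
            return 1;
    }
#endif

#if defined(__SSE2__)
    /* Fallback for the remainder: compare 16 bytes per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        if (_mm_movemask_epi8(eq) != 0xFFFF)
            return 1;
    }
#endif

    /* Scalar tail (fewer than 16 bytes, or everything on non-x86 builds). */
    return memcmp(a + i, b + i, n - i) != 0;
}
```

Each wider loop leaves `i` pointing at the first unprocessed byte, so the narrower step picks up exactly where the wider one stopped. Whether this beats the libc `memcmp` would have to be measured per system.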
Implement for AVX512, AVX2 and SSE42.
Wrapper around `memmem`. The case sensitive search is implemented by directly calling `memmem`. As there is no case insensitive variant available, a wrapper around `memmem` is created that takes a sliding window approach:
1. take a slice of the haystack
2. convert it to lowercase
3. search it using `memmem`
4. move the window forward
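The sliding window steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the wrapper from this PR: the function name `NocaseMemmemSketch`, the fixed `WINDOW_SIZE` of 256 bytes and the requirement that the needle is already lowercase are all hypothetical choices made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* memmem() is a GNU/BSD extension */
#endif
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 256 /* hypothetical size, chosen for illustration */

/* Hypothetical helper: case insensitive search for needle in haystack.
 * The needle must already be lowercase. Returns a pointer to the first
 * match in haystack, or NULL when there is none.
 * Sketch limitation: needles longer than WINDOW_SIZE are not supported. */
static const uint8_t *NocaseMemmemSketch(const uint8_t *haystack, size_t hlen,
        const uint8_t *needle, size_t nlen)
{
    uint8_t window[WINDOW_SIZE];
    size_t offset = 0;

    assert(haystack != NULL && needle != NULL);
    assert(nlen <= WINDOW_SIZE);

    if (nlen == 0)
        return haystack;

    while (offset <= hlen && hlen - offset >= nlen) {
        /* 1. take a slice of the haystack */
        size_t chunk = hlen - offset;
        if (chunk > WINDOW_SIZE)
            chunk = WINDOW_SIZE;

        /* 2. convert it to lowercase */
        for (size_t i = 0; i < chunk; i++)
            window[i] = (uint8_t)tolower(haystack[offset + i]);

        /* 3. search it using memmem */
        const uint8_t *m = memmem(window, chunk, needle, nlen);
        if (m != NULL)
            return haystack + offset + (size_t)(m - window);

        if (chunk < WINDOW_SIZE)
            break; /* this slice reached the end of the haystack */

        /* 4. move the window forward, keeping nlen - 1 bytes of overlap
         * so a match straddling two slices is not missed */
        offset += chunk - nlen + 1;
    }
    return NULL;
}
```

The overlap in step 4 is the important design detail: advancing by a full window would skip matches that start near the end of one slice and finish in the next.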
Tool to benchmark detection engine content inspection, which is the inspection of individual groups of content, etc. matches for a buffer. Also add a set of basic tests for the various single pattern matching implementations. Output is in CSV: to files for the rule based tests, to stdout for the spm tests.
To show differences between 2 result files or between spm algos in a single result file.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #14295 +/- ##
==========================================
- Coverage 84.17% 84.12% -0.06%
==========================================
Files 1012 1013 +1
Lines 261868 262201 +333
==========================================
+ Hits 220421 220570 +149
- Misses 41447 41631 +184
Information: QA ran without warnings. Pipeline = 28416
WARNING:
Pipeline = 28417
}

#if defined(__AVX2__)
static inline void MemcmpyToLowerAVX2(uint8_t *dst, const uint8_t *src, size_t n);
Memcpy?
Going to split this up, as not all parts are equally useful. Especially the memcmp stuff is not always faster than the libc implementation, and is sometimes a lot slower. So that will need some more research.
For smaller data sets?
The opposite actually. I can get better perf with large data (~9k) with avx512+loop unrolls, but the results are inconsistent between the various systems I have. With small data I think the startup cost of my code is somewhat better. But also, with small data it all matters somewhat less :) Will focus on getting the
Replaces #11617
The bundled compare.py can be used to compare results from 2 branches or the spms in a single result csv file.
Here is the result of the new

It generally performs a lot better, but it seems to have a slightly higher startup cost, which shows in the tests that take the shortest time.