PCRE "single-line mode" not properly represented in CTRE #282
Comments
I think there is code that tries to achieve this in ctre::evaluate, but it is hidden behind multi-line mode.
Here is a simplified example of the std::regex behavior on a string containing '\r' or '\n'.
It seems like ECMAScript is the only flavor available to
By default, the Here is a useful website which explains how the dot character works. In short, there is this flag called "single-line" (or sometimes "dotall") which makes the dot actually match line breaks. I usually use something like
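A minimal sketch of that behavior, assuming the default ECMAScript grammar of std::regex. The helper name dot_matches is hypothetical and does not come from the thread or from CTRE.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole input is matched by a single dot.
// std::regex defaults to the ECMAScript grammar, where '.' is specified to
// match any character except line terminators such as '\n' and '\r'.
bool dot_matches(const std::string& input) {
    static const std::regex dot(".");
    return std::regex_match(input, dot);
}
```

With this helper, dot_matches("a") and dot_matches("\a") should be true, while dot_matches("\n") and dot_matches("\r") should be false.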
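As far as I know, std::regex has no "single-line"/"dotall" flag, so the effect has to be emulated there. The sketch below uses the character class [\s\S] as a stand-in for a dot that also matches line breaks. This is one common workaround, not necessarily the one meant above, and both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match with a plain dot, which excludes
// '\n' and '\r' in the ECMAScript grammar used by std::regex.
bool dot_star_matches_all(const std::string& input) {
    static const std::regex pattern(".*");
    return std::regex_match(input, pattern);
}

// Hypothetical helper: same match, but [\s\S] (whitespace or non-whitespace)
// accepts every character, which approximates a dot in single-line mode.
bool any_star_matches_all(const std::string& input) {
    static const std::regex pattern(R"([\s\S]*)");
    return std::regex_match(input, pattern);
}
```

For the input "a\nb", dot_star_matches_all should return false and any_star_matches_all should return true.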
I see, so this is a quirk exclusive to Perl-Compatible Regular Expressions. I think CTRE makes the mistake this website describes, which is assuming that multi-line mode is the opposite of single-line mode. While making PR #283 I found in the source code that multi-line mode is what makes the dot never match '\r' or '\n' in CTRE.
Oh dear, this documentation you linked says PCRE is supposed to allow configuring which characters are line endings. So my PR isn't really valid PCRE behavior; it now just matches the std::regex behavior. This is a complicated topic.
I want to preface this with the fact that I am quite inexperienced with regular expressions, so I may be wrong about some things.
When I created issue #281, the example I linked for CTRE used ctre::multiline_starts_with. This was because it was a simplified snippet from a personal project I am attempting to convert to using CTRE. I intended to use ctre::starts_with, as that is the direct analogue for the std::regex mode I was using before. However, ctre::starts_with consistently caused stack overflow crashes. I have now discovered, through trial and error, why this was.
STL: https://godbolt.org/z/vP9YqGP3v
CTRE: https://godbolt.org/z/bedTY8jxo
I do not know how to describe it, but it seems regular expressions of various flavors (when not in multi-line mode) have special rules for the '\n' and '\r' characters that CTRE does not follow. I found a website that helps support this claim: https://regex101.com/r/Syt781/1. Notice that the regex behaves identically in ECMAScript, PCRE, and PCRE2 modes. I say it is a special rule for these characters in particular because other characters, including escape sequences like '\a', do still result in the greedy capture going too far with std::regex: https://godbolt.org/z/1cj3KqMas.
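A minimal sketch of the greedy-capture difference, assuming std::regex with its default ECMAScript grammar. The pattern, the inputs, and the helper name greedy_prefix are hypothetical and are not copied from the godbolt links. std::regex_constants::match_continuous is used so that the match must begin at the start of the input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text consumed by a greedy ".*" that is
// anchored at the start of the input via match_continuous.
std::string greedy_prefix(const std::string& input) {
    static const std::regex pattern(".*");
    std::smatch m;
    if (std::regex_search(input, m, pattern,
                          std::regex_constants::match_continuous)) {
        return m.str();  // everything the greedy dot was allowed to consume
    }
    return {};
}
```

Here greedy_prefix("abc\ndef") and greedy_prefix("abc\rdef") should both stop at "abc", while greedy_prefix("abc\adef") should consume the whole string, because '\a' is not a line terminator.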