Add ordered_match() function #26

Open: wants to merge 4 commits into master

@leni536 (Contributor) commented Oct 30, 2018

I added an ordered_match() function that compares a subject to all possible matches of a given target (regex). The result is encoded in the ctre::partial_ordering type, which can be equal, less, greater or unordered. This can be useful for efficient searching in ordered collections of strings.

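Usage:

```cpp
#include <ctre.hpp>
using namespace ctre::literals;
constexpr auto re = "abc[1-3]"_ctre;
// "aaa" sorts before every string abc[1-3] matches ("abc1", "abc2", "abc3")
constexpr auto ord = ctre::partial_ordering(re.ordered_match("aaa"));
static_assert(ord == ctre::partial_ordering::less);
```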
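To illustrate the "efficient searching" use case, here is a minimal sketch under stated assumptions: `order` is any callable returning the exact ordering of a string against all matches of a pattern (in practice something like the proposed `ordered_match()`), and `ordering` and `find_matches` are hypothetical names, with `ordering` standing in for `ctre::partial_ordering`. In a sorted collection the strings that compare `less` form a prefix and those that compare `greater` form a suffix, so both boundaries can be found by binary search with `std::partition_point`, leaving only the middle range to scan.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ctre::partial_ordering.
enum class ordering { less, equal, greater, unordered };

// Return every element of `sorted` that matches, where order(s) is assumed to
// give the exact ordering of s against all matches of some pattern.
template <class Order>
std::vector<std::string> find_matches(const std::vector<std::string>& sorted,
                                      Order order) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));  // precondition
    // Strings below every match form a prefix of the sorted range...
    const auto first = std::partition_point(
        sorted.begin(), sorted.end(),
        [&](const std::string& s) { return order(s) == ordering::less; });
    // ...and strings above every match form a suffix.
    const auto last = std::partition_point(
        first, sorted.end(),
        [&](const std::string& s) { return order(s) != ordering::greater; });

    // Only the middle range can contain matches, so scan it linearly.
    std::vector<std::string> matches;
    for (auto it = first; it != last; ++it) {
        if (order(*it) == ordering::equal) matches.push_back(*it);
    }
    return matches;
}
```

The sketch relies on `order` being exact: if it may answer `unordered` where the exact answer is `less` or `greater`, the range is no longer guaranteed to be partitioned the way `std::partition_point` requires.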

I've only done a handful of manual tests and haven't found any target/subject pair where it returns an unexpected result. It certainly needs a lot more testing, though.

@hanickadot (Owner)

What should the ordering of subject "B" be against regex "A|C"?

@leni536 (Contributor, Author) commented Oct 30, 2018

unordered
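The answer follows from comparing the subject against each match: "B" sorts after "A" but before "C", so it is neither less than nor greater than all matches. As a reference definition that could also serve as a test oracle for patterns with a finite set of matches, here is a minimal brute-force sketch; `ordering` and `order_against` are hypothetical names, not part of ctre.

```cpp
#include <initializer_list>
#include <string_view>

// Hypothetical stand-in for ctre::partial_ordering.
enum class ordering { less, equal, greater, unordered };

// Reference definition: compare `subject` lexicographically with every string
// the regex matches, given here as an explicit finite list (A|C -> {"A", "C"}).
constexpr ordering order_against(std::string_view subject,
                                 std::initializer_list<std::string_view> matches) {
    bool below_all = true;  // subject < every match seen so far
    bool above_all = true;  // subject > every match seen so far
    for (std::string_view m : matches) {
        const int c = subject.compare(m);
        if (c == 0) return ordering::equal;  // the subject is itself a match
        if (c > 0) below_all = false;        // this match sorts before the subject
        if (c < 0) above_all = false;        // this match sorts after the subject
    }
    if (below_all) return ordering::less;
    if (above_all) return ordering::greater;
    return ordering::unordered;
}
```

For example, `order_against("B", {"A", "C"})` yields `ordering::unordered`, while `order_against("aaa", {"abc1", "abc2", "abc3"})` yields `ordering::less`, agreeing with the usage example in the PR description.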

This pull request depends on my earlier refactoring of atoms_characters.hpp. I will resolve the conflicts once you merge that one.

@hanickadot (Owner)

So for ordered_match I need to check all branches? If I understand it correctly:
if it matches the subject => equal
if not, and every failure expects a character after (by character code) the unmatched subject character => less than
if not, and every failure expects a character before (by character code) the unmatched subject character => greater than
if it's neither of the above => unordered

??
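The rules above amount to combining per-branch results. A minimal sketch, assuming each branch has already been compared exactly against the subject; `ordering` and `combine_branches` are hypothetical names, with `ordering` standing in for `ctre::partial_ordering`, and this is not necessarily how the PR implements it.

```cpp
#include <initializer_list>

// Hypothetical stand-in for ctre::partial_ordering.
enum class ordering { less, equal, greater, unordered };

// Combine the exact orderings of one subject against each branch of an
// alternation A|B|...: equal if any branch matches, less/greater only if
// every branch agrees, unordered otherwise. Expects at least one branch.
constexpr ordering combine_branches(std::initializer_list<ordering> branches) {
    bool all_less = true;
    bool all_greater = true;
    for (ordering b : branches) {
        if (b == ordering::equal) return ordering::equal;
        if (b != ordering::less) all_less = false;
        if (b != ordering::greater) all_greater = false;
    }
    if (all_less) return ordering::less;
    if (all_greater) return ordering::greater;
    return ordering::unordered;
}
```

For "B" against "A|C", the branch results are greater (against A) and less (against C), which combine to unordered.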

@leni536 (Contributor, Author) commented Oct 30, 2018

I rebased the branch onto the other refactoring, so it now merges without conflicts.

Here is how I understand it, with an example. Suppose the target is "[a-c]+". Initially less, greater and equal are all possible return values, and evaluating the stack eliminates candidates when appropriate. For example, if the first character of the subject is 'b', we can eliminate both less and greater at that point, since there are potential matches starting with 'a' (which sort before the subject) and potential matches starting with 'c' (which sort after it). If the next character is then, say, 'x', the result is unordered: less and greater were already eliminated, and now equal is too.
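The elimination idea can be sketched for a deliberately simplified pattern model: a fixed-length sequence of non-empty character classes, each written as the set of characters it accepts (so abc[1-3] becomes {"a", "b", "c", "123"}). Repetition such as [a-c]+ and alternation are not handled, and `ordering` and `order_against_classes` are hypothetical names; this illustrates the idea and is not the PR's implementation. For this restricted model it should give the exact answer.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Hypothetical stand-in for ctre::partial_ordering.
enum class ordering { less, equal, greater, unordered };

// Simplified pattern: a fixed-length sequence of non-empty character classes,
// each given as the characters it accepts (abc[1-3] -> {"a", "b", "c", "123"}).
constexpr ordering order_against_classes(std::string_view subject,
                                         std::initializer_list<std::string_view> classes) {
    // Start with every outcome possible and eliminate as we scan.
    bool can_be_less = true;
    bool can_be_greater = true;
    std::size_t i = 0;
    for (std::string_view cls : classes) {
        if (i == subject.size()) {
            // The subject is a proper prefix of every match still in play.
            return can_be_less ? ordering::less : ordering::unordered;
        }
        const auto c = static_cast<unsigned char>(subject[i]);
        unsigned char lo = 255, hi = 0;
        bool member = false;
        for (char ch : cls) {
            const auto u = static_cast<unsigned char>(ch);
            if (u < lo) lo = u;
            if (u > hi) hi = u;
            if (u == c) member = true;
        }
        if (!member) {
            if (c < lo) return can_be_less ? ordering::less : ordering::unordered;
            if (c > hi) return can_be_greater ? ordering::greater : ordering::unordered;
            return ordering::unordered;  // c falls in a gap inside the class
        }
        // Matches that branch off here with a smaller (larger) character sort
        // before (after) the subject, ruling out less (greater).
        if (lo < c) can_be_less = false;
        if (hi > c) can_be_greater = false;
        ++i;
    }
    if (i == subject.size()) return ordering::equal;
    // Every match still in play is a proper prefix of the subject.
    return can_be_greater ? ordering::greater : ordering::unordered;
}
```

As in the example above, subject "bx" against [a-c][a-c] (written {"abc", "abc"}) loses both less and greater at 'b' and then fails at 'x', giving unordered.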

I think that in some rare cases my approach can return unordered where less or greater would be the exact answer. I don't think this is a correctness issue, since unordered only means that the subject is not a match. However, when it returns less or greater, the subject is guaranteed to compare less than (or greater than) every possible match.
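The guarantee described here can be written down as a property to check in tests, for example against a brute-force reference on small patterns. A minimal sketch; `ordering` and `acceptable` are hypothetical names, with `ordering` standing in for `ctre::partial_ordering`.

```cpp
// Hypothetical stand-in for ctre::partial_ordering.
enum class ordering { less, equal, greater, unordered };

// `fast` is what a conservative implementation returns and `exact` is the
// true ordering. Answering `unordered` is allowed whenever the subject is not
// a match; any other answer must be exactly right.
constexpr bool acceptable(ordering fast, ordering exact) {
    if (fast == exact) return true;
    return fast == ordering::unordered && exact != ordering::equal;
}
```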

Edit:
Also, the only case where my approach increases complexity compared to match() is the handling of possessive repeats.
