Add initial support for regexp named groups #663
Closed
Hi! This PR adds support for using named capture groups in regexps (e.g. `/(?<foo>abc)/`).

I've simply wrapped the `[]int` that multiple regexp-related functions were handling with a struct that adds groups information as well.

Small caveats:

- `parser/regexp.go` assumes that Go does not support the `/(?<foo>abc)/` syntax, but this is no longer the case: see golang/go#58458 ("regexp/syntax: accept (?<name>...) in addition to (?P<name>...)"). So in theory, we could use the standard Go `regexp` package for regexps with named capture groups (from a specific Go version onwards, I guess). As it is now, the code will always create a `regexp2.Regexp` when it encounters something that looks like a named capture group. This works fine, in any case.
- `regexp2` exposes named capture groups in an interesting way. All groups without a name (e.g. `(foo)`) are automatically assigned an integer-as-a-string name, e.g. `"1"`, depending on how many groups overall there are in the regexp (like an index). So when we ask for capture group names, there's no way of knowing whether the JS code actually named a group e.g. "2" explicitly, or whether `regexp2` just happened to assign this name to it. This is a problem because a group named "2" would not be valid in JS code, since "2" is not a valid ECMAScript identifier. For this PR, I have chosen to ignore these groups and assume that they were automatically given a name by `regexp2`. This means that we will accept JS code that should have been rejected.

I have also added a new script called `extract_passed_tests.sh`. I have used it to check whether my branch adds new passing tests compared to the `master` branch. By my count, there are 57 new passing tests 🎉
If the code in this PR looks good enough, I can add some Go tests, if you think that would help.
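The workaround from the second caveat (ignoring group names that look like auto-assigned indices) could be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `looksAutoNumbered` and `explicitGroupNames` are hypothetical helpers, and the input slice is made up for the example rather than taken from real `regexp2` output.

```go
package main

import "fmt"

// looksAutoNumbered reports whether name consists only of ASCII digits,
// such as "2". Such a name cannot be a valid ECMAScript identifier, so it
// is assumed (hypothetically) to have been assigned by the regexp engine.
func looksAutoNumbered(name string) bool {
	if name == "" {
		return false
	}
	for _, r := range name {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// explicitGroupNames keeps only the names that do not look auto-assigned.
func explicitGroupNames(names []string) []string {
	var out []string
	for _, n := range names {
		if !looksAutoNumbered(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Illustrative input only: a mix of index-like names and one real name.
	names := []string{"0", "1", "2", "foo"}
	fmt.Println(explicitGroupNames(names))
}
```

The trade-off is the one already described: a pattern that explicitly names a group "2" passes through this filter unnoticed instead of being rejected as a syntax error.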