Skip to content

Conversation

@federicotdn
Copy link
Contributor

@federicotdn federicotdn commented Apr 13, 2025

Hi! This PR adds support for using named capture groups in regexps (e.g. /(?<foo>abc)/).

I've simply wrapped the []int that multiple regexp-related functions were handling with a struct that adds groups information as well.

Small caveats:

  • The regexp parser in parser/regexp.go assumes that Go does not support the (/?<foo>abc)/ syntax, but this is not the case anymore: regexp/syntax: accept (?<name>...) in addition to (?P<name>...) golang/go#58458. So in theory, we could use standard Go regexp for regexps with named capture groups (from a specific Go version onwards, I guess). As it is now, the code will always create a regexp2.Regexp when it encounters something that looks like a named capture group. This works fine, in any case.
  • ...with the small problem that regexp2 exposes named captured groups in an interesting way. All groups without a name (e.g. (foo)) are automatically assigned an integer-as-a-string name, e.g. "1", depending on how many groups overall there are in the regexp (like an index). So when we ask for capture group names, there's no way of knowing if the the JS code actually named a group e.g. "2" explicitly, of if regexp2 just happened to assign this name to it. This is a problem because a group named "2" would not be valid in JS code, since "2" is not a valid ECMAScript identifier. For this PR, I have chose to ignore these groups, and assume that they were automatically given a name by regexp2. This means that we will accept JS code that should've been rejected instead.

I have also added a new script called extract_passed_tests.sh which can be used like this:

go test -v | ./extract_passed_tests.sh

I have used this test to compare if, between master branch and my branch, I have effectively added new passing tests, or not. This is the result:

--- p1.txt	2025-04-13 20:53:07.212235005 +0200
+++ p2.txt	2025-04-13 21:02:15.071912335 +0200
@@ -15,6 +15,7 @@
 test/annexB/built-ins/escape/to-string-observe.js
 test/annexB/built-ins/escape/unmodified.js
 test/annexB/built-ins/RegExp/incomplete_hex_unicode_escape.js
+test/annexB/built-ins/RegExp/named-groups/non-unicode-malformed-lookbehind.js
 test/annexB/built-ins/RegExp/prototype/compile/B.RegExp.prototype.compile.js
 test/annexB/built-ins/RegExp/prototype/compile/flags-string-invalid.js
 test/annexB/built-ins/RegExp/prototype/compile/flags-to-string-err.js
@@ -10873,6 +10874,16 @@
 test/built-ins/RegExp/lookBehind/sticky.js
 test/built-ins/RegExp/lookBehind/variable-length.js
 test/built-ins/RegExp/lookBehind/word-boundary.js
+test/built-ins/RegExp/named-groups/groups-object.js
+test/built-ins/RegExp/named-groups/groups-properties.js
+test/built-ins/RegExp/named-groups/lookbehind.js
+test/built-ins/RegExp/named-groups/non-unicode-property-names-invalid.js
+test/built-ins/RegExp/named-groups/non-unicode-references.js
+test/built-ins/RegExp/named-groups/string-replace-escaped.js
+test/built-ins/RegExp/named-groups/string-replace-nocaptures.js
+test/built-ins/RegExp/named-groups/string-replace-numbered.js
+test/built-ins/RegExp/named-groups/string-replace-unclosed.js
+test/built-ins/RegExp/named-groups/unicode-references.js
 test/built-ins/RegExp/prop-desc.js
 test/built-ins/RegExp/prototype/15.10.6.js
 test/built-ins/RegExp/prototype/dotAll/length.js
@@ -24838,6 +24849,52 @@
 test/language/literals/regexp/lastIndex.js
 test/language/literals/regexp/mongolian-vowel-separator-eval.js
 test/language/literals/regexp/mongolian-vowel-separator.js
+test/language/literals/regexp/named-groups/forward-reference.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-2.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-2-u.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-3.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-3-u.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-4.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-4-u.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-5.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname.js
+test/language/literals/regexp/named-groups/invalid-dangling-groupname-u.js
+test/language/literals/regexp/named-groups/invalid-empty-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-empty-groupspecifier-u.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-2.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-2-u.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-3.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-3-u.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-4.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-5.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-6.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname.js
+test/language/literals/regexp/named-groups/invalid-incomplete-groupname-u.js
+test/language/literals/regexp/named-groups/invalid-lone-surrogate-groupname.js
+test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier-4.js
+test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier-4-u.js
+test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-2.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-3.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-4.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-4-u.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-5.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-5-u.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-6.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-7.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-8.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-8-u.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-9-u.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-non-id-start-groupspecifier-u.js
+test/language/literals/regexp/named-groups/invalid-numeric-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-numeric-groupspecifier-u.js
+test/language/literals/regexp/named-groups/invalid-punctuator-starting-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-punctuator-starting-groupspecifier-u.js
+test/language/literals/regexp/named-groups/invalid-punctuator-within-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-punctuator-within-groupspecifier-u.js
+test/language/literals/regexp/named-groups/invalid-unterminated-groupspecifier.js
+test/language/literals/regexp/named-groups/invalid-unterminated-groupspecifier-u.js
 test/language/literals/regexp/regexp-first-char-no-line-separator.js
 test/language/literals/regexp/regexp-first-char-no-paragraph-separator.js
 test/language/literals/regexp/regexp-source-char-no-line-separator.js

which by my count, are 57 new passing tests 🎉
If you think the code in this PR looks good enough, I can add some Go tests if you think that would help.

@dop251
Copy link
Owner

dop251 commented Apr 17, 2025

Hi. Thanks for submitting. I think there are some changes required in regexp2 in order to match the current ECMAScript syntax.. I'm looking into this.

@federicotdn
Copy link
Contributor Author

Sounds good! I also was interested in having a wider discussion regarding regexp and Goja, should I maybe open an issue for that?

@federicotdn federicotdn closed this by deleting the head repository Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants