JavaScript regex flavor #2299

Open
slevithan opened this issue Jun 21, 2024 · 8 comments

Comments

@slevithan

slevithan commented Jun 21, 2024

Flavor Request

I have a biased interest as the creator of the new regex library (a spiritual successor to the widely used XRegExp library, which has been around since the ES3 days), but I still think supporting it as a flavor would significantly benefit JavaScript programmers: both those who already use the regex library and those who would discover that there is a better, more readable way to write native JavaScript regexes.

From its readme:

regex is a template tag that extends JavaScript regular expressions with features that make them more powerful and dramatically more readable. It returns native RegExp instances that equal or exceed native performance. It's also lightweight, supports all ES2024+ regex features, and can be used as a Babel plugin to avoid any runtime dependencies or added runtime cost.

Highlights include support for free spacing and comments, atomic groups via (?>…) that can help you avoid ReDoS, subroutines via \g<name> that enable powerful composition, and context-aware interpolation of regexes, escaped strings, and partial patterns.

With the regex package, JavaScript steps up as one of the best regex flavors alongside PCRE and Perl, and maybe surpassing C++, Java, .NET, and Python.

Implementation should be straightforward compared to any other new regex flavor because it's a lightweight library that runs on the client, and its features are a strict superset of native JavaScript regexes with flag v enabled. All of its extended features are already available in flavors that regex101 supports.

Following is a complete list of the changes that would be needed to support the JavaScript regex flavor, compared to the existing support for the "ECMAScript (JavaScript)" flavor:

  • Disable or remove flag u (unicode), since flag v is always enabled.
  • Flag v (vnicode) is always enabled.
  • Flag x (extended) is available and always enabled.
    • Whitespace (specifically space and tab) is also ignored within character classes. This works like PCRE's flag xx (which regex101 doesn't yet list as an option) and Java's x flag (which is supported by regex101, although Java allows any whitespace in character classes with flag x).
  • Flag n (non-capturing) is available and always enabled.
    • Works the same as flag n in .NET (which regex101 already supports as an option) and PCRE (where regex101 doesn't yet list it as an option).
    • Tooltip and explanation for syntax (…) should describe it as a non-capturing group, which is already supported by regex101 for the .NET flavor with flag n enabled.
    • Highlight all numbered backreferences (\1, etc.) as errors, since numbered backreferences to named groups are disabled by the always-on flag n (different from .NET, but the same as Ruby and C++ with its flag n equivalent nosubs).
    • Don't show numbered groups in the "Match Information" card; only named groups (due to flag n).
  • Highlight syntax (?>…) as an atomic group.
    • Works the same as (?>…) in PCRE, Java, and .NET, where regex101 already supports atomic groups.
  • Highlight syntax \g<name> as a subroutine.
    • Works the same as \g<name> in PCRE, where regex101 already supports subroutines. Edit: The only difference is it can't be used recursively.
  • Change the delimiter options to only include `.
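
For readers less familiar with atomic groups, the behavior requested in the list above can be illustrated in native JavaScript. This is a minimal sketch that assumes the well-known lookahead-plus-backreference emulation, (?=(…))\1; it is not necessarily the source the regex library generates, and the patterns are illustrative only.

```javascript
// Ordinary group: after `a+` takes all three letters, the engine can
// backtrack and give one back so the trailing `a` still matches.
const backtracking = /^(?:a+)a$/;

// Assumed emulation of the atomic group (?>a+): a lookahead is not
// re-entered once it succeeds, so \1 consumes the whole run of `a`
// as a single unit and nothing is left for the trailing `a`.
const atomic = /^(?=(a+))\1a$/;

console.log(backtracking.test('aaa')); // true
console.log(atomic.test('aaa'));       // false
```

Because the atomic version refuses to give characters back, it fails fast instead of retrying shorter matches, which is the property that helps against ReDoS.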

There is detailed documentation for all of these features in the regex package's readme, and of course I would be happy to help however I can.

@firasdib
Owner

Thank you for the detailed suggestion!

All the requirements you have listed are already supported by regex101 (some flags need to be set using (?xx), (?n), etc.), so this won't be an issue from an implementation perspective.

However, I'm on the fence about whether I should (and can) support this in a meaningful way.

  1. I'm afraid this may cause a lot of confusion (how would the flavor even be named?)
  2. The features described in the readme, particularly those under Context-aware and safe interpolation, would be hard to implement, as the site only has one regex input field.

This looks like an amazing programmatic aid for regexes in JavaScript, but perhaps not super well suited as a flavor, and the implementation on the site may not do it justice.

What do you think?

@slevithan
Author

slevithan commented Jul 20, 2024

Thanks for the thoughtful response, @firasdib. Those are very reasonable concerns, and of course no hard feelings if you feel that it's not the right fit.

Some thoughts in response:

"I'm afraid this may cause a lot of confusion (how would the flavor even be named?)"

Perhaps something like "ECMAScript library: regex" or "Library: regex (JavaScript)". These aren't perfect, but naming would be similarly tricky for any other regex library that isn't built into a programming language and that might be supported in the future, e.g. Rust's regex_lite or fancy-regex.

Another option would be to introduce a new concept of variations, libraries, packages, or similar that can be selected within particular flavor options. In this world, the flavor "ECMAScript (JavaScript)" might default to a "RegExp" suboption, but have a way to switch to "Library: regex". Then, the two existing PCRE flavor options could be collapsed the same way, potentially with the top level name "PCRE", which would default to suboption "PCRE2 (PHP >= 7.3)". This might also open the door to more easily adding support for things like Rust's regex_lite in the future.

"The features described in the readme, particularly those under Context-aware and safe interpolation, would be hard to implement, as the site only has one regex input field."

I'd imagined that regex101 would totally ignore interpolation, since that is a feature of the regex function/tag and the JavaScript language itself, rather than part of the flavor (interpolation occurs before regex parsing). Supporting interpolation would be appropriate for a dedicated REPL for the regex library, but IMO not for a more general regex tester like regex101.

@firasdib
Owner

These are good ideas. I've injured my shoulder, so I can't try things out right now, but I will return to this as soon as I am back in business.

@slevithan
Author

Awesome--I'll be happy to help however I can.

Note: Since regex101 would need to call regex with dynamic input rather than using it with backticks as a template tag, that would work like this: regex({raw: [<pattern-string>]}) or regex(<flags-str>)({raw: [<pattern-string>]}).
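
To make that calling convention concrete, here is a minimal sketch using a hypothetical stand-in tag, demoTag, that only joins the raw template strings and compiles them natively. It implements none of the regex library's features and its internals are assumptions; it just shows that a template tag receives an object with a raw array, so passing {raw: [pattern]} by hand is equivalent to using backticks.

```javascript
// Hypothetical stand-in for a tag such as `regex` (illustration only).
function demoTag(first, ...values) {
  // Called as demoTag('gi'): return a tag bound to those flags.
  if (typeof first === 'string') {
    return (strings, ...vals) => compile(strings, vals, first);
  }
  // Called as a tag, or directly with a {raw: [...]} object.
  return compile(first, values, '');
}

// Join the raw string chunks with any interpolated values, then compile.
function compile(strings, values, flags) {
  let source = '';
  strings.raw.forEach((chunk, i) => {
    source += chunk + (i < values.length ? String(values[i]) : '');
  });
  return new RegExp(source, flags);
}

// Dynamic input, e.g. the contents of a single regex input field.
const userPattern = String.raw`\d+-\w+`;

const viaTemplate = demoTag`\d+-\w+`;
const viaObject = demoTag({raw: [userPattern]});
const withFlags = demoTag('gi')({raw: [userPattern]});

console.log(viaTemplate.source === viaObject.source); // true
console.log(viaObject.test('42-abc'));                // true
console.log(withFlags.flags);                         // 'gi'
```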

@slevithan
Author

Heads up that regex 4.0.0 was just released, and it includes a few things that would be relevant for regex101.

  • The new subclass: true option should always be used by regex101, to avoid any issues with numbered backreferences used outside of the regex (e.g. in replacement strings). It's explained in the readme's Options section, collapsed under "See details for each option".
  • It's now possible to disable the implicit (on-by-default) flags x, n, and v (disabling v switches to u). However, it might be easier to just not offer this feature in regex101. Disabling them is done via options, e.g. disable: {n: true, x: true}.
  • Subroutine definition groups (regex101's term is subpattern definition constructs) are now supported at the end of regexes.
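
The numbering concern behind subclass: true can be reproduced in native JavaScript. The sketch below assumes an atomic group is emulated with a lookahead and a backreference, which introduces an extra capture; the pattern and the group name tail are illustrative only, and the library's actual generated source may differ.

```javascript
// Author's intent: (?>a+)(?<tail>b), where `tail` is the only capture.
// Assumed emulation: the atomic group becomes (?=(a+))\1, adding a capture.
const emulated = /(?=(a+))\1(?<tail>b)/;

const match = emulated.exec('aab');
console.log(match[1]);          // 'aa' (the emulation's helper capture)
console.log(match[2]);          // 'b'  (the author's first group, shifted)
console.log(match.groups.tail); // 'b'

// A numbered reference in a replacement string now picks up the helper
// capture, while the named reference is unaffected.
const byNumber = 'aab'.replace(emulated, '[$1]');
const byName = 'aab'.replace(emulated, '[$<tail>]');
console.log(byNumber); // '[aa]'
console.log(byName);   // '[b]'
```

In this sketch, code outside the regex that refers to groups by number sees the shifted numbering, which is the kind of issue the option is described as avoiding.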

@firasdib
Owner

firasdib commented Aug 7, 2024

@slevithan Thank you for the update. Super exciting development happening, I'll be sure to deep dive into it in the near future.

@slevithan
Author

slevithan commented Aug 16, 2024

Just a note that regex 4.1 added support for possessive quantifiers. That completes all syntax I've wanted to add (and what I consider to be the most important features missing from JS), so I expect regex's syntax and flags to be stable now and not change for some time.

@slevithan
Author

To help with challenges around referring to the library/flavor (which I acknowledge was a significant issue!), I've just adopted the official name Regex+ (with regex now being an alias, based on the package and tag name). It could now be listed as e.g. "Regex+ (JavaScript)".

Regex+ has also seen significantly increased adoption since we last talked. 😊

npm downloads
