Consider manual parsing and handling of regex #4

Open
PgBiel opened this issue Apr 27, 2024 · 0 comments
Labels
enhancement (New feature or request), nix incompatibility (Some function works differently or doesn't work in the Nix target)

Comments

@PgBiel
Member

PgBiel commented Apr 27, 2024

The regex module currently forwards every regex string unchanged to Nix's built-in regex matching and splitting functions. However, Nix uses POSIX ERE syntax for regex, which is more limited than the syntax available on other Gleam targets. For example, \s for whitespace doesn't work and must be replaced by [[:space:]]. Similarly, \d must be written as [0-9].

We could solve this by manually parsing the regex on compile and transforming it into a form that Nix accepts. In particular, we'd look into tackling the following incompatibilities:

  1. Properly handle \s and \d, which are widely used across Gleam packages;
  2. Handle other regex escapes, such as \n;
  3. Handle the case-insensitive flag;
  4. Handle the multi-line flag;
  5. Handle (?!x) (negative lookahead);
  6. Handle (?:x) (non-capturing group).

1 and 2 would require parsing, with the replacement depending on context (inside or outside a character class). A simple global replacement isn't enough, since the [ ] in [0-9] have to be dropped when the escape already appears inside a character class.
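To make the context dependence concrete, here is a minimal sketch in Python (not Nix, purely for illustration). The function name `translate_escapes` and the two lookup tables are hypothetical, and the scanner is deliberately simplified: it does not treat a leading `]` in a character class as a literal, and it copies unknown escapes through unchanged.

```python
# Hypothetical tables: source escape -> POSIX ERE equivalent.
CLASS_ESCAPES = {"s": "[:space:]", "d": "0-9"}  # bodies valid inside [...]
LITERAL_ESCAPES = {"n": "\n", "t": "\t"}         # same text in any context


def translate_escapes(pattern: str) -> str:
    """Rewrite \\s, \\d, \\n and \\t, tracking character-class context."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in CLASS_ESCAPES:
                body = CLASS_ESCAPES[nxt]
                # Outside a class the body needs its own brackets;
                # inside one, the brackets must be dropped.
                out.append(body if in_class else "[" + body + "]")
            elif nxt in LITERAL_ESCAPES:
                out.append(LITERAL_ESCAPES[nxt])
            else:
                out.append(ch + nxt)  # leave other escapes untouched
            i += 2
            continue
        if in_class and pattern.startswith("[:", i):
            # Copy an existing POSIX class such as [:alpha:] verbatim so
            # its closing bracket is not mistaken for the end of the class.
            end = pattern.find(":]", i + 2)
            if end != -1:
                out.append(pattern[i:end + 2])
                i = end + 2
                continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)
```

With this sketch, `\s+\d` becomes `[[:space:]]+[0-9]`, while `[a\d_]` becomes `[a0-9_]`.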

3 would basically consist of converting letters such as a or A into [aA], and would require parsing in the same manner.
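A rough Python sketch of that expansion follows. The name `expand_case_insensitive` is hypothetical, and the sketch only rewrites letters outside character classes. Letters and ranges inside `[...]` are copied unchanged, so a real implementation would still need to widen those (for example turning `[a-z]` into `[a-zA-Z]`).

```python
def expand_case_insensitive(pattern: str) -> str:
    """Replace each ASCII letter outside a character class with [xX]."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escapes are copied as a unit so that e.g. \n is not
            # turned into \[nN].
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class and pattern.startswith("[:", i):
            # Copy POSIX classes such as [:alpha:] verbatim.
            end = pattern.find(":]", i + 2)
            if end != -1:
                out.append(pattern[i:end + 2])
                i = end + 2
                continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        elif not in_class and ch.isascii() and ch.isalpha():
            out.append("[" + ch.lower() + ch.upper() + "]")
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

For example, `ab1` would become `[aA][bB]1`.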

4 could perhaps be done by splitting the string into lines first and joining matches on each line.
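As a sketch of the line-splitting idea, the Python below uses `re` without its multi-line flag as a stand-in for the Nix matching functions. The name `multiline_matches` is hypothetical. One known gap under this approach: a pattern that is meant to match across a newline can no longer do so.

```python
import re


def multiline_matches(pattern: str, text: str) -> list:
    """Emulate multi-line ^ and $ by matching each line separately."""
    matches = []
    for line in text.split("\n"):
        # Each line is matched on its own, so ^ and $ anchor to the
        # line's start and end without any multi-line support.
        matches.extend(m.group(0) for m in re.finditer(pattern, line))
    return matches
```

For example, `multiline_matches(r"^[a-z]+$", "foo\nbar1\nbaz")` keeps `foo` and `baz` and skips `bar1`.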

5 could perhaps be done by replacing (?!x) with (x)? and storing the capture group number; later on, when using match functions, matches where the group is present would be ignored (additionally, the group would be removed from submatches).
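The filtering half of this could look roughly like the Python sketch below, again using `re` as a stand-in for the Nix primitives. It assumes the rewrite from (?!x) to (x)? has already happened and that the parser recorded the group number as `sentinel_group`; both names are hypothetical. Note one caveat of the approach itself: a rejected match still consumes its text, so a valid match that starts inside a rejected one would be missed.

```python
import re


def find_without_lookahead(rewritten: str, sentinel_group: int, text: str) -> list:
    """Drop matches where the rewritten lookahead group participated."""
    results = []
    for m in re.finditer(rewritten, text):
        if m.group(sentinel_group) is not None:
            # The former (?!x) body matched, so the original pattern
            # would have rejected this position.
            continue
        # Hide the sentinel group from the reported submatches.
        subs = [g for idx, g in enumerate(m.groups(), start=1)
                if idx != sentinel_group]
        results.append((m.group(0), subs))
    return results
```

For instance, `(foo)(?!bar)` would be rewritten to `(foo)(bar)?` with sentinel group 2; on `foobar foobaz` only the second `foo` survives.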

6 could be done by replacing (?:x) with (x) and then removing the group from submatches later, by storing the group's number after parsing.
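A minimal Python sketch of both halves, with hypothetical names `strip_non_capturing` and `visible_submatches`. It assumes (?: is the only (? construct in the pattern, and it uses the same simplified character-class tracking as any quick scanner would (no special handling of [:alpha:] or a leading `]`).

```python
def strip_non_capturing(pattern: str):
    """Rewrite (?:x) to (x) and return the group numbers to hide."""
    out = []
    dropped = []
    group_no = 0
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # escaped char, e.g. \(
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        elif ch == "(" and not in_class:
            group_no += 1  # every "(" becomes a capture group in the output
            if pattern.startswith("(?:", i):
                dropped.append(group_no)
                out.append("(")
                i += 3
                continue
        out.append(ch)
        i += 1
    return "".join(out), dropped


def visible_submatches(groups, dropped):
    """Remove the groups that were non-capturing in the source pattern."""
    return [g for idx, g in enumerate(groups, start=1) if idx not in dropped]
```

For example, `(?:ab)+(c)` becomes `(ab)+(c)` with group 1 recorded, so only the submatch for `(c)` is reported.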

Now, parsing with Nix could be inefficient, but we expect compile to be a "slower" operation anyway. This would allow proper compatibility with the ecosystem.

We could also first check whether parsing is needed at all, and otherwise just pass the regex through unchanged.
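One possible shape for that check, sketched in Python. The name `needs_rewrite` and its flag parameters are hypothetical, and the test is deliberately conservative: it assumes any backslash followed by a letter or digit, any `(?` construct, or either flag requires the full parse, so it may report true for a pattern that would have worked as-is (such as an escaped backslash followed by a letter).

```python
import re

# Backslash + alphanumeric (e.g. \s, \d, \n) or a "(?" group prefix.
_NON_POSIX = re.compile(r"\\[a-zA-Z0-9]|\(\?")


def needs_rewrite(pattern: str, case_insensitive: bool = False,
                  multi_line: bool = False) -> bool:
    """Cheap pre-check: can the pattern be passed through unchanged?"""
    if case_insensitive or multi_line:
        return True
    return _NON_POSIX.search(pattern) is not None
```

A pattern such as `a(b|c)+` or `a\.b` would skip parsing, while `\s+` or `a(?:b)` would not.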

@PgBiel PgBiel added enhancement New feature or request nix incompatibility Some function works differently or doesn't work in the Nix target labels Apr 27, 2024