Consider manual parsing and handling of regex #4

PgBiel · 2024-04-27T02:14:40Z

The regex module currently forwards all regex strings to Nix's built-in regex matching and splitting functions. However, Nix uses POSIX ERE syntax for regex, which is more limited than the syntax available on other Gleam targets. For example, \s for whitespace doesn't work and must be replaced with [[:space:]]. Similarly, \d must be written as [0-9].

We could solve this by manually parsing the regex on compile and transforming it into a form that Nix accepts. In particular, we'd look into tackling the following incompatibilities:

1. Properly handle \s and \d, which are widely used across Gleam packages;
2. Handle other regex escapes, such as \n;
3. Handle case-insensitive flag;
4. Handle multi-line flag;
5. Handle (?!x) (negative lookahead);
6. Handle (?:x) (ignored group).

1 and 2 would require parsing and replacing depending on the context (inside/outside character classes - a simple global replacement isn't enough, since the [ ] in [0-9] have to be dropped when already inside a character class).
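
As a rough illustration of why the rewrite has to track context, here is a minimal sketch in Python (used only for illustration; the real implementation would live in the Gleam/Nix code). The function name translate_escapes and the two lookup tables are hypothetical. It only covers \s and \d, passes every other escape through unchanged, and does not attempt to handle escapes such as \] inside a class.

```python
# Hypothetical sketch: rewrite \s and \d into POSIX ERE equivalents,
# dropping the outer [ ] when already inside a character class.
OUTSIDE = {"s": "[[:space:]]", "d": "[0-9]"}
INSIDE = {"s": "[:space:]", "d": "0-9"}


def translate_escapes(pattern: str) -> str:
    out = []
    in_class = False
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Pick the replacement based on the current context;
            # unknown escapes are copied through untouched.
            table = INSIDE if in_class else OUTSIDE
            out.append(table.get(pattern[i + 1], pattern[i:i + 2]))
            i += 2
            continue
        if in_class and pattern.startswith("[:", i):
            # Copy an existing POSIX class such as [:alpha:] verbatim.
            end = pattern.find(":]", i + 2)
            if end != -1:
                out.append(pattern[i:end + 2])
                i = end + 2
                continue
        if ch == "[" and not in_class:
            in_class = True
            out.append(ch)
            i += 1
            # A leading ^ and a literal ] right after [ belong to the class.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        if ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)
```

With this sketch, \d+\s would become [0-9]+[[:space:]], while [a-f\d] would become [a-f0-9] rather than the invalid [a-f[0-9]].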

3 would basically consist of converting letters such as a or A into [aA], and would require parsing in the same manner.
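
A sketch of that letter expansion, again in Python purely for illustration. The name expand_case_insensitive is hypothetical, only ASCII letters are considered, and escapes and POSIX class names are left untouched so that something like [:alpha:] is not mangled. Inside a character class, a same-case range such as a-z is duplicated as a-zA-Z instead of being wrapped in brackets.

```python
# Hypothetical sketch: emulate a case-insensitive flag by expanding letters.
def expand_case_insensitive(pattern: str) -> str:
    out = []
    in_class = False
    i = 0
    n = len(pattern)

    def is_letter(c: str) -> bool:
        return c.isascii() and c.isalpha()

    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Leave escapes alone (e.g. \. or \n).
            out.append(pattern[i:i + 2])
            i += 2
        elif in_class and pattern.startswith("[:", i) and pattern.find(":]", i + 2) != -1:
            # Copy POSIX classes such as [:alpha:] verbatim.
            end = pattern.find(":]", i + 2) + 2
            out.append(pattern[i:end])
            i = end
        elif in_class and ch == "]":
            in_class = False
            out.append(ch)
            i += 1
        elif not in_class and ch == "[":
            in_class = True
            out.append(ch)
            i += 1
            # A leading ^ and a literal ] right after [ belong to the class.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
        elif is_letter(ch):
            if not in_class:
                out.append("[" + ch.lower() + ch.upper() + "]")
                i += 1
            elif (i + 2 < n and pattern[i + 1] == "-"
                  and is_letter(pattern[i + 2])
                  and ch.islower() == pattern[i + 2].islower()):
                # Same-case range: a-z becomes a-zA-Z.
                hi = pattern[i + 2]
                out.append(f"{ch.lower()}-{hi.lower()}{ch.upper()}-{hi.upper()}")
                i += 3
            else:
                out.append(ch.lower() + ch.upper())
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumptions, abc would expand to [aA][bB][cC] and [a-z0-9]+ to [a-zA-Z0-9]+.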

4 could perhaps be done by splitting the string into lines first and joining matches on each line.
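
The line-splitting idea could look roughly like the sketch below. Python's re module stands in for the underlying matcher here, which is an assumption made only so the example runs; scan_multiline is a hypothetical name. Each line is matched on its own, so ^ and $ anchor at line boundaries, and offsets are shifted back to positions in the original string. One caveat is that a pattern meant to match across a newline would find nothing with this approach.

```python
import re


# Hypothetical sketch: emulate the multi-line flag by matching line by line.
def scan_multiline(pattern: str, text: str) -> list[tuple[int, str]]:
    results = []
    offset = 0  # index of the current line's first character in `text`
    for line in text.split("\n"):
        for m in re.finditer(pattern, line):
            results.append((offset + m.start(), m.group(0)))
        offset += len(line) + 1  # +1 for the newline removed by split
    return results
```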

5 could perhaps be done by replacing (?!x) with (x)? and storing the capture group number; later on, when using match functions, matches where the group is present would be ignored (additionally, the group would be removed from submatches).

6 could be done by replacing (?:x) with (x) and then removing the group from submatches later, by storing the group's number after parsing.
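
A minimal sketch of that bookkeeping, in Python for illustration only. The names rewrite_non_capturing and drop_groups are hypothetical. Group numbers are counted by opening parenthesis, escapes are skipped, and parentheses inside a character class are ignored; POSIX classes nested in bracket expressions and other (? forms such as (?!x) are not treated specially here.

```python
# Hypothetical sketch: turn (?:x) into (x) and remember which groups to hide.
def rewrite_non_capturing(pattern: str) -> tuple[str, list[int]]:
    out = []
    dropped = []   # 1-based numbers of groups that were originally (?:...)
    group_no = 0
    in_class = False
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])  # escaped char, never a group
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            group_no += 1
            if pattern.startswith("(?:", i):
                dropped.append(group_no)
                out.append("(")
                i += 3
                continue
        out.append(ch)
        i += 1
    return "".join(out), dropped


def drop_groups(submatches: list, dropped: list[int]) -> list:
    # `submatches` holds groups 1..n in order; remove the hidden ones.
    return [m for idx, m in enumerate(submatches, start=1) if idx not in dropped]
```

For example, (?:ab)+(c) would be rewritten to (ab)+(c) with group 1 recorded, so a submatch list of ["ab", "c"] would be reported as ["c"].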

Now, parsing with Nix could be inefficient, but we expect compile to be a "slower" operation anyway. This would allow proper compatibility with the ecosystem.

We could also first test whether parsing is needed at all, and otherwise just pass the regex through.
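
Such a pre-check could be a cheap scan like the following sketch (Python for illustration; needs_translation and its parameters are hypothetical). It is deliberately conservative: any backslash followed by a letter or digit, any (? sequence, or any flag triggers the full parse, while escaped punctuation such as \. is assumed to be acceptable as-is.

```python
# Hypothetical sketch: decide whether a pattern can be passed through as-is.
def needs_translation(pattern: str,
                      case_insensitive: bool = False,
                      multi_line: bool = False) -> bool:
    if case_insensitive or multi_line:
        return True
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # Escapes like \d, \s, \n need rewriting; \. or \\ do not.
            if pattern[i + 1].isalnum():
                return True
            i += 2
            continue
        if pattern.startswith("(?", i):
            # (?:x), (?!x) and similar extended group syntax.
            return True
        i += 1
    return False
```

A false positive only costs an unnecessary parse, so the check can afford to err on the side of returning True.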


PgBiel added the "enhancement" (New feature or request) and "nix incompatibility" (Some function works differently or doesn't work in the Nix target) labels on Apr 27, 2024
