An experiment in implementing static checks of regular expressions in TypeScript using Brzozowski derivatives.
This is also an experiment in what an API for a RegEx-validated string type might look like. There has already been discussion of the use cases and potential API in the TypeScript RegEx-validated string types issue.
This allows you to make assertions about string constants:
import type { Compile, Exec, Recognize } from "brzozowski-ts/src";
type HexStrRE = Compile<"(?<hex>[0-9A-Fa-f]{5,})">;
type Match = Exec<HexStrRE, "abc123">;
const captures: Match["captures"] = ["abc123"];
const groups: Match["groups"] = { hex: "abc123" };
type HexStr<S extends string> = Recognize<HexStrRE, S>;
type NominalHex = string & { readonly isHex: unique symbol };
const castSpell = <S extends string>(hex: HexStr<S> | NominalHex) => hex;
const spell = castSpell("00dead00" as const); // ok!
const spellCheck: typeof spell = "00dead00"; // ok!
// @ts-expect-error
castSpell("xyz");
let dynamicHex: string = "a5df0";
castSpell(dynamicHex as NominalHex); // ok!

`RecognizePattern<RE, Str>` uses a limited and naively-implemented regular expression language:
- no lookaround:
  - no positive or negative lookahead: `(?=no)(?!nope)`
  - no positive or negative lookbehind: `(?<=no)(?<!nope)`
- matches always start from the start of the string: `/^always/`
- string recognition is implemented as a series of potentially-nested commands rather than state transitions within a finite automaton.
- no flags:
  - no case-insensitive matching: `/NOPE/i`
  - no multiline mode: `nope$`
- Using these types likely slows down builds
This is pre-alpha software: the API will change without warning, the implementation is brittle and incomplete, and none of this code has been optimized for memory usage or speed.
If you're brave, you can:
pnpm add -D "git+https://github.com/skalt/brzozowski-ts.git"
and import the types like
import type { Compile, Exec } from "brzozowski-ts/src";

Compile-time parsing follows this general algorithm:
- Given a constant string type `S` and a regular expression string type `R`
- take the derivative of `R` with respect to the start of `S` to produce a shorter regular expression `r` and a shorter string `s`
- recur using `s` and `r`
- when `r` is empty, the entire regular expression has been matched.
- if `s` is empty and `r` is not, the expression has not been matched
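The steps above can be sketched at the type level roughly as follows. This is a minimal illustration, not the library's implementation: `Lit`, `AnyDigit`, `Instr`, `Step` and `Matches` are hypothetical names, the expression is assumed to be already parsed into a flat tuple of instructions, and there is no repetition, alternation or capture support. It assumes TypeScript 4.7 or later for `infer ... extends`.

```typescript
// Hypothetical, pre-parsed instruction set: NOT the library's internal format.
type Lit<C extends string> = { kind: "lit"; char: C };
type AnyDigit = { kind: "digit" };
type Instr = Lit<string> | AnyDigit;

type DigitChar = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";

// One derivative step: consume instruction I from the start of S.
// Resolves to the shorter string, or to `never` when the start of S does not fit.
type Step<I extends Instr, S extends string> =
  I extends Lit<string>
    ? S extends `${I["char"]}${infer Rest}` ? Rest : never
    : S extends `${DigitChar}${infer Rest}` ? Rest : never;

// Recur on the shorter expression and the shorter string.
// An empty instruction list means the whole expression has been matched.
type Matches<R extends Instr[], S extends string> =
  R extends [infer Head extends Instr, ...infer Tail extends Instr[]]
    ? [Step<Head, S>] extends [never]
      ? false
      : Matches<Tail, Step<Head, S>>
    : true;

// Usage: these assignments only type-check if the sketch resolves as intended.
type LetterThenDigit = [Lit<"a">, AnyDigit];
const matched: Matches<LetterThenDigit, "a1b"> = true;
const wrongChar: Matches<LetterThenDigit, "ab"> = false;
const tooShort: Matches<LetterThenDigit, "a"> = false;
```

In this sketch `"a1b"` is accepted even though `"b"` is left over, because the expression runs out before the string does; a string that runs out first (`"a"`) is rejected.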
These DFA-based pure-type RegEx implementations were an inspiration! brzozowski_ts adds the ability to compile regular expressions, but uses a naive backtracking algorithm based on Brzozowski derivatives rather than DFAs.
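For readers who have not met Brzozowski derivatives before, here is a small value-level sketch of the textbook formulation. It is an illustration only: the `Re` type and the helper names are hypothetical, it runs at runtime rather than in the type system, and it carries alternatives inside the derived expression instead of backtracking, so it is not the algorithm this library uses.

```typescript
// A tiny regular-expression AST (hypothetical; not the library's representation).
type Re =
  | { kind: "empty" }                     // matches nothing
  | { kind: "eps" }                       // matches the empty string
  | { kind: "char"; chars: string }       // matches any one character in `chars`
  | { kind: "cat"; left: Re; right: Re }  // left followed by right
  | { kind: "alt"; left: Re; right: Re }  // left or right
  | { kind: "star"; inner: Re };          // zero or more repetitions

const EMPTY: Re = { kind: "empty" };
const EPS: Re = { kind: "eps" };
const char = (chars: string): Re => ({ kind: "char", chars });
const cat = (left: Re, right: Re): Re => ({ kind: "cat", left, right });
const alt = (left: Re, right: Re): Re => ({ kind: "alt", left, right });
const star = (inner: Re): Re => ({ kind: "star", inner });

// True when `re` can match the empty string, i.e. nothing more is required.
function nullable(re: Re): boolean {
  switch (re.kind) {
    case "empty": return false;
    case "eps": return true;
    case "char": return false;
    case "cat": return nullable(re.left) && nullable(re.right);
    case "alt": return nullable(re.left) || nullable(re.right);
    case "star": return true;
  }
}

// The derivative of `re` with respect to one character `c`:
// an expression matching whatever may follow `c`.
function derive(re: Re, c: string): Re {
  switch (re.kind) {
    case "empty": return EMPTY;
    case "eps": return EMPTY;
    case "char": return re.chars.indexOf(c) !== -1 ? EPS : EMPTY;
    case "cat": {
      const first = cat(derive(re.left, c), re.right);
      // If `left` may be empty, `c` could also belong to `right`.
      return nullable(re.left) ? alt(first, derive(re.right, c)) : first;
    }
    case "alt": return alt(derive(re.left, c), derive(re.right, c));
    case "star": return cat(derive(re.inner, c), re);
  }
}

// Anchored at the start only: succeeds once the remaining expression is satisfied.
function recognize(re: Re, s: string): boolean {
  let r = re;
  for (const c of s) {
    if (nullable(r)) return true;
    r = derive(r, c);
  }
  return nullable(r);
}

// Usage: five or more hex digits, built by hand since this sketch has no parser.
const hex = char("0123456789ABCDEFabcdef");
const fiveOrMoreHex = cat(hex, cat(hex, cat(hex, cat(hex, cat(hex, star(hex))))));
const accepted = recognize(fiveOrMoreHex, "abc123");
const rejected = recognize(fiveOrMoreHex, "xyz");
```

The sketch never simplifies derived expressions, so they grow with every character consumed; that is acceptable for short inputs but is one reason practical implementations normalize as they go.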