Just another day at the office, you write a .NET Regex like a boss,
and suddenly realize that you need to declare, say, a non-capturing group.
It's (?:pattern), right? Wait, or was it (?=pattern)? No no, (?=pattern)
must be a positive lookahead or something. But if (?<=pattern) is a positive lookbehind,
then maybe positive lookahead would be (?>=pattern)?
"Aaargh! Now where's that Regex cheat sheet?" And don't forget to share it with the five colleagues who might be maintaining this code later. Also remember to put comments inside the Regex pattern, and maybe use a few third-party tools to be sure what the expression actually does.
"How did it come to this?"
Inspired by the Expression Trees feature in .NET, the RegexBuilder library provides a more verbose but more human-readable way of declaring regular expressions, using a language friendly to the .NET world instead of two lines of cryptic mess.
When it might be useful:
- When the expressions are complex and might be frequently changed.
- When you can tolerate 20 lines of understandable code instead of one barely readable line.
- If you can spare a bit of CPU time and memory for constructing the Regex object for the sake of readability.
The library targets netstandard2.0, so it works with .NET Framework 4.6.1+, .NET Core 2.0+, and .NET 5+.
Tests are verified against .NET 8 and .NET 10 LTS.
Let's say you want to make a simple HTML parser
and capture the value of every href attribute from hyperlinks, as shown in the MSDN example.
The usual way:
Regex hrefRegex = new Regex("href\\s*=\\s*(?:[\"'](?<Target>[^\"']*)[\"']|(?<Target>\\S+))", RegexOptions.IgnoreCase);
With RegexBuilder:
const string quotationMark = "\"";
Regex hrefRegex = RegexBuilder.Build
(
    RegexOptions.IgnoreCase,
    // Regex structure declaration
    RegexBuilder.Literal("href"),
    RegexBuilder.MetaCharacter(RegexMetaChars.WhiteSpace, RegexQuantifier.ZeroOrMore),
    RegexBuilder.Literal("="),
    RegexBuilder.MetaCharacter(RegexMetaChars.WhiteSpace, RegexQuantifier.ZeroOrMore),
    RegexBuilder.Alternate
    (
        RegexBuilder.Concatenate
        (
            RegexBuilder.NonEscapedLiteral(quotationMark),
            RegexBuilder.Group
            (
                "Target",
                RegexBuilder.NegativeCharacterSet(quotationMark, RegexQuantifier.ZeroOrMore)
            ),
            RegexBuilder.NonEscapedLiteral(quotationMark)
        ),
        RegexBuilder.Group
        (
            "Target",
            RegexBuilder.MetaCharacter(RegexMetaChars.NonwhiteSpace, RegexQuantifier.OneOrMore)
        )
    )
);
See CustomRegexTests.cs for more examples.
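Whichever way the Regex object is constructed, it is consumed like any other System.Text.RegularExpressions.Regex. The following is a minimal sketch of reading the Target group from every match. It uses the plain pattern from "the usual way" so that it runs without the library; the HrefExtractor class and its ExtractTargets method are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of RegexBuilder.
public static class HrefExtractor
{
    // Same pattern as in "the usual way" above; built once and reused.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:[\"'](?<Target>[^\"']*)[\"']|(?<Target>\\S+))",
        RegexOptions.IgnoreCase);

    // Returns the value of the Target group for every href found in the input.
    public static List<string> ExtractTargets(string html)
    {
        var targets = new List<string>();
        foreach (Match match in HrefRegex.Matches(html))
        {
            // Both alternatives capture into the same named group,
            // so a single lookup covers quoted and unquoted values.
            targets.Add(match.Groups["Target"].Value);
        }
        return targets;
    }
}
```

For example, ExtractTargets("<a href=\"https://example.com\">x</a> <a href=page.html >y</a>") yields two entries: https://example.com and page.html. Note that the unquoted branch uses \S+, so it keeps consuming characters until the next whitespace.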
RegexBuilder currently supports all regular expression language elements except substitution/replacement patterns.
The following elements are supported:
- Quantifiers
- Character escapes
- Character classes
- Anchors (atomic zero-width assertions)
- Grouping constructs
- Backreference constructs
- Alternation constructs
- Inline options and comments
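For reference, here is a small sketch of what a few of these element categories look like in raw .NET regex syntax, using only System.Text.RegularExpressions. It does not use RegexBuilder, and the RawSyntaxSamples class and its method names are hypothetical, chosen only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical samples in plain .NET regex syntax (no RegexBuilder involved).
public static class RawSyntaxSamples
{
    // Grouping construct + backreference: returns the first word
    // that is immediately repeated, or an empty string if there is none.
    public static string FirstDoubledWord(string text)
    {
        Match match = Regex.Match(text, @"\b(?<word>\w+)\s+\k<word>\b");
        return match.Success ? match.Groups["word"].Value : string.Empty;
    }

    // Quantifier + positive lookahead: digits that are followed by "px",
    // without including "px" in the match.
    public static string PixelValue(string text)
    {
        return Regex.Match(text, @"\d+(?=px)").Value;
    }

    // Anchor + inline option (?i) + inline comment (?# ...).
    public static bool IsGreeting(string text)
    {
        return Regex.IsMatch(text, @"(?i)^hello\b(?# case-insensitive greeting)");
    }
}
```

With these definitions, FirstDoubledWord("it was was fine") returns "was", PixelValue("width: 120px") returns "120", and IsGreeting("HELLO world") returns true.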
Install via the .NET CLI:
dotnet add package RegexBuilder
Or use the NuGet Package Manager Console:
PM> Install-Package RegexBuilder
There are 3 classes you'll need. They all expose their functionality via static members and work statelessly.
- RegexBuilder: a factory class that produces and glues together different parts of a regular expression.
- RegexQuantifier: produces quantifiers (?, +, {4,}, etc.) for regex parts that support them.
- RegexMetaChars: named constants for character classes (word boundary, whitespace, tab, etc.).
Start with var regex = RegexBuilder.Build(...); and replace ... with the parts of your regular expression
by calling the corresponding methods of RegexBuilder.
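As a starting point, a minimal sketch might look like the following. It uses only the RegexBuilder calls that already appear in the href example above. The using directive assumes the library's namespace is YuriyGuts.RegexBuilder (inferred from the solution name, so treat it as an assumption), and the HrefPrefix class is a hypothetical wrapper added for illustration.

```csharp
using System.Text.RegularExpressions;
using YuriyGuts.RegexBuilder; // assumed namespace, inferred from the solution name

// Hypothetical wrapper class, not part of the library.
public static class HrefPrefix
{
    // Builds a regex for "href", optional whitespace, then "=",
    // reusing only the calls shown in the href example above.
    public static Regex Build()
    {
        return RegexBuilder.Build
        (
            RegexOptions.IgnoreCase,
            RegexBuilder.Literal("href"),
            RegexBuilder.MetaCharacter(RegexMetaChars.WhiteSpace, RegexQuantifier.ZeroOrMore),
            RegexBuilder.Literal("=")
        );
    }
}
```

The result is an ordinary Regex object, so HrefPrefix.Build().IsMatch(...) and the rest of the standard Regex API can be used on it. Since construction has a cost, it may be worth building the object once and keeping it in a static readonly field.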
Build the solution:
dotnet build src/YuriyGuts.RegexBuilder.sln
Run the tests (runs on both net8.0 and net10.0):
dotnet test --project src/YuriyGuts.RegexBuilder.Tests
Run the console test app:
dotnet run --project src/YuriyGuts.RegexBuilder.TestApp
The project has GeneratePackageOnBuild=true, so a .nupkg file is produced on every build.
The source code is licensed under The MIT License.