0.9.0

Pre-release
github-actions released this 08 Aug 13:43

Major Rewrite of Tokenization/Parsing System

While the original tokenization system was quite extensible, it was built around regular expressions, which hurt both performance and memory allocation. As part of #72, the tokenization system has been rewritten from the ground up with performance and allocations in mind.

To put the improvement into perspective:

  • RobotsFileParser.FromStream: 95% faster, 94% fewer allocations
  • RobotsFileParser.FromString: 92% faster, 94% fewer allocations
  • RobotsPageParser.FromRules: 92% faster, 95% fewer allocations
  • Allowed-access checking is now 2-3x faster with zero allocations
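
As a rough sketch of the consuming API these numbers refer to (the exact overloads may differ between versions, so treat this as illustrative rather than authoritative):

```csharp
using System;
using TurnerSoftware.RobotsExclusionTools;

// Parse a robots.txt from a string and check access.
var robotsText = "User-agent: *\nDisallow: /private/";
var parser = new RobotsFileParser();
var robotsFile = parser.FromString(robotsText, new Uri("https://example.org/"));

// Allowed-access checking is the zero-allocation hot path in this release.
bool allowed = robotsFile.IsAllowedAccess(
    new Uri("https://example.org/some/path"),
    "MyCrawlerBot"
);
```

The same `RobotsFile` result can be reused for many access checks, so the one-off parsing cost is amortised across a crawl.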

The library now also cross-targets .NET 6 as well as .NET Standard 2.0/2.1, allowing it to take advantage of additional performance-oriented APIs. You'll see the greatest uplift using Robots Exclusion Tools on .NET 6 with modern x86 hardware (due to custom SIMD use).

Potential Breaking Changes

The core classes and methods (RobotsFileParser, FromStream, etc.) should behave identically from a consumer's perspective. The Can(partialRuleName, userAgent) API has been removed from RobotsPageDefinition; it is recommended to use HasRule(ruleName, userAgent) instead. This isn't just a rename: the Can() variant had a core behavioural difference that made it obscure and complicated.
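
A hedged sketch of the migration, assuming FromRules accepts a sequence of robots meta/X-Robots-Tag rule strings as in earlier releases (the rule values shown are examples):

```csharp
using TurnerSoftware.RobotsExclusionTools;

// Build a page definition from raw rule strings (e.g. from a robots meta tag).
var pageDefinition = new RobotsPageParser()
    .FromRules(new[] { "noindex", "nofollow" });

// Before (removed in 0.9.0):
// bool canIndex = pageDefinition.Can("index", "MyCrawlerBot");

// After: HasRule checks for the presence of a specific named rule,
// rather than inferring a permission from its absence.
bool noIndex = pageDefinition.HasRule("noindex", "MyCrawlerBot");
```

Because HasRule reports the presence of a rule rather than a derived permission, callers now decide explicitly what a missing rule means for their crawler.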

As usual, if you hit any issues with this new release, please report an issue!