0.9.0
Pre-release

Major Rewrite of Tokenization/Parsing System
While the original tokenization system left plenty of room for extensibility, it was built around regular expressions, which hurt both performance and memory allocation. As part of #72, the system has been rewritten from the ground up with performance and allocations in mind.
To put into perspective how much better the new system performs:
- `RobotsFileParser.FromStream`: 95% faster, 94% fewer allocations
- `RobotsFileParser.FromString`: 92% faster, 94% fewer allocations
- `RobotsPageParser.FromRules`: 92% faster, 95% fewer allocations
- Allowed-access checking is now 2-3x faster with zero allocations
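For context on what the benchmarked entry points above look like in use, here is a minimal, hedged sketch. The method names come from these notes, but the exact signatures, return types, and the `IsAllowedAccess` check are assumptions; consult the library's README for the real API.

```csharp
// Hedged sketch only — method shapes are assumed from the names in these notes.
using TurnerSoftware.RobotsExclusionTools;

var parser = new RobotsFileParser();

// Parse robots.txt content from a string (one of the benchmarked entry points).
var robotsFile = parser.FromString("User-agent: *\nDisallow: /private/");

// Allowed-access checking — the operation now reported as 2-3x faster
// with zero allocations (member name assumed for illustration).
var uri = new Uri("https://example.org/private/page");
bool allowed = robotsFile.IsAllowedAccess(uri, "MyCrawler");
```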
The library also now cross-targets .NET 6 as well as .NET Standard 2.0/2.1, allowing it to take advantage of certain other performance-oriented APIs. You'll see the greatest uplift using Robots Exclusion Tools with .NET 6 and modern x86 hardware (due to custom SIMD use).
Potential Breaking Changes
The core classes and methods like `RobotsFileParser` and `FromStream` should behave identically from a consuming perspective. The `Can(partialRuleName, userAgent)` API has been removed from `RobotsPageDefinition`; it is recommended to use `HasRule(ruleName, userAgent)` instead. This isn't just a name difference but a core behavioural difference: the `Can()` variant was somewhat obscure and complicated.
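For most callers the migration is a one-line change. A hedged before/after sketch, assuming a `RobotsPageDefinition` instance named `pageDefinition` and illustrative rule names (only the two method signatures come from these notes):

```csharp
// Before (removed in 0.9.0): Can() answered whether an action was permitted,
// which involved interpreting the absence of rules.
// bool canIndex = pageDefinition.Can("index", userAgent);

// After (0.9.0): HasRule() simply reports whether a rule is present for the
// user agent, so the caller inverts the question explicitly.
bool hasNoIndex = pageDefinition.HasRule("noindex", userAgent);
```

Because `HasRule` checks for a rule's presence rather than inferring permission, its behaviour is easier to reason about than the old `Can()` variant.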
As usual, if you hit any issues with this new release, please report an issue!