
Releases: TurnerSoftware/RobotsExclusionTools

0.9.1

09 Aug 14:19
7f53fb4
Pre-release

Changes

🐛 Bug Fixes

  • Skip byte order mark when processing streams (#73)
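For context, a UTF-8 byte order mark (BOM) is the three-byte prefix EF BB BF. If it is not skipped, the first field of the file is read as "\ufeffUser-agent" and is not recognized as a user-agent line. Below is a minimal Python sketch of the idea, not the library's C# implementation; `read_robots_text` is a hypothetical name.

```python
import codecs
import io
from typing import BinaryIO


def read_robots_text(stream: BinaryIO) -> str:
    """Decode a robots.txt byte stream as UTF-8, skipping a leading BOM if present."""
    data = stream.read()
    # codecs.BOM_UTF8 is the byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


raw = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /private\n"
text = read_robots_text(io.BytesIO(raw))
```

In Python specifically, decoding with the "utf-8-sig" codec drops a leading BOM automatically and gives the same result.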

0.9.0

08 Aug 13:43
Pre-release

Major Rewrite of Tokenization/Parsing System

While the original tokenization system had a lot of room for extensibility, it was also hampered by being built around regular expressions, which hurt both performance and memory allocation. As part of #72, the tokenization system was rewritten from the ground up with performance and allocations in mind.

To put into perspective how much better the new system is:

  • RobotsFileParser.FromStream: 95% faster, 94% fewer allocations
  • RobotsFileParser.FromString: 92% faster, 94% fewer allocations
  • RobotsPageParser.FromRules: 92% faster, 95% fewer allocations
  • Allowed-access checking is now 2-3x faster with zero allocations

The library now also cross-targets .NET 6 alongside .NET Standard 2.0/2.1, allowing it to take advantage of additional performance-oriented APIs. You'll see the greatest uplift using Robots Exclusion Tools on .NET 6 and modern x86 hardware (due to custom SIMD use).

Potential Breaking Changes

Core classes and methods such as RobotsFileParser and FromStream should behave identically from a consumer's perspective. The Can(partialRuleName, userAgent) API has been removed from RobotsPageDefinition; use HasRule(ruleName, userAgent) instead. This is not just a rename: the two methods differ in core behaviour, and that behaviour made the Can() variant somewhat obscure and complicated.

As usual, if you hit any problems with this new release, please report an issue!

0.8.1

13 Nov 12:23
849e98b
Pre-release

Changes

🧰 Maintenance

👨🏼‍💻 Contributors

@Turnerj, @dependabot-preview and @theolivenbaum

0.8.0

10 Jun 12:54
Pre-release

Allow control over the Robots.txt file access rules by @Turnerj (#67)

0.7.0

20 Nov 06:47
Pre-release

Fixes NullPointerException in PathMatch for paths that are only a wildcard ("*")
Removes a small source of allocations in PathComparisonUtility
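For context, robots.txt rule paths can contain "*" (any sequence of characters) and a trailing "$" (end of path), as described in RFC 9309. A rule that is only "*" should match every path. The following is a minimal Python sketch of that matching, assuming RFC 9309 semantics. It is not the library's PathMatch implementation, and `path_matches` is a hypothetical name.

```python
def path_matches(rule_path: str, url_path: str) -> bool:
    """Return True if url_path matches a robots.txt rule path ('*' wildcards, optional trailing '$')."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    segments = rule_path.split("*")
    head, tail = segments[0], segments[-1]

    if len(segments) == 1:
        # No wildcard: exact match when anchored, prefix match otherwise.
        return url_path == head if anchored else url_path.startswith(head)

    # The first literal segment must match at the start of the path.
    if not url_path.startswith(head):
        return False
    pos = len(head)

    # Match each middle segment at its earliest possible position.
    for segment in segments[1:-1]:
        idx = url_path.find(segment, pos)
        if idx == -1:
            return False
        pos = idx + len(segment)

    if anchored:
        # The final segment must end the path and must not overlap earlier matches.
        return len(url_path) - len(tail) >= pos and url_path.endswith(tail)
    # Unanchored: the final segment only has to appear after the earlier matches.
    return url_path.find(tail, pos) != -1
```

With a lone "*", both literal segments are empty strings, so every path (including "") matches.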

0.6.0

31 Jul 01:37
332aae1
Pre-release

Fixes an issue where the cancellation token was not always used (#39)
Fixes the handling of spacing in tokenization and of invalid tokens (#40)

0.5.0

23 Jul 08:03
6bb830c
Pre-release

Added cancellation token support (#38)

Note: While the updated method signature gives the cancellation token a default value, this change will break any extensions of the tokenization system, as the TokenizeAsync method has also gained a cancellation token parameter.

0.4.0

13 Jun 16:33
6739930
Pre-release

Adds case insensitivity for field names (e.g. "User-agent") (#34, #35)
Performance optimizations (~15% faster, ~20% fewer allocations)
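Case-insensitive field names mean that "User-agent", "user-agent" and "USER-AGENT" are all treated as the same field. A minimal Python sketch of line parsing with that behaviour is below; it is illustrative only, and both `parse_line` and the `KNOWN_FIELDS` set are hypothetical, not the library's tokenizer.

```python
from typing import Optional, Tuple

# Hypothetical set of field names recognised by this sketch (stored lower-case).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a robots.txt line into (field, value), matching the field name case-insensitively."""
    line = line.split("#", 1)[0]  # drop any trailing comment
    if ":" not in line:
        return None
    name, value = line.split(":", 1)
    field = name.strip().lower()  # normalise case before comparing
    if field not in KNOWN_FIELDS:
        return None
    return field, value.strip()
```

Normalising the field name once, before lookup, keeps the comparison itself a simple set membership check.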

Note: There are some breaking changes around tokenization. If you're extending this library, pay attention to fb0ebb5 and c27d0fc.

If you just use the RobotsFileParser or RobotsPageParser classes, you shouldn't notice any changes.

0.3.0

15 Aug 13:50
Pre-release

Added Robots page parsing (#5)
Updated naming of RobotsParser and other classes (92e9e99) (Note: This is a breaking change)
Fixed bug with handling of token regex (4d279de)
Fixed bug with "Deny All" robots files (d94b1ac)
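For reference, a "Deny All" robots.txt file disallows every path for every crawler ("User-agent: *" followed by "Disallow: /"). The Python sketch below shows how such a file is expected to evaluate. It assumes a simplified reading of RFC 9309: prefix matching only (no wildcards), the longest matching rule wins, Allow wins ties, an empty Disallow matches nothing, and `user_agent` is a product token compared case-insensitively. `parse_groups` and `is_allowed` are hypothetical names, not the library's API or the d94b1ac fix.

```python
from typing import Dict, List, Tuple


def parse_groups(robots_txt: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (kind, path) rules."""
    groups: Dict[str, List[Tuple[str, str]]] = {}
    agents: List[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group.
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif name in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((name, value))
    return groups


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Evaluate path for user_agent, falling back to the '*' group."""
    groups = parse_groups(robots_txt)
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True  # no matching rule means access is allowed
    for kind, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue
        longer = len(rule_path) > best_len
        tie_allow = len(rule_path) == best_len and kind == "allow"
        if longer or tie_allow:
            best_len, allowed = len(rule_path), kind == "allow"
    return allowed


deny_all = "User-agent: *\nDisallow: /\n"
```

With `deny_all`, every path (including "/") is disallowed for any crawler, while a bare "Disallow:" line allows everything.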

0.2.0

28 Dec 05:48
Pre-release

Allows passing an HttpClient instance into the RobotsParser constructor