
Commit 420aea3

Also use query and fragment when matching URIs
When matching URIs against allow/disallow rules, the library previously used only the path part of the URI. It now uses the path, query, and fragment.
1 parent aa0dc41 commit 420aea3
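The sketch below illustrates the difference between the old and new match subjects. It uses PHP's built-in `parse_url()`. The helpers `relativeReference()` and `pathOnly()` are hypothetical and not part of the library. They only approximate what the library's `Url::relative()` and `Url::path()` provide, and they skip the percent-decoding step the library applies afterwards.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): path plus "?query" and
 * "#fragment" when present, i.e. the string rules are matched against
 * after this commit.
 */
function relativeReference(string $uri): string
{
    $parts = parse_url($uri);

    if ($parts === false) {
        // Seriously malformed URI: return an empty subject.
        return '';
    }

    // Assumption: an empty path is treated as "/".
    $relative = ($parts['path'] ?? '') === '' ? '/' : $parts['path'];

    if (isset($parts['query'])) {
        $relative .= '?' . $parts['query'];
    }

    if (isset($parts['fragment'])) {
        $relative .= '#' . $parts['fragment'];
    }

    return $relative;
}

/** Hypothetical helper mirroring the old behavior: the path only. */
function pathOnly(string $uri): string
{
    $path = parse_url($uri, PHP_URL_PATH);

    return is_string($path) && $path !== '' ? $path : '/';
}

echo pathOnly('https://example.com/?foo=bar'), PHP_EOL;          // prints "/"
echo relativeReference('https://example.com/?foo=bar'), PHP_EOL; // prints "/?foo=bar"
```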


4 files changed (+24, -8 lines)


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -6,6 +6,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.1.2] - 2025-01-27
+### Fixed
+- When matching URIs against allow/disallow rules, the library previously used explicitly only the path part of the URI. Fixed it to use path, query and fragment.
+
 ## [1.1.1] - 2022-11-08
 ### Fixed
 - The `Parser` now also trims hidden whitespace characters that aren't covered by PHP's `trim()` function by default. Such characters at the beginning of a line can cause parsing to fail, because it's important that user-agent and rule lines actually start with the corresponding keywords.
```

LICENSE

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-Copyright (c) 2024 Christian Olear
+Copyright (c) 2025 Christian Olear
 
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
```

src/RulePattern.php

Lines changed: 3 additions & 7 deletions
```diff
@@ -25,15 +25,11 @@ public function pattern(): string
      */
     public function matches(string|Url $uri): bool
     {
-        $path = $uri instanceof Url ? $uri->path() : Url::parse($uri)->path();
+        $pathQueryFragment = $uri instanceof Url ? $uri->relative() : Url::parse($uri)->relative();
 
-        if (!is_string($path)) {
-            return false;
-        }
-
-        $path = Encoding::decodePercentEncodedAsciiCharactersInPath($path);
+        $pathQueryFragment = Encoding::decodePercentEncodedAsciiCharactersInPath($pathQueryFragment);
 
-        return preg_match($this->preparedRegexPattern(), $path) === 1;
+        return preg_match($this->preparedRegexPattern(), $pathQueryFragment) === 1;
     }
 
     private function preparedRegexPattern(): string
```
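To show why the subject string matters, here is a simplified, hypothetical matcher. It is not the library's `RulePattern` implementation, and `robotsPatternToRegex()` and `ruleMatches()` are illustrative names. It assumes the usual robots.txt wildcard semantics (as in RFC 9309): `*` matches any character sequence, a trailing `$` anchors the end, and otherwise a rule matches any subject that starts with it. Like the new `matches()`, it matches against path, query, and fragment, but it omits percent-decoding.

```php
<?php

declare(strict_types=1);

/** Hypothetical: converts a robots.txt rule pattern into a PCRE pattern. */
function robotsPatternToRegex(string $rulePattern): string
{
    $anchored = str_ends_with($rulePattern, '$');

    if ($anchored) {
        $rulePattern = substr($rulePattern, 0, -1);
    }

    // Escape everything, then turn each escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($rulePattern, '/'));

    // Anchor at the start; anchor at the end only if the rule ended with "$".
    return '/^' . $regex . ($anchored ? '$' : '') . '/';
}

/** Hypothetical: matches a rule against path + query + fragment of a URI. */
function ruleMatches(string $rulePattern, string $uri): bool
{
    $parts = parse_url($uri);

    if ($parts === false) {
        return false;
    }

    $subject = ($parts['path'] ?? '') === '' ? '/' : $parts['path'];
    $subject .= isset($parts['query']) ? '?' . $parts['query'] : '';
    $subject .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return preg_match(robotsPatternToRegex($rulePattern), $subject) === 1;
}

// The cases from the new test, for "Disallow: /?foo":
var_dump(ruleMatches('/?foo', '/?foo'));       // bool(true): disallowed
var_dump(ruleMatches('/?foo', '/?foo=bar'));   // bool(true): disallowed
var_dump(ruleMatches('/?foo', '/yo?foo=bar')); // bool(false): allowed
```

With path-only matching, the subject for `/?foo=bar` would have been just `/`, which does not start with `/?foo`, so the rule never matched and such URLs were reported as allowed.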

tests/ParserTest.php

Lines changed: 16 additions & 0 deletions
```diff
@@ -357,6 +357,22 @@ public function test_parse_sitemap_lines(): void
         ], $robotsTxt->sitemaps());
     }
 
+    public function test_it_uses_not_only_the_path_but_also_the_query_when_matching(): void
+    {
+        $robotsTxtContent = <<<ROBOTSTXT
+            User-agent: *
+            Disallow: /?foo
+            ROBOTSTXT;
+
+        $robotsTxt = (new Parser())->parse($robotsTxtContent);
+
+        $this->assertFalse($robotsTxt->isAllowed('/?foo', 'MyBot'));
+
+        $this->assertFalse($robotsTxt->isAllowed('/?foo=bar', 'MyBot'));
+
+        $this->assertTrue($robotsTxt->isAllowed('/yo?foo=bar', 'MyBot'));
+    }
+
     /**
      * @param string[] $expected
      * @param RulePattern[] $actual
```
