Commit 560d6d1

Fixes "case-sensitive" URI matching for Disallow rules in robots.txt (#46)
* Fixes "case-sensitive" URI matching for Disallow rules in robots.txt

  Based on Issue #45 (Robots.txt "Disallow" URI matching should be case-sensitive), I removed the use of `strtolower` in `parseDisallow` to preserve the case of the URI.

  The issue was opened based on Google's robots.txt specification, which states: "The value of the disallow rule is case-sensitive." (Source: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt?hl=en#disallow)

  ---

  I ran PHPUnit and all tests passed, since none of them specifically tested case sensitivity. I added the test `the_disallows_uri_check_is_case_sensitive` to cover this issue.

* Remove .idea files

---------

Co-authored-by: Matthew Kesack <[email protected]>
1 parent 9533d45 commit 560d6d1

3 files changed: +11 -1 lines changed

src/RobotsTxt.php

Lines changed: 1 addition & 1 deletion
@@ -253,7 +253,7 @@ protected function parseUserAgent(string $line): string
 
     protected function parseDisallow(string $line): string
     {
-        return trim(substr_replace(strtolower(trim($line)), '', 0, 8), ': ');
+        return trim(substr_replace(trim($line), '', 0, 8), ': ');
     }
 
     protected function isDisallowLine(string $line): string
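
The effect of the one-line change to `parseDisallow` can be checked in isolation. The sketch below copies the old and new method bodies into two standalone functions; the names `parseDisallowBefore` and `parseDisallowAfter` are hypothetical and exist only for this illustration. It covers parsing only and makes no assumption about how `allows()` compares paths.

```php
<?php
// Hypothetical standalone copies of the parseDisallow body, before and
// after this commit. These function names are not part of the library.

function parseDisallowBefore(string $line): string
{
    // Old behavior: the whole line is lowercased before the 8-character
    // "Disallow" prefix is removed, so the path loses its original case.
    return trim(substr_replace(strtolower(trim($line)), '', 0, 8), ': ');
}

function parseDisallowAfter(string $line): string
{
    // New behavior: the prefix is removed without touching the case.
    return trim(substr_replace(trim($line), '', 0, 8), ': ');
}

// Rule added to tests/data/robots.txt in this commit.
$line = 'Disallow: /Case-Sensitive/Disallow';

echo parseDisallowBefore($line), PHP_EOL; // /case-sensitive/disallow
echo parseDisallowAfter($line), PHP_EOL;  // /Case-Sensitive/Disallow
```

With the old body, the stored rule is the lowercased path, which is why a request for the mixed-case path was not matched as written in robots.txt.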

tests/RobotsTxtTest.php

Lines changed: 9 additions & 0 deletions
@@ -149,6 +149,15 @@ public function the_disallows_user_agent_check_is_case_insensitive()
         $this->assertFalse($robots->allows('/no-agents', strtolower('UserAgent007')));
     }
 
+    /** @test */
+    public function the_disallows_uri_check_is_case_sensitive()
+    {
+        $robots = RobotsTxt::readFrom(__DIR__.'/data/robots.txt');
+
+        $this->assertFalse($robots->allows('/Case-Sensitive/Disallow'));
+        $this->assertTrue($robots->allows(strtolower('/Case-Sensitive/Disallow')));
+    }
+
     /** @test */
     public function it_can_handle_multiple_user_agent_query_strings()
     {

tests/data/robots.txt

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Disallow: /nl/admin/
 Disallow: /en/admin/*
 Disallow: /fr/admin$
 Disallow: /es/admin-disallow/
+Disallow: /Case-Sensitive/Disallow
 User-agent: google
 
 Disallow: /
