While working on a fork, I found another solution: if the pattern starts with `//`, prepend two more slashes. This ensures that `urlparse` does not mistake the first folder for the hostname (netloc), while the path still comes out as expected. For example, `urlparse("////debug/*")` yields an empty netloc and the path `//debug/*`.
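A minimal sketch of this workaround, using only the standard library. `pattern_path` is a hypothetical helper for illustration and is not part of Protego; it only shows the slash-prepending trick applied before `urlparse`.

```python
from urllib.parse import urlparse


def pattern_path(pattern: str) -> str:
    """Return the path part of a robots.txt rule pattern (hypothetical helper).

    If the pattern starts with "//", two extra slashes are prepended so that
    urlparse sees an empty netloc and keeps every original slash in the path.
    """
    if pattern.startswith("//"):
        pattern = "//" + pattern
    return urlparse(pattern).path


print(pattern_path("//debug/*"))  # //debug/*
print(pattern_path("/debug/*"))   # /debug/*
```

The same trick also preserves three or more leading slashes, because the netloc always ends at the third character.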
When parsing a robots.txt file that contains the directive `Disallow: //debug/*`, Protego treats it as if it were `Disallow: /*`, which disallows every path on the site.
This is due to line 185 of `protego/src/protego.py` (commit 45e1948).
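The problem is that `urlparse` does not treat the pattern as a path: because it starts with `//`, it is parsed as a network-path reference (`//host/path`), so `debug` is taken as the hostname (netloc) and only `/*` is left as the path.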
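The behavior can be reproduced directly with the standard library, independently of Protego:

```python
from urllib.parse import urlparse

# A leading "//" makes urlparse read the first segment as the netloc.
parts = urlparse("//debug/*")
print(parts.netloc)  # debug
print(parts.path)    # /*

# With a single leading slash, the whole pattern stays in the path.
print(urlparse("/debug/*").path)  # /debug/*
```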
According to Google's official documentation, the Allow and Disallow directives must be followed by relative paths starting with a / character.
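To illustrate why the mis-parse matters, here is a minimal sketch of Google-style path matching. `rule_to_regex` and `matches` are hypothetical helpers, not Protego's API; they implement only the `*` wildcard and the trailing `$` anchor described in Google's documentation, matching from the start of the path.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (hypothetical).

    '*' matches any sequence of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, as robots.txt rules do.
    return rule_to_regex(pattern).match(path) is not None


# The rule as written only covers paths under "//debug/".
print(matches("//debug/*", "//debug/info"))  # True
print(matches("//debug/*", "/products"))     # False
# The rule as mis-parsed ("/*") covers every path.
print(matches("/*", "/products"))            # True
```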
Therefore, I see two possible solutions:
Option 1
As is:
`protego/src/protego.py`, lines 185 to 186 (commit 45e1948)
To be:
Option 2
Add a `re.sub` call at the beginning of the method defined at lines 90 to 93 of `protego/src/protego.py` (commit 45e1948).