Docs: prevent search engines to index historical versions #6517
docs/source/robots.txt
Allow: /*/latest/
Allow: /en/latest/ # Fallback for bots that don't understand wildcards
Allow: /*/stable/
Allow: /en/stable/ # Fallback for bots that don't understand wildcards
I think you copied this from the astro project, but are we sure this syntax is correct? The URL for AiiDA's documentation on RTD starts with /projects/aiida-core/en/stable. Should these rules include the /projects/aiida-core/ prefix?
You are right! Yes, assuming RTD places the file in the right location (https://aiida.readthedocs.io/robots.txt), the paths inside it are relative to the site domain, so:
Allow: /*/latest/
Allow: /en/latest/ # Fallback for bots that don't understand wildcards
Allow: /*/stable/
Allow: /en/stable/ # Fallback for bots that don't understand wildcards
Allow: /projects/aiida-core/en/latest/
Allow: /projects/aiida-core/en/stable/
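As a rough check of which of these rules apply to the real Read the Docs paths, here is a minimal Python sketch of RFC 9309 path matching, where `*` stands for any sequence of characters (including `/`) and a trailing `$` anchors the end of the pattern. The `rule_matches` helper and the sample paths are illustrative assumptions, not part of this PR. Under these semantics `/*/latest/` already covers `/projects/aiida-core/en/latest/`, while the `/en/latest/` fallback only matches paths that literally start with `/en/latest/`; crawlers without wildcard support therefore depend on the explicit `/projects/aiida-core/...` lines.

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Check a robots.txt rule path against a URL path (RFC 9309 wildcard semantics).

    Hypothetical helper for illustration; real crawlers implement their own matcher.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape regex metacharacters, then let '*' stand for any run of characters, '/' included.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if anchored:
        pattern += "$"
    # re.match anchors at the start, so a rule without '$' behaves as a path prefix.
    return re.match(pattern, url_path) is not None


# The Allow rules proposed above.
RULES = [
    "/*/latest/",
    "/en/latest/",
    "/*/stable/",
    "/en/stable/",
    "/projects/aiida-core/en/latest/",
    "/projects/aiida-core/en/stable/",
]

# Hypothetical page paths on aiida.readthedocs.io, for illustration only.
PATHS = [
    "/projects/aiida-core/en/latest/intro/index.html",
    "/projects/aiida-core/en/stable/",
    "/projects/aiida-core/en/v1.0.0/",
]

if __name__ == "__main__":
    for path in PATHS:
        hits = [rule for rule in RULES if rule_matches(rule, path)]
        print(path, "->", hits or "no Allow rule")
```

For the `latest` page, only `/*/latest/` and the explicit `/projects/aiida-core/en/latest/` line match; the historical-version path matches no `Allow` rule at all.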
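In any case, from the build I now realize that RTD places this file in the wrong location:
https://aiida.readthedocs.io/projects/aiida-core/robots.txt
Crawlers only look for robots.txt at the root of the domain, so at this location it is ignored and all versions remain discoverable by search engines...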
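To make the location problem concrete: RFC 9309 crawlers derive the robots.txt URL from the scheme and host alone, whatever subpath a page lives under. The sketch below uses the standard library's `urllib.parse`; the `robots_url` helper name is hypothetical. It shows that for pages under `/projects/aiida-core/` the file a crawler requests is `https://aiida.readthedocs.io/robots.txt`, not the copy at `/projects/aiida-core/robots.txt`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler consults for page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Path, query and fragment are discarded: robots.txt always lives at the root of the host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    page = "https://aiida.readthedocs.io/projects/aiida-core/en/latest/intro/index.html"
    print(robots_url(page))
```

Because `netloc` is kept as-is, a different subdomain or port would get its own robots.txt, which matches how the protocol scopes the file to a single host.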
Alright! One solution is to set an exact redirect in the RTD web settings tab, from /robots.txt to /projects/aiida-core/robots.txt.
@sphuber I don't have access to the RTD settings; may I ask you to set this up, if you agree?
I have gone through the admin panel, but cannot find a way to customize the path. According to the docs, Read the Docs automatically generates and serves a robots.txt. We could instead mark older versions as hidden, so that their paths are automatically added to the generated robots.txt and excluded from indexing. Would that be a better approach, since we would be sure that the robots.txt is in the correct place?
Actually, hiding versions might not be ideal, because some users may want to look at older versions for some reason; we just don't want them to be indexed. Never mind, though: the docs also describe what should be done for projects that use Sphinx, as we do, and it seems your approach is correct.
The branch was updated from df43725 to 0a6e426.
Thanks @khsrali! I have validated the robots.txt with this tool: https://technicalseo.com/tools/robots-txt/ and it seems to be correct. So let's merge this and see how it goes.
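For a local, offline sanity check alongside the online tool, Python's standard `urllib.robotparser` can stand in for a crawler that does not understand wildcards: it matches rule paths as plain prefixes and applies the first matching rule in file order. The file contents below are an assumption, reconstructed from the rules discussed above plus the catch-all `Disallow: /` implied by the commit message; the `Allow` lines are placed first so that first-match parsers and longest-match (RFC 9309) parsers reach the same verdict. The actual file in the PR may differ.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the Allow rules from this PR, followed by a catch-all Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /*/latest/
Allow: /en/latest/ # Fallback for bots that don't understand wildcards
Allow: /*/stable/
Allow: /en/stable/ # Fallback for bots that don't understand wildcards
Allow: /projects/aiida-core/en/latest/
Allow: /projects/aiida-core/en/stable/
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines; anything after '#' on a line is ignored.
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://aiida.readthedocs.io"
PATHS = (
    "/projects/aiida-core/en/latest/",
    "/projects/aiida-core/en/stable/",
    "/projects/aiida-core/en/v1.0.0/",  # hypothetical historical version
)

if __name__ == "__main__":
    for path in PATHS:
        # robotparser treats '*' as a literal character, so only the explicit
        # /projects/aiida-core/... lines (or the final Disallow) can match here.
        print(path, parser.can_fetch("Googlebot", BASE + path))
```

If the `Disallow: /` line came first instead, this parser would refuse every path, which is one reason to keep the `Allow` rules at the top of the file.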
#6517) Currently, all versions of the documentation are indexed, with the result that Google searches return very outdated versions and the latest version is almost impossible to find. The `robots.txt` now disallows indexing of any path except the `latest` and `stable` versions of the documentation. Cherry-pick: 5c1f5d6
Fixes #6516
I suggest merging this immediately if the tests pass and the build is successful.
The only way to know whether this resolves the issue is to wait a few days and see whether Google's index is updated.