Docs: how to stop search engines indexing outdated version of aiida #6516

Open
khsrali opened this issue Jul 4, 2024 · 19 comments · Fixed by #6517
Contributor

khsrali commented Jul 4, 2024

Right now, if you google, for example, "aiida tab completion", the top results are from versions 0.12.1 and 1.0.1.

Note: this issue is not about one specific search. It is about understanding, in general, how to tell Google to stop indexing historical versions.

I found this PR on GitHub where they fixed the same problem for their own repo:
unitaryfund/mitiq#1526
Apparently the fix is to add or update a robots.txt file, which, by the way, they also copied from this repo :)
https://github.com/astropy/astropy/blob/main/docs/robots.txt
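A minimal sketch of how such a robots.txt could be checked locally, using Python's standard `urllib.robotparser`. The rules below are hypothetical and only illustrate the idea of allowing `stable` and `latest` while disallowing every other version; they are not the contents of the astropy or mitiq files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the actual file from any of the linked repos.
# Python's parser applies the first matching rule, so the specific Allow
# lines are listed before the broader Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /projects/aiida-core/en/stable/
Allow: /projects/aiida-core/en/latest/
Disallow: /projects/aiida-core/en/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


BASE = "https://aiida.readthedocs.io/projects/aiida-core/en"
for version in ("stable", "latest", "v1.0.1"):
    print(version, is_crawlable(f"{BASE}/{version}/index.html"))
```

Note that real crawlers may resolve conflicting rules differently (for example by longest match), so this only checks the intent of the rules.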

P.S. unrelated to issue #6508

Contributor Author

khsrali commented Jul 8, 2024

Thanks a lot @sphuber for merging #6517.
The file is now served at https://aiida.readthedocs.io/projects/aiida-core/en/latest/robots.txt, which is still not discoverable by search engines. It should be at https://aiida.readthedocs.io/robots.txt .
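Crawlers only look for robots.txt at the root of the host, which is why the nested location does not help. As a small illustration, the helper below (the name `root_robots_url` is made up for this sketch) derives the URL a crawler would request for any page on a site:

```python
from urllib.parse import urlsplit, urlunsplit


def root_robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks for the host of `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(root_robots_url(
    "https://aiida.readthedocs.io/projects/aiida-core/en/latest/robots.txt"
))
```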

See here: they override only if robots.txt is defined in the RTD default version. Since we added ours in latest, it is not going to override anything.
Do you think it's feasible to change our RTD default to latest?

@khsrali khsrali reopened this Jul 8, 2024
Contributor

sphuber commented Jul 8, 2024

> See here: they override only if robots.txt is defined in the RTD default version. Since we added ours in latest, it is not going to override anything.
> Do you think it's feasible to change our RTD default to latest?

No, unfortunately we can't. We decided some time ago that we do not want latest as the default: if we update the docs to change or add new features, the default documentation is "incorrect" until the release is made. So we decided to switch to "stable". I was simply going to make a patch release soon so that the stable docs get updated.

Contributor Author

khsrali commented Jul 8, 2024

Makes sense, a patch release is even better!
I'll close this now; if the issue persists, we can reopen it.

@khsrali khsrali closed this as completed Jul 8, 2024
Contributor Author

khsrali commented Aug 7, 2024

Checking after release v2.6.2:
https://aiida.readthedocs.io/robots.txt is still the old version.
New version is located here:
https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt
which is not discoverable by search engines.

We need to understand how to move it to the right place.

@khsrali khsrali reopened this Aug 7, 2024
@eimrek eimrek self-assigned this Aug 22, 2024
Contributor Author

khsrali commented Sep 19, 2024

https://support.google.com/webmasters/answer/7489871?hl=en#zippy=%2Cthis-is-my-site
As Google says, this happens when robots.txt is not discoverable:
Screenshot_20240919_154706

Contributor Author

khsrali commented Sep 19, 2024

Historical versions should not be indexed:
Screenshot_20240919_154851

Contributor Author

khsrali commented Sep 19, 2024

These should not have been indexed if robots.txt were discoverable:
Screenshot_20240919_155155

Contributor Author

khsrali commented Sep 19, 2024

Screenshot_20240919_160124-1

Member

eimrek commented Sep 23, 2024

Investigated this a bit.

Current readthedocs setup is the following:

  • we have an aiida readthedocs project that does not seem to have any contents itself and the build is failing (https://readthedocs.org/projects/aiida/).
  • aiida-core readthedocs is set up as a subproject of aiida, such that it's served under https://aiida.readthedocs.io/projects/aiida-core/en/stable/
  • modifying robots.txt setup of aiida-core doesn't update the robots.txt of the aiida.readthedocs.io domain, however.
  • we could try to modify robots.txt directly of the aiida project, and that might work.

But I think just having aiida-core as the main project is simpler and makes more sense, considering also that we don't have any other subprojects. We can also rename aiida-core -> aiida for readthedocs. Therefore, I suggest the following:

  • We remove/delete the current aiida readthedocs project.
  • We set aiida-core as the main project, and potentially change its "readthedocs name" to aiida. Then the documentation is served at https://aiida.readthedocs.io/en/stable/
  • The current robots.txt modification will then work directly.
  • If, in the future, we want other readthedocs "subprojects" to be included, we can just include them under this main project.

@khsrali would this work?

Member

eimrek commented Sep 23, 2024

Note, a possible (huge) drawback of what I described above: I think there are direct links to the aiida documentation on the internet in the form of "https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/index.html". If we simplify the URL, these will break.
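To make the drawback concrete, here is a rough sketch of the mapping that a redirect rule would have to implement if the URL were simplified. The function name and the `OLD_PREFIX` constant are assumptions for illustration only; this is not an existing readthedocs setting.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefix used by the current subproject setup (taken from the URLs above).
OLD_PREFIX = "/projects/aiida-core"


def simplified_url(old_url: str) -> str:
    """Map an old subproject URL to the proposed top-level form."""
    parts = urlsplit(old_url)
    path = parts.path
    if path == OLD_PREFIX or path.startswith(OLD_PREFIX + "/"):
        # Strip the prefix; fall back to "/" if nothing is left.
        path = path[len(OLD_PREFIX):] or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(simplified_url(
    "https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/index.html"
))
```

Every existing external link would need to be redirected along these lines, otherwise it would stop resolving.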

pinging also @giovannipizzi

Contributor Author

khsrali commented Sep 24, 2024

Thanks a lot @eimrek for writing this up. Your suggestion actually makes sense to me.
A bigger issue is that the aiida project on readthedocs builds from this repo: https://github.com/aiidateam/aiida-metapkg
which is archived 🙃 so we cannot update it.

Also, aiida-core is apparently meant to be a subproject of aiida on readthedocs. Honestly, I don't know what the consequences would be if we changed that, apart from the ones you already mentioned.

One option would be to set a redirect in the aiida readthedocs project
from:
https://aiida.readthedocs.io/robots.txt
to:
https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt

Probably this would solve it.

Member

eimrek commented Sep 24, 2024

@khsrali ok, if you're able to get it to work, that's fine.

Contributor Author

khsrali commented Sep 24, 2024

> @khsrali ok, if you're able to get it to work, that's fine.

Nah, actually the redirect didn't work 😭
I've set it up, but the URL still serves the old robots.txt. I don't really understand why.

Contributor Author

khsrali commented Oct 8, 2024

@eimrek, https://aiida.readthedocs.io/robots.txt is still the old one 😭

Member

eimrek commented Oct 8, 2024

@khsrali Ok, I think I figured it out now. readthedocs built two versions: 'latest' from master and 'stable' from the latest tag. The default version was set to 'stable', so https://aiida.readthedocs.io/robots.txt reflected the old, tagged version. I have now set the default version to 'latest', and the robots.txt seems to be correct. Feel free to close.

Contributor Author

khsrali commented Oct 8, 2024

For me, still showing the old one: 🤔

image

Member

eimrek commented Oct 8, 2024

Strange. Did you try clearing the cache, using incognito mode, or using another browser?

Contributor Author

khsrali commented Oct 9, 2024

Ah, right! Indeed, it was a caching issue.
Cheers! Now robots.txt is in the right place.
Let's wait a few days to see if Google actually updates its index.
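For future checks, one way to rule out a stale browser cache is to fetch the file from a script. The sketch below uses only the Python standard library; the helper names are made up for this example, and the `Cache-Control: no-cache` request header is only a hint that intermediate caches may or may not honor.

```python
from urllib.request import Request, urlopen

ROBOTS_URL = "https://aiida.readthedocs.io/robots.txt"


def build_no_cache_request(url: str = ROBOTS_URL) -> Request:
    """Build a GET request that asks caches to revalidate before answering."""
    return Request(url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})


def fetch_robots(url: str = ROBOTS_URL) -> str:
    """Download robots.txt and return its text (performs a network request)."""
    with urlopen(build_no_cache_request(url), timeout=10) as response:
        return response.read().decode("utf-8")


# Usage (requires network access):
#     print(fetch_robots())
```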

Contributor Author

khsrali commented Oct 9, 2024

Thanks a lot @eimrek !
