Wikipedia importance dump not available (HTTP error 403) #2929

Open
FaFre opened this issue Dec 19, 2022 · 6 comments

Comments

FaFre commented Dec 19, 2022

According to the documentation, there should be a Wikipedia importance dump available at the following address: https://nominatim.org/data/wikimedia-importance.sql.gz

However, nothing seems to be available there. Is there an error with the web server, or is the documentation outdated?
https://nominatim.org/release-docs/latest/admin/Import/#downloading-additional-data

Member

lonvia commented Dec 19, 2022

An inconsiderate user has maxed out traffic allowances on the server. The download will remain disabled until appropriate rate limiting is in place.

Author

FaFre commented Dec 19, 2022

Oh, that's very unfortunate. Is there any mirror available?

Collaborator

mtmail commented Dec 19, 2022

@FaFre I have a copy at https://downloads.opencagedata.com/public/wikimedia-importance.sql.gz (not a mirror, so don't hardcode it in scripts)

Member

lonvia commented Dec 19, 2022

The data is back now. Please check your scripts and restrict the download of extra data to the necessary minimum. The alternative is severe rate limiting on the server and nobody really wants that.

lonvia closed this as completed Dec 19, 2022

Member

lonvia commented Jan 1, 2023

Things are not better. There is a script circulating which downloads the 300 MB Wikipedia importance file once every minute using curl. Be advised that curl will be banned from the server by tomorrow.

lonvia reopened this Jan 1, 2023
lonvia pinned this issue Jan 1, 2023
lonvia changed the title from "Wikipedia importance dump not available" to "Wikipedia importance dump not available (HTTP error 403)" Jan 1, 2023

Member

lonvia commented Jan 1, 2023

On second thought, I'm not really willing to pay for another TB of data traffic overnight. curl is banned from https://nominatim.org/data, effective immediately.

If you want to use curl to download data, use curl -A and set a custom user agent that identifies your application. If you are a provider of Nominatim installation scripts, make absolutely sure that this user agent can be used to get in contact with you in case you screw up your script.

Furthermore, if you regularly check for updates of any data files, make sure to use the -z option or similar to avoid redownloading when there is no new version.
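
For script authors, the two recommendations above (a custom user agent via -A and a conditional download via -z) might be combined roughly as follows. This is only a sketch, not an official Nominatim script: the user agent string, the contact address, the variable names and the function name fetch_importance are placeholders invented for illustration, and the -f, -R and -o options are additions not mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical values: replace with a name and contact for your own project.
USER_AGENT="example-nominatim-installer/1.0 (+mailto:admin@example.org)"
URL="https://nominatim.org/data/wikimedia-importance.sql.gz"
OUT="wikimedia-importance.sql.gz"

fetch_importance() {
    # -f  exit non-zero on HTTP errors such as 403 instead of saving the error page
    # -R  give the local file the server's modification time
    # -A  send a user agent that identifies the application and its maintainer
    # -o  write the response body to $OUT
    if [ -f "$OUT" ]; then
        # -z FILE makes the request conditional on FILE's modification time,
        # so nothing is transferred when the server has no newer version.
        curl -f -R -A "$USER_AGENT" -z "$OUT" -o "$OUT" "$URL"
    else
        # First run: no local copy yet, so download unconditionally.
        curl -f -R -A "$USER_AGENT" -o "$OUT" "$URL"
    fi
}
```

Calling fetch_importance from an installation or update script should then transfer the file only when the server's copy is newer than the local one. How often it is reasonable to check is a separate question; once every minute, as described above, is clearly too often.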
