tv_imdb: update to use new data source #17

Open
knowledgejunkie opened this issue Feb 27, 2018 · 4 comments

Comments

@knowledgejunkie
Contributor

In late December 2017, the IMDB mirror went read-only; the URL of the archived data changed to ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/.

From the README:

"IMDb datasets, providing bulk-access to IMDb title and name data, are now available from us via an HTTPS link.

https://datasets.imdbws.com/

As a previous ftp user you can just switch to https, however there are some formatting changes within the data.

For details on the new file formats and access guidelines, see www.imdb.com/interfaces."

In addition to being served over https, the data files on IMDB's new service have some formatting changes.

@knowledgejunkie
Contributor Author

Updated base URL in IMDB.pm to use archived data (December 2017) in f0140f3

This is only a short-term workaround until we migrate tv_imdb to the new data source.

@jnylen
Contributor

jnylen commented Feb 28, 2018

Also, one option is to use omdbapi and let the user supply their own API key.
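
A minimal sketch of what a user-supplied key could look like in practice, assuming OMDb's documented query parameters (`apikey`, `t` for title, `y` for year). The function name `build_omdb_url` is hypothetical and is not part of tv_imdb; the sketch only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Public OMDb endpoint; the key itself must come from the user's own config.
OMDB_ENDPOINT = "https://www.omdbapi.com/"


def build_omdb_url(api_key, title, year=None):
    """Return an OMDb title-lookup URL using the caller's own API key.

    Hypothetical helper for illustration only.
    """
    if not api_key:
        raise ValueError("an OMDb API key is required")
    params = {"apikey": api_key, "t": title}
    if year is not None:
        params["y"] = str(year)
    # urlencode escapes spaces and punctuation in the title.
    return OMDB_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_omdb_url("YOUR_KEY", "The Third Man", 1949))
```

Passing the year alongside the title narrows the match when several films share a name, which matters for a grabber that only has a title and a date to go on.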

@honir
Contributor

honir commented Jul 29, 2020

Sadly, it looks like tv_imdb -- in its current form -- is a dead duck. The IMDb dataset is no longer in the public domain.

The ftp files haven't been updated since Dec 22 2017, and won't be updated in future. IMDb are no longer releasing updates to these files.

The expectation from IMDb is that people switch to using the new TSV (tab-separated values) files available over HTTPS. However, these files contain a much-reduced dataset, and many key elements are no longer available (no plot summaries, MPAA ratings, or keywords; only 3 genres; only the top 3 actors; etc.).
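
For a sense of what is left to work with, here is a rough sketch of reading rows in the `title.basics.tsv` layout (column names as described at www.imdb.com/interfaces, with `\N` marking a missing value). The two sample rows and the `parse_titles` helper are invented for illustration; a real run would read the gzipped file from https://datasets.imdbws.com/ instead.

```python
import csv
import io

# Invented rows in the title.basics.tsv layout; the IDs are placeholders.
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tmovie\tExample Film\tExample Film\t0\t1999\t\\N\t101\tDrama,Thriller\n"
    "tt0000002\tmovie\tAnother Example\tAnother Example\t0\t\\N\t\\N\t\\N\t\\N\n"
)


def parse_titles(stream):
    """Yield one small dict per title row (hypothetical helper)."""
    # QUOTE_NONE: the files are plain tab-separated, and titles may
    # contain literal double quotes that must not be treated as quoting.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Map the "\N" missing-value marker to None.
        clean = {k: (None if v == r"\N" else v) for k, v in row.items()}
        yield {
            "id": clean["tconst"],
            "title": clean["primaryTitle"],
            "year": int(clean["startYear"]) if clean["startYear"] else None,
            # genres is a comma-separated list (at most three entries).
            "genres": clean["genres"].split(",") if clean["genres"] else [],
        }


if __name__ == "__main__":
    for title in parse_titles(io.StringIO(SAMPLE)):
        print(title)
```

Nothing in this file carries a plot, rating certificate, or keyword list, which is the gap described above.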

The ethos as stated by IMDb is:

"The sets of data we provide are updated to only include the essential ones that help with matching and linking to an IMDb title or name."

In other words, the intention of the new datasets is that you are only to use them to identify the key to access the page on their website, and no more. Hence no rich dataset like we've used for the past 20-odd years. The marketing reasons for this should be obvious if you've visited imdb.com lately: it's like the old days with auto-playing videos, clickbait, massive adverts, etc.

I think we need to look at alternatives, such as the APIs from TMDb (The Movie Database) or OMDb (The Open Movie Database).

@honir
Contributor

honir commented Jul 29, 2020

It looks like OMDb is no longer maintained. It uses IMDb data, and reading some of the support tickets suggests it probably uses a database built from the no-longer-maintained .list files (hint: people can't find programmes shown after 2017).

And TMDb has 567,000 films compared to 6,500,000 on IMDb :-(

@knowledgejunkie changed the title from "Update tv_imdb to use new data source" to "tv_imdb: update to use new data source" on Aug 20, 2020
@honir removed the bug label on Jan 15, 2021