Aggregating adult performers metadata - authority file / schema discussion #10
Query for pulling external identifiers from Wikipedia / Wikidata (credit: Tweeticoats, Discord):
For collaborating on the performers authority file, I think the easiest way to proceed is to share scraped performer data via torrents or file-hosting sites. Metadata can be packed into JSON. An Extract/Transform/Load (ETL) script would then pull these files and transform them into a usable dataset (cross-referencing, normalization, validation), so anybody can replicate the process without relying on any central host. The list of sources would be periodically expanded with new sites and with updates from existing sites.
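A minimal sketch of what such an ETL script could look like, assuming each shared dump is a JSON file containing a list of performer objects with at least a `name` field. The directory layout, the field names (`name`, `source`, `name_key`), and the JSON Lines output are hypothetical choices for illustration, not an agreed schema.

```python
import json
import unicodedata
from pathlib import Path


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (hypothetical rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def extract(dump_dir: Path) -> list[dict]:
    """Read every *.json dump; tag each record with the file it came from."""
    records = []
    for path in sorted(dump_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            for rec in json.load(fh):  # assumes each file holds a JSON list
                records.append({**rec, "source": path.stem})
    return records


def transform(records: list[dict]) -> list[dict]:
    """Validate records and add a normalized key for cross-referencing."""
    cleaned = []
    for rec in records:
        name = rec.get("name")
        if not isinstance(name, str) or not name.strip():
            continue  # validation: skip records without a usable name
        cleaned.append({**rec, "name_key": normalize_name(name)})
    return cleaned


def load(records: list[dict], out_path: Path) -> None:
    """Write the dataset as JSON Lines so it is easy to diff and re-share."""
    with out_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Because every step reads and writes plain files, anyone holding the same dumps can rerun `load(transform(extract(Path("dumps"))), Path("performers.jsonl"))` and get the same dataset without a central host.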
Currently stash-box supports only a single "source of truth" for scenes/performers/studios, whereas performer data aggregated from various sources (index sites, tubes, social media, studios) may differ, with varying degrees of confidence.
This is a proposal to create an authority file that will:
a. Assign a confidence value to performer matches across sources (link and de-dup performers)
b. Assign a confidence value to metadata and de-dup it
c. Generate an output dump of scenes/performers/studios
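As a rough illustration of point (a), the sketch below scores pairs of performer records and groups those above a threshold. The record fields (`name_key`, `external_ids`), the scoring weights, and the threshold are all hypothetical assumptions, and string similarity from `difflib` stands in for whatever matching method is eventually chosen.

```python
from difflib import SequenceMatcher
from itertools import combinations


def match_confidence(a: dict, b: dict) -> float:
    """Return a 0..1 confidence that two records describe the same performer."""
    ids_a = set(a.get("external_ids", {}).items())
    ids_b = set(b.get("external_ids", {}).items())
    if ids_a & ids_b:
        return 0.99  # assumed: a shared external identifier is near-certain
    # Name similarity alone is capped at 0.8 (an arbitrary, assumed weight).
    ratio = SequenceMatcher(None, a["name_key"], b["name_key"]).ratio()
    return round(0.8 * ratio, 2)


def link_performers(records: list[dict], threshold: float = 0.75) -> list[list[int]]:
    """Cluster record indices whose pairwise confidence meets the threshold."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match_confidence(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Each resulting cluster would become one authority-file entry, with the pairwise scores kept alongside it so that low-confidence links can be reviewed. Comparing all pairs is quadratic, so a real dataset would likely need blocking on something like the normalized name first.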
There is a discussion about adding that functionality to stash-box itself: https://discord.com/channels/559159668438728723/798641040029777980/894662081830322206
Whether this is integrated into stash-box or kept separate, we need to come up with a schema, so I wanted to start this discussion.