Import the Ponify music archive #100

Closed
4 of 10 tasks
EventideGlow opened this issue Aug 2, 2016 · 0 comments · Fixed by #118

EventideGlow commented Aug 2, 2016

#91 and #9 must be completed before this. Review this task in more detail with Squirrel.

We have the Ponify music archive sitting on our servers now. This task entails writing and running the scripts to import it into Pony.fm's collection, as well as de-duplicating the Pony.fm library.

Part 1: Importing the Ponify library

This component of the task is about bringing the Ponify library into Pony.fm so that tagging/classification work can continue on it. Basic duplicate detection and handling should happen during the import, as follows; the bulk of the de-duplication effort falls under Part 2 of this task:

  • Track IDs for duplicate tracks must not change, so that existing URLs keep working.
  • Hash the audio stream in every file as a means of detecting duplicates.
  • Even if a duplicate copy of a track is ignored, the tags should be parsed out of any Ponify files and used to fill in missing data in Pony.fm's database.
  • If Pony.fm already has the same track from the MLPMA (recorded in the mlpma_tracks table), it should be replaced with the Ponify version if the Ponify version is higher quality (depends on #9, "Add a way to replace a track's master audio").
  • If Pony.fm's existing version of the track was directly uploaded by the artist, whichever version is lossless should be preserved (if neither is lossless, then Pony.fm's existing copy should be kept).
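
The last three rules above can be read as a small decision function. The following is only a sketch under assumptions: the `Source`, `Action`, and `Audio` names are hypothetical, bitrate is used as a stand-in for "quality" between lossy files, and the case where both copies are lossless (which the rules do not cover) is resolved by keeping the existing copy. In every case the track ID stays the same; only the master audio or the missing tags change.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where Pony.fm's existing copy came from (hypothetical names)."""
    ARTIST_UPLOAD = "artist_upload"
    MLPMA = "mlpma"  # i.e. the track is listed in mlpma_tracks


class Action(Enum):
    KEEP_EXISTING = "keep_existing"
    REPLACE_MASTER = "replace_master"  # relies on the feature tracked in #9


@dataclass(frozen=True)
class Audio:
    lossless: bool
    bitrate_kbps: int  # assumption: bitrate as a rough quality proxy


def is_higher_quality(candidate: Audio, existing: Audio) -> bool:
    """Lossless beats lossy; between two lossy files, higher bitrate wins."""
    if candidate.lossless != existing.lossless:
        return candidate.lossless
    if candidate.lossless:
        return False  # assumption: two lossless copies count as equal
    return candidate.bitrate_kbps > existing.bitrate_kbps


def decide(existing_source: Source, existing: Audio, ponify: Audio) -> Action:
    """Pick what to do with a Ponify file that duplicates an existing track."""
    if existing_source is Source.MLPMA:
        if is_higher_quality(ponify, existing):
            return Action.REPLACE_MASTER
        return Action.KEEP_EXISTING
    # Artist upload: only a lossless Ponify copy displaces a lossy original.
    if ponify.lossless and not existing.lossless:
        return Action.REPLACE_MASTER
    return Action.KEEP_EXISTING


def fill_missing_tags(db_row: dict, file_tags: dict) -> dict:
    """Copy tags from the Ponify file only into fields the database lacks."""
    merged = dict(db_row)
    for key, value in file_tags.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged
```

`fill_missing_tags` would run for every Ponify file, including ignored duplicates, and never overwrites a value Pony.fm already has.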

Part 2: De-duplicating Pony.fm's library

This process has three goals:

  • find which tracks we have more than one copy of
  • for any tracks with duplicates, find which version has the highest-quality master file
  • combine the Pony.fm, MLPMA, Ponify, and PonyvilleFM archives
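
For the first goal, one possible approach is the audio-stream hash mentioned in Part 1. The sketch below assumes some decoder (ffmpeg, for example) has already produced the audio stream as chunks of bytes with tags and container metadata excluded; the function names are hypothetical. A hash of this kind only matches copies whose audio data is byte-for-byte identical, so different encodes of the same track would need another technique.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def hash_audio_stream(chunks: Iterable[bytes]) -> str:
    """Hash the audio data only, so differing tags do not change the result."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(stream_hashes: Dict[str, str]) -> List[List[str]]:
    """Group file names by stream hash and return groups with 2+ members."""
    by_hash = defaultdict(list)
    for name, stream_hash in stream_hashes.items():
        by_hash[stream_hash].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]
```

Because the digest is updated incrementally, the result does not depend on how the stream is split into chunks, and large files never need to be held in memory.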

Modifying/correcting metadata is outside the scope of this component. That will be handled in parallel in #91. De-duplication of Pony.fm's library will happen as follows:

  • prepare a dump of Pony.fm's master audio files, named using their track IDs
  • combine the Pony.fm dump with PonyvilleFM's archive
  • find duplicate tracks in the combined dump and mark which one is the best-quality version
  • re-upload any track master files for which a higher-quality version was found in the combined dump
  • add any new tracks from the combined dump to Pony.fm
  • process the list of duplicate tracks to "merge" duplicate track records on Pony.fm (set up 301 redirects to the oldest instance of the track)
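
The last four steps could be planned per duplicate group roughly as follows. This is a sketch, not the import script itself: `Copy`, `MergePlan`, and `plan_merge` are hypothetical names, "oldest" is assumed to mean the earliest creation date (ties broken by lowest track ID), and quality is again approximated as lossless first, then bitrate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Copy:
    path: str
    lossless: bool
    bitrate_kbps: int
    track_id: Optional[int] = None   # None for files from outside Pony.fm
    created: Optional[date] = None   # set whenever track_id is set


@dataclass(frozen=True)
class MergePlan:
    canonical_id: int          # oldest instance; its URL stays as-is
    best_master: str           # file that should become the master audio
    redirects: Dict[int, int]  # duplicate track ID -> canonical ID (HTTP 301)


def plan_merge(copies: List[Copy]) -> Optional[MergePlan]:
    """Plan the merge for one group of copies of the same track."""
    existing = [c for c in copies if c.track_id is not None]
    if not existing:
        return None  # nothing to merge: this is a new track to add
    canonical = min(existing, key=lambda c: (c.created, c.track_id))
    # On a quality tie, prefer the canonical track's current master.
    best = max(
        copies,
        key=lambda c: (c.lossless, c.bitrate_kbps,
                       c.track_id == canonical.track_id),
    )
    redirects = {
        c.track_id: canonical.track_id
        for c in existing
        if c.track_id != canonical.track_id
    }
    return MergePlan(canonical.track_id, best.path, redirects)
```

Keeping the plan separate from its execution means the list of re-uploads and redirects can be reviewed before anything on Pony.fm is changed.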
@EventideGlow EventideGlow added this to the Ponify Import milestone Aug 2, 2016
@EventideGlow EventideGlow assigned ghost Aug 2, 2016
ghost pushed a commit that referenced this issue Feb 2, 2017
ghost pushed a commit that referenced this issue Feb 2, 2017
@ghost ghost added the in progress label Feb 8, 2017
ghost pushed a commit that referenced this issue Feb 21, 2017
ghost pushed a commit that referenced this issue Feb 28, 2017
ghost pushed a commit that referenced this issue Mar 27, 2017
@ghost ghost mentioned this issue May 15, 2017
@ghost ghost closed this as completed in #118 May 15, 2017
This issue was closed.