Remove PDFs #74
It might be useful to pull the PDFs out into a directory outside of the MEI version directories, because the PDFs are not specific to any given MEI version and apply to all MEI versions. Also, when studying an encoding in MEI 4, the user will still need to refer to the PDF, and they will be confused if it is only in the MEI 3 folder. A complication would arise if the MEI encodings refer to different PDFs when changing MEI versions. A solution would be to add a note to the header of the encoding that specifies the URL of the PDF that is being referenced for graphical music. |
Also note that when you delete the PDFs from MEI3, they are still in the repository's history (so there is no reduction in the size of the repository, other than preventing additional copies of the PDFs in later version directories of the complete examples). |
Git recognizes if the files are identical and will only store them once in the repo, if I'm not totally mistaken. They will be copied to the working tree multiple times, though. I'd say there is no harm in leaving the duplicates in there. On the contrary, it helps greatly when working with the samples. |
I'm +1 for having the files that served as the model for the encoding nearby and available. Moving them to a separate directory as proposed by @craigsapp, and consequently deduplicating them, would probably be a sound solution. Ideally, this should be accompanied by referencing them from the encoding. Maybe just a simple |
As emerged from the discussion, moving the legacy folder PDFs to a separate folder and referencing them from the files would be really good.
In most of the files a URL to the original source is present. IMSLP links need to be updated, but we could then delete those PDFs here completely. Anyway, they would stay in the git history. |
A personal plea: Please do not delete. When working with the samples, it is extremely helpful to be able to keep the files close to each other for quick checks. I can now quickly open the MEI and PDF files side by side from the file manager. What takes a split second now may then take minutes, because the link has to be identified in the MEI and the file would have to be downloaded. And links may break in the future. The repo size will not shrink anyway when deleting the PDFs, as they stay in the history. |
Why are they in a folder called "legacy"? Can't they just be in a folder called "PDF"? |
It seems virtually impossible to make people happy with this. |
I think it was fine — they weren’t being deleted, and they were being consolidated. I was only asking because legacy implies that they are no longer used, but as many people pointed out they liked having the original next to the encoding. So a folder called “pdf” would be clear and also mark that they were still useful and correspond to the encodings. |
Community starts with a C like compromise ;) Sorry for being late, but I am still trying to understand the initial intention of the removal. Yes, some of the PDFs are duplicated, but as @th-we pointed out, there can be good reasons to keep the PDF documents directly with the encodings. On the other hand, moving all the PDFs to a separate folder (and so deduplicating them) does not improve the repo size. Quite the opposite: although Git, as mentioned above, stores identical files only once, moving them changes the references in the affected trees, so Git writes new tree objects to the .git/objects folder (plus another object for the commit itself). This is quite nicely explained here: https://how-to.dev/how-git-stores-data As you can see in the screenshots from a clean test repo below, the duplicated identical test.txt file is referenced by the same hash. That said, I think the usability of the repo structure is more important than however Git organizes things internally, so I'm not at all against moving or changing it in principle. I'm just trying to see the added value. |
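For anyone who wants to check the "same hash" point without a test repo: a minimal sketch of how Git names a file's content, assuming the default SHA-1 object format. The function name `git_blob_id` is made up for this illustration. The ID depends only on the bytes of the file, not on its name or directory, which is why duplicated PDFs share a single blob.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content.

    Assumes the default SHA-1 object format: Git hashes the header
    "blob <size>\\0" followed by the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two copies of the same content, as if stored in two version folders.
copy_in_first_folder = b"hello\n"
copy_in_second_folder = b"hello\n"

# Identical bytes give the identical ID, so Git stores one blob for both.
print(git_blob_id(copy_in_first_folder) == git_blob_id(copy_in_second_folder))
```

The result for a real file can be compared with the output of `git hash-object <file>`.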
In case you are not aware: you should use |
Also, I fail to see the point of moving PDFs of the source editions into any thing such as a What is the purpose of the sample encodings of music? The example encodings are based on the graphical music in these files, so disassociating them, in particular by placing them into any such legacy folder is highly counterproductive. It is also somewhat counterproductive to be moving older encodings into a legacy folder since their URLs then change. I would propose this structure: full-music-encoding-examples folder:
The pdf directory contains the PDFs for the sample encodings. The mei directory contains the MEI files for the most recent version of MEI. The legacy directory contains the MEI files for previous versions of MEI, such as: legacy folder:
If people want to see the source edition for a legacy encoding, then they would look in the pdf directory of the parent directory. When a new version of MEI comes out and there is a translation script to update MEI files from the previous version, then the files in mei would be moved to It is not sufficient to say "look at the rendering of the file in verovio" because many of the files will not render in verovio (unless that has been fixed in the files). Also the original encodings had lots of errors (in particular related to tuplets), and the original encodings were not checked by output to any resulting notation in any renderer (such as verovio which was not available when the encodings were created). |
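A rough sketch of the lookup described above, under two assumptions that are not stated in this thread: that each PDF shares its base name with the MEI file, and that the root folder holds the `pdf` directory directly. The function name `source_pdf_for` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def source_pdf_for(encoding: PurePosixPath, root: PurePosixPath) -> PurePosixPath:
    """Map an MEI encoding to the PDF of its source edition.

    Assumption (hypothetical): the PDF has the same base name as the MEI
    file and lives in the single "pdf" directory under the root folder,
    regardless of which MEI version folder the encoding is in.
    """
    return root / "pdf" / (encoding.stem + ".pdf")


# Hypothetical paths, for illustration only.
root = PurePosixPath("examples")
current = root / "mei" / "Some_Piece.mei"
older = root / "legacy" / "some-older-version" / "Some_Piece.mei"

# Both the current and the legacy encoding resolve to the same PDF.
print(source_pdf_for(current, root))
print(source_pdf_for(older, root))
```

With this layout the legacy encodings need no PDF copies of their own, and the path of each PDF stays the same when a new MEI version comes out.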
OK. I checked with a test repo and it is behaving as you describe (only checking mv). Presumably git is storing content based on checksums. And with my test, it is also recognizing the system
One problem with
And you have to run
If you use
In this case git manages the move rather than the operating system, so it is aware of what is happening rather than deducing it later. |
I'm sorry for starting to derail this thread with the |
It is useful info to know, though :-) |
@craigsapp proposal looks good to me |
This PR removes the duplicated PDFs from the MEI3 and MEI4 collections, leaving those in the legacy folder.
closes #69