6x Training
Important remark: At Audiveris, ZHAW provided us directly with the trained models for the page and patch classifiers; we never had the opportunity to train them on our own.
Therefore, this chapter is essentially a collection of references to ZHAW public information. We do, however, indicate some possible improvements regarding the training data sets.
DeepScores is the project name for this ZHAW data set.
It is a collection of 300 000 pages of digitally rendered music scores.
This is real music but synthetic images, using 5 different musical fonts. MusicXML scores from the MuseScore library were all fed into Lilypond to produce 3 kinds of artifacts:
- images_png: Page image in .png format
- pix_annotations_png: Pixel labelling using symbol index as gray value
- xml_annotations: Collection of symbol tuples (symbol name, bounding box within image)
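To make the relation between the pixel labelling and the bounding-box annotations more concrete, here is a minimal sketch that derives one bounding box per symbol index from a small label mask. It is only an illustration, not the ZHAW tooling: the function name `boxes_per_symbol_index` is hypothetical, the mask is a hand-made grid rather than a decoded .png file, and treating gray value 0 as background is an assumption.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom), all inclusive, in pixel coordinates
Box = Tuple[int, int, int, int]

BACKGROUND = 0  # assumption: gray value 0 means "no symbol"


def boxes_per_symbol_index(mask: List[List[int]]) -> Dict[int, Box]:
    """Return one bounding box per symbol index found in a label mask.

    `mask` is a row-major grid of gray values, each being a symbol index.
    All pixels sharing an index are merged into a single box, so separate
    instances of the same symbol class are not distinguished here.
    """
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, index in enumerate(row):
            if index == BACKGROUND:
                continue
            if index not in boxes:
                # First pixel seen for this index: start a 1x1 box
                boxes[index] = (x, y, x, y)
            else:
                # Grow the existing box so that it also covers (x, y)
                left, top, right, bottom = boxes[index]
                boxes[index] = (
                    min(left, x),
                    min(top, y),
                    max(right, x),
                    max(bottom, y),
                )
    return boxes


# Tiny hand-made mask: index 3 covers a 2x2 block, index 7 a single pixel
mask = [
    [0, 0, 0, 0, 0],
    [0, 3, 3, 0, 0],
    [0, 3, 3, 0, 7],
    [0, 0, 0, 0, 0],
]
boxes = boxes_per_symbol_index(mask)
```

On a real page, the grid would come from decoding a pix_annotations_png file with an image library, and a connected-component pass would be needed to separate several instances of the same symbol.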
Main pointers:
- Technical report is: "DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects"
Latest version at https://arxiv.org/pdf/1804.00525.pdf
- GitHub (initial?) umbrella page: https://tuggeluk.github.io/deepscores/
- Full data set (beware of the 69 GB to download) is in DeepScoresArchives
at https://drive.google.com/drive/folders/1KFxqi0rO-bJrd03rLk87fF1iOmnjpaoG
- GitHub repository for related code: https://github.com/tuggeluk/DeepScoresExamples
- Evolution axes were recently presented in article:
"DeepScores and Deep Watershed Detection current state and open issues"
https://www.groundai.com/project/deepscores-and-deep-watershed-detection-current-state-and-open-issues/
DeepScores is a very large data set, but its images are somewhat far from the day-to-day reality of scores to be OMR'ed:
- All images are synthetic outputs and thus of perfect quality
- They were all rendered by the same music renderer (Lilypond) resulting in similar layouts.
Although the large DeepScores data set is suitable for a massive "pre-training", it might benefit from a final training on smaller but different data sources.
MUSCIMA++ is a data set of handwritten music notation for optical music recognition.
A number of pointers are available at https://muscima.readthedocs.io/en/latest/
Although not part of the DeepScores project, this set was used during the training of the ZHAW Detection service.
The idea is again to start from MuseScore's large library of real music, but this time using MuseScore's own music renderer. This is likely to result in layouts somewhat different from the Lilypond-rendered DeepScores images.
This approach was sketched between MuseScore and Audiveris during the Salzburg 2017 Music Hackday. It is documented on https://github.com/Audiveris/omr-dataset-tools
Thanks to Animesh Tewari, who spent his GSoC (Google Summer of Code) 2018 on this task, we now have a first set of 4000 scores. I need to review this material thoroughly before MuseScore launches a larger production.
The idea is to start from the IMSLP library at https://imslp.org/ which gathers a huge collection of printed scores, most of them resulting from scans of engraved music and thus not biased by any music renderer.
The major downside of this collection is of course the lack of related ground truth.
This was actually the first motivation behind the development of Audiveris 5.1: with a decent end-user interface that allows easy validation and correction of OMR output, Audiveris can now be used to gradually populate a real-world training data set.
See the "Annotate Book Symbols" section in Audiveris 5.1 Handbook.
Please refer to the publication "Deep Watershed Detector for Music Object Recognition" available at https://arxiv.org/abs/1805.10548
The related training code is available at https://github.com/tuggeluk/DeepWatershedDetection
My understanding is that both classifiers (page and patch) were trained on DeepScores data, and that both used ResNet101 as an implementation basis.
Trained model is at https://drive.google.com/open?id=1P_jWBP9Z0bad1wuzqiVAj3sUR67mSwa0
Trained model is at https://drive.google.com/open?id=1iXr3KGCVgzCGP9CUo1tefis3GFBwCxSQ
Software licensed under the GNU Affero General Public License (AGPL) Version 3
© 2000-2023 Audiveris. Logo designed by Katka.