6x Training

Important remark: At Audiveris, we were provided directly with the trained models from ZHAW for the page and patch classifiers; we never had the opportunity to train them ourselves.

Therefore, this chapter is essentially a collection of references to ZHAW public information. We do, however, point out some possible improvements regarding the training data sets.

ZHAW Data Set

DeepScores is the project name for this ZHAW data set.

It is a collection of 300 000 pages of digitally rendered music scores.

The music is real but the images are synthetic, rendered with 5 different music fonts. MusicXML scores from the MuseScore library were fed into Lilypond to produce 3 kinds of artifacts:

  • images_png: Page image in .png format
  • pix_annotations_png: Pixel labelling using symbol index as gray value
  • xml_annotations: Collection of symbol tuples (symbol name, bounding box within the image)
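
Based on the three artifacts listed above, loading one DeepScores page could look like the minimal Python sketch below. The directory layout, file extensions and XML tag/attribute names are assumptions made here for illustration; the authoritative format is described in the DeepScores documentation.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

import numpy as np
from PIL import Image  # pip install pillow


def load_deepscores_page(root: Path, page_id: str):
    """Load one page image plus its two annotation artifacts.

    Directory names, file extensions and XML tag/attribute names are
    assumptions; check the DeepScores documentation for the real format.
    """
    # 1. Rendered page image.
    image = Image.open(root / "images_png" / f"{page_id}.png")

    # 2. Pixel labelling: an image of the same size where each pixel's
    #    gray value is the index of the symbol it belongs to.
    pixel_labels = np.array(Image.open(root / "pix_annotations_png" / f"{page_id}.png"))

    # 3. Symbol tuples: (symbol name, bounding box within the image).
    symbols = []
    tree = ET.parse(root / "xml_annotations" / f"{page_id}.xml")
    for node in tree.getroot().iter("symbol"):  # assumed element name
        bbox = tuple(float(node.get(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        symbols.append((node.get("name"), bbox))

    return image, pixel_labels, symbols
```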

Main pointers:

Additional Data Sets

DeepScores is a very large data set, but its images are somewhat far from the day-to-day reality of scores to be OMR'ed:

  • All images are synthetic outputs and thus of perfect quality
  • They were all rendered by the same music renderer (Lilypond), resulting in similar layouts.

Although the large DeepScores data set is suitable for a massive "pre-training", the resulting models might benefit from a final training pass on smaller but different data sources.
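
One possible way to implement this "massive pre-training, then final training on smaller sources" schedule is sketched below with hypothetical tf.keras objects; none of these names come from the ZHAW code base, this is only an illustration of the two-stage idea.

```python
import tensorflow as tf


def pretrain_then_finetune(model, deepscores_ds, small_real_world_ds):
    """Two-stage training sketch.

    'model' is a tf.keras.Model; 'deepscores_ds' and 'small_real_world_ds'
    are hypothetical tf.data.Dataset objects yielding (image, label) batches.
    """
    # Stage 1: massive pre-training on the large synthetic DeepScores set.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(deepscores_ds, epochs=10)

    # Stage 2: final training on a smaller but more varied data source
    # (MUSCIMA++, MuseScore renderings, annotated IMSLP scans, ...),
    # with a lower learning rate so the pre-trained weights are only
    # gently adjusted.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(small_real_world_ds, epochs=5)
    return model
```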

MUSCIMA++

MUSCIMA++ is a data set of handwritten music notation for optical music recognition.
A collection of pointers is available at https://muscima.readthedocs.io/en/latest/

Although not part of the DeepScores project, this set was used during the training of the ZHAW Detection service.

MuseScore

The idea is again to start from the large MuseScore library of real music, but this time using MuseScore's own music renderer. This is likely to result in layouts somewhat different from the Lilypond-rendered DeepScores images.

This approach was sketched out between MuseScore and Audiveris during the Salzburg 2017 Music Hackday. It is documented at https://github.com/Audiveris/omr-dataset-tools

Thanks to Animesh Tewari, who spent his GSoC (Google Summer of Code) 2018 on this task, we now have a first set of 4000 scores. I need to review this material thoroughly before MuseScore launches a larger production.
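
For the rendering step itself, MuseScore can be driven in batch mode. The Python sketch below assumes an mscore command-line binary is available on the PATH and that the export format is deduced from the output file extension; binary name and options may differ between MuseScore versions.

```python
import subprocess
from pathlib import Path


def render_with_musescore(musicxml_file: Path, out_dir: Path) -> Path:
    """Render one MusicXML score to PNG with MuseScore's own renderer."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_png = out_dir / (musicxml_file.stem + ".png")
    # MuseScore exports to the format implied by the -o file extension.
    # Multi-page scores typically produce one PNG per page.
    subprocess.run(["mscore", str(musicxml_file), "-o", str(out_png)], check=True)
    return out_png


# Example: render every MusicXML file found in a local library dump.
# for f in Path("musescore_library").glob("*.musicxml"):
#     render_with_musescore(f, Path("rendered_png"))
```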

IMSLP / Audiveris

The idea is to start from the IMSLP library at https://imslp.org/, which gathers a huge collection of printed scores, most of them resulting from the scanning of engraved music and thus not biased by any music renderer.

The major downside of this collection is of course the lack of related ground truth.

This was actually the first motivation for developing Audiveris 5.1: with a decent end-user interface allowing easy validation / correction of the OMR output, Audiveris can now be used to gradually populate a real-world training data set.

See the "Annotate Book Symbols" section in Audiveris 5.1 Handbook.

Training the models

Please refer to the publication "Deep Watershed Detector for Music Object Recognition", available at https://arxiv.org/abs/1805.10548

The related training code is available at https://github.com/tuggeluk/DeepWatershedDetection

My understanding is that both the page and patch classifiers were trained on DeepScores data, and that both use ResNet101 as an implementation basis.
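
As an illustration of what "ResNet101 as an implementation basis" can look like, here is a minimal tf.keras sketch of a ResNet101 backbone topped by a small classification head. This is not the ZHAW code (which lives in the DeepWatershedDetection repository above); the class count and head layers are placeholders.

```python
import tensorflow as tf

NUM_SYMBOL_CLASSES = 100  # placeholder; use the actual class count of the data set


def build_patch_classifier(patch_size: int = 224) -> tf.keras.Model:
    """Illustrative classifier built on a ResNet101 backbone."""
    backbone = tf.keras.applications.ResNet101(
        include_top=False,                        # drop the ImageNet head
        weights=None,                             # or "imagenet" to start pre-trained
        input_shape=(patch_size, patch_size, 3),
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(NUM_SYMBOL_CLASSES, activation="softmax")(x)
    return tf.keras.Model(backbone.input, outputs)
```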

Page Model

Trained model is at https://drive.google.com/open?id=1P_jWBP9Z0bad1wuzqiVAj3sUR67mSwa0

Patch Model

Trained model is at https://drive.google.com/open?id=1iXr3KGCVgzCGP9CUo1tefis3GFBwCxSQ