Inspired by the famous example of the MNIST public database (60,000 labelled images of hand-written digits), we acknowledge the need for a well-known and representative data set to support the development of applications in the specific domain of Optical Music Recognition (OMR). Such a data set would provide:
- OMR samples for the training and testing of symbol classifiers
- Ground-truth material for the evaluation or comparison of OMR engines
Ultimately, once data structuring and content are sufficiently validated, we think this reference should preferably be hosted by the International Music Score Library Project (IMSLP).
Meanwhile, the purpose of this omr-dataset GitHub repository is to gather the material used to build preliminary versions of the target reference.
This project is built with the Gradle tool and can be driven from an IDE or from the command line.
[Note: noise-addition tools are not yet included in this Gradle build.]
From the command line, perform a full rebuild with:

```
gradle clean build
```
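If the repository includes the Gradle wrapper (an assumption to verify: look for a gradlew script at the repository root), the same build can be run without a local Gradle installation:

```
./gradlew clean build      # Linux/macOS
gradlew.bat clean build    # Windows
```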
To display the usage rules only, use:

```
gradle run
```
This will display:

```
Syntax:
   [OPTIONS] -- [INPUT_FILES]
@file:
   Content to be extended in line

Options:
 -clean              : Cleans up output
 -controls           : Generates control images
 -features           : Generates .csv and .dat files
 -help               : Displays general help then stops
 -mistakes           : Saves mistake images
 -model <.zip file>  : Defines path to model
 -names              : Prints all possible symbol names
 -nones              : Generates none symbols
 -output <folder>    : Defines output directory
 -subimages          : Generates subimages
 -training           : Trains classifier on features

Input file extensions:
 .xml: annotations file
```
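The @file entry above indicates that arguments can also be read from a text file (following the usual args4j convention; how this composes with the -PcmdLineArgs property is an assumption to verify). For instance, with a hypothetical file args.txt listing one argument per line, a run could be shortened to:

```
gradle run -PcmdLineArgs="@args.txt"
```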
To clean up output, use:

```
gradle run -PcmdLineArgs="-output,data/output,-clean"
```
To generate features, with all options enabled, using input from data/input-images, use:

```
gradle run -PcmdLineArgs="-output,data/output,-features,-nones,-controls,-subimages,--,data/input-images"
```
To launch training on the generated features, saving mistaken images and targeting a specific model file, use:

```
gradle run -PcmdLineArgs="-output,data/output,-training,-mistakes,-model,data/patch-classifier.zip"
```
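Putting it all together, the whole toy-example workflow is simply the three commands above run in sequence:

```
gradle run -PcmdLineArgs="-output,data/output,-clean"
gradle run -PcmdLineArgs="-output,data/output,-features,-nones,-controls,-subimages,--,data/input-images"
gradle run -PcmdLineArgs="-output,data/output,-training,-mistakes,-model,data/patch-classifier.zip"
```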
Remark: the training task takes about 15 minutes when run on the toy example in the data/input-images folder.
To monitor the neural network during training, open a browser at http://localhost:9000.
See the related wiki for more details.