"Hauptstimme" and the "OpenScore Orchestra"

This repository contains two interconnected datasets along with code for processing that data.

OpenScore Orchestra

The OpenScore Orchestra Corpus consists of c.100 transcribed orchestral movements:

Composer	Large scale work(s)	Movements
Bach, Johann Sebastian	B Minor Mass, BWV.232	27
Bach, Johann Sebastian	Brandenburg Concerto No.3, BWV.1048	3
Bach, Johann Sebastian	Brandenburg Concerto No.4, BWV.1049	3
Beach, Amy	Symphony in E minor (Gaelic), Op.32	4
Beethoven, Ludwig van	Complete Symphonies (1–9)	37
Boulanger, Lili	D'un matin de printemps	1
Brahms, Johannes	Ein Deutsches Requiem, Op.45	1 (from 7)
Brahms, Johannes	Complete Symphonies (1–4)	16
Bruckner, Anton	Symphony No.5, WAB.105	4 (in 5 files)

Movement numbering is complex in the following cases:

Bach: B Minor Mass. The movements are numbered according to NBAII (1–23) and are split into separate files where possible (e.g., 7a and 7b). Movements that dovetail are not split (e.g., 4a and 4b form a single file, divided by a double bar line and an editorial tempo marking).
Bruckner: Symphony No.5, WAB.105. We split the 3rd movement into two files.

We explain the stylistic design criteria for the scores here.

Hauptstimme

When listening to music, our attention is drawn back and forth between different elements. Often this is guided by following the main, most prominent melodic line: the Hauptstimme.

The Hauptstimme annotations record where human analysts consider the "main theme" to be in each of the above works. Please see this explanation for more details on the annotation method and FAQs.

Here is an example of what the annotated scores look like. This is a famous melody (the start of the main theme in Beethoven's 5th) that's distributed among several parts.

Data Summary

For each orchestral work, the datasets include the following files, each named <identifier> followed by one of these suffixes:

.mscz: The annotated MuseScore file.
.mxl: A conversion of the .mscz file.
.mm.json: The compressed 'measure map' – a lightweight representation of the bar information to enable alignment with other corpora.
.csv: A 'lightweight' .csv file extracted from the full score (with repeats expanded), indicating the highest pitch being played by each instrument part at every timestamp in which a change occurs in the score.
_alignment.csv: An alignment table containing timestamps for each score note onset in a set of public domain / open licence audio recordings obtained from the International Music Score Library Project (IMSLP). (These files are only included for scores with available audio recordings.)
_annotations.csv: Information about each annotation including the qstamp, theme label, and instrument.
_melody.mxl: The annotated melody segments stitched together to form a single-stave 'melody score'.
_part_relations.csv: A derived analysis of the interplay between the score parts in each Hauptstimme annotation block.
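
As a minimal sketch of how the suffix list above could be used, the following checks which of the expected files are present for one score. The function names and the example identifier are hypothetical and are not part of the repository's code; only the suffixes come from the list above.

```python
from pathlib import Path

# Suffixes taken from the Data Summary list.
SUFFIXES = [
    ".mscz",
    ".mxl",
    ".mm.json",
    ".csv",
    "_alignment.csv",
    "_annotations.csv",
    "_melody.mxl",
    "_part_relations.csv",
]


def expected_files(score_dir, identifier):
    """Map each suffix to the path it would have inside score_dir."""
    score_dir = Path(score_dir)
    return {suffix: score_dir / f"{identifier}{suffix}" for suffix in SUFFIXES}


def missing_files(score_dir, identifier):
    """Return the suffixes whose files are absent from score_dir."""
    return [
        suffix
        for suffix, path in expected_files(score_dir, identifier).items()
        if not path.exists()
    ]
```

Note that a missing _alignment.csv is expected for scores without available audio recordings, so its absence does not indicate a problem.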

The filename structure is as follows:

data/<composer>/<set>/<score>/<files>

<composer> is the composer's name in the form <Last,_First_Second>.
<set> is an identifier for the work.
<score> is the movement number for multi-movement works. For single-movement works, the <score> level is omitted.
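
The layout above can be unpacked with a few lines of Python. This is only a sketch under the stated layout: the function name is hypothetical, and the example paths in the usage note use made-up set and file names rather than real corpus entries.

```python
from pathlib import PurePosixPath


def parse_score_path(path):
    """Split a corpus-relative path into composer, set, score, and file.

    Assumes data/<composer>/<set>/<score>/<file>, with the <score>
    level omitted for single-movement works.
    """
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != "data" or len(parts) not in (4, 5):
        raise ValueError(f"Unexpected path layout: {path}")
    composer, work_set = parts[1], parts[2]
    # Five components means a movement-level directory is present.
    score = parts[3] if len(parts) == 5 else None
    return {
        "composer": composer.replace("_", " "),
        "set": work_set,
        "score": score,
        "file": parts[-1],
    }
```

For example, parse_score_path("data/Beach,_Amy/some_set/1/example.mscz") gives the composer as "Beach, Amy" and the score as "1", while a path with no movement directory gives a score of None.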

Code Summary

We provide the following scripts, located in scripts/:

main.py: Take a score's MuseScore file and produce the rest of the files specified above (except the alignment table).
build_corpus.py: Produce all corpus files from each score's MuseScore file.
get_part_relations.py: Take a score's MusicXML file and produce a part relationships summary.
compare_segmentations.py: Take a score's MusicXML file and compare the Hauptstimme annotation points to three sets of automatic segmentation points: novelty-based (tempogram features), novelty-based (chromagram features), and changepoint detection-based.
align_score_audios.py: Take a score's MuseScore/MusicXML file and a set of audio files, then align the audio files to the score, producing an alignment table.

See each script's docstring for how to run it from the command line.
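
To illustrate what comparing annotation points against automatic segmentation points can involve, here is a generic sketch of boundary matching within a tolerance window, reported as precision, recall, and F-measure. This is a common evaluation approach and is offered as an assumption: the metric actually used by compare_segmentations.py is not described here, and the function names and tolerance value are hypothetical.

```python
def count_hits(reference, estimated, tolerance=1.0):
    """Greedily match estimated points to reference points.

    A match requires the two points to lie within `tolerance` of each
    other, and each reference point is matched at most once.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    hits = 0
    i = 0
    for est in estimated:
        # Skip reference points that are too early to match this estimate.
        while i < len(reference) and reference[i] < est - tolerance:
            i += 1
        if i < len(reference) and abs(reference[i] - est) <= tolerance:
            hits += 1
            i += 1
    return hits


def boundary_scores(reference, estimated, tolerance=1.0):
    """Return (precision, recall, f_measure) for two sets of points."""
    if not reference or not estimated:
        return 0.0, 0.0, 0.0
    hits = count_hits(reference, estimated, tolerance)
    precision = hits / len(estimated)
    recall = hits / len(reference)
    if hits == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Here `reference` would hold the annotation points and `estimated` one set of automatic segmentation points, both expressed in the same unit (for instance quarter-note offsets or seconds).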

We also provide Jupyter notebooks, located in notebooks/:

demo.ipynb: A demonstration of how the functions in src can be used directly.

Development was done in Python 3.11.

Requirements

Please run pip install -r requirements.txt --no-deps to install the Python dependencies.

--no-deps is required due to a clash in the dependencies: synctoolbox and libfmp require music21<6.0.0,>=5.7.0, but pyMeasureMap requires a much newer version. However, having music21 9.1.0 caused no issues with the functionality used from synctoolbox and libfmp.
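
After installing, it may be useful to confirm which music21 version ended up in the environment. The sketch below uses only the standard library; the helper names are hypothetical, and the version bounds are the ones quoted above.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '9.1.0' into (9, 1, 0)."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # Stop at the first non-numeric component.
        numbers.append(int(match.group()))
    return tuple(numbers)


def in_legacy_range(version):
    """True if version satisfies music21<6.0.0,>=5.7.0."""
    return (5, 7, 0) <= version_tuple(version) < (6, 0, 0)


def installed_music21_version():
    """Return the installed music21 version string, or None if absent."""
    try:
        return metadata.version("music21")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_music21_version()
    if found is None:
        print("music21 is not installed")
    elif in_legacy_range(found):
        print(f"music21 {found} matches the synctoolbox/libfmp pin")
    else:
        print(f"music21 {found} is outside the synctoolbox/libfmp pin")
```

With the setup described above, the last message is the expected one, since a newer music21 is installed on purpose.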

Additional dependencies include:

MuseScore 4
If you want to align your own scores and audio files: the libsndfile C library (required by the soundfile Python package). After installing it you may need to run:

  pip uninstall soundfile
  pip install soundfile

Acknowledgements

Many thanks to:

Deutsche Telekom for funding part of this work in the context of the 'Beethoven X' project.
Fellow 'Beethoven X' project team members for discussions.
Annotators:
- On the 'Beethoven X' project, including Nicolai Böhlefeld and many others.
- At Cornell, Eastman, TU Dortmund, Durham, and elsewhere.
Transcribers, both:
- in our immediate team, and
- more widely across the MuseScore community: members who made transcriptions freely available under the CC0 licence and named their source edition.

Licence

Scores: CC0 1.0 Universal
Annotations: CC BY-SA
Code: MIT

All scores have been copied from clearly identified and unequivocally public domain source editions on IMSLP. Transcribers have committed to making these transcriptions from that source edition, working from scratch. We have confidence in our team and their work but cannot make any guarantees. If you see anything that we ought to review, please let us know.

Citation

To follow.
