Raise-your-Voice

Repository for Svenja Guhr's dissertation

See also the corpus publication of "theme-d-prose 1848-1920" on Zenodo, DOI: 10.5281/zenodo.12666499.

This repository contains:

Corpus and annotated corpus texts:

  • Plain text corpus as TXT files (UTF-8)
  • Plain text corpus as XML files
  • Manually annotated training data as XML files with sound word annotations
  • Manually annotated training data as XML files with sound event annotations
  • Manually annotated test set of 10 XML files with sound event annotations
  • XML file of M. v. Ebner-Eschenbach's Die Resel with manual sound word annotations
  • XML file of M. v. Ebner-Eschenbach's Die Resel with manual sound event annotations
  • Corpus texts as XML files with automated sound event annotations
  • Corpus texts as XML files with automated sound event annotations, semi-automatically enriched with loudness levels

Models:

  • Trained NEISS NTEE models for automated sound word and sound event annotation: The models can be regenerated from the indicated training data [https://github.com/SvenjaGuhr/ link]. They will be uploaded to Zenodo after the publication of my dissertation. For more information, see the wiki of NEISS NTEE (TEI Entity Enricher) [https://github.com/NEISSproject/tei_entity_enricher].
  • Model trained on 20 manually sound word-annotated corpus texts
  • Model trained on 55 manually sound event-annotated corpus texts

Jupyter Notebooks:

Preprocessing of the texts

  • Preprocessing of plain text files as preparation for the manual sound word annotation in CSV Link
  • Preprocessing of plain text files as preparation for the automated sound annotation in XML Link

Preprocessing of the corpus metadata

  • Preprocessing of the corpus for generating metadata (including token and word count) Link
  • Metadata generation: Corpus Keyword Search and Extraction Link
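As a rough illustration of the kind of metadata mentioned above, the following minimal sketch counts tokens and words in a plain text. It is not the code from the linked notebooks: the function name `count_tokens_and_words` and the regex-based tokenization (words plus single punctuation marks as tokens) are assumptions made for this example.

```python
import re


def count_tokens_and_words(text):
    """Return (token_count, word_count) for a plain-text string.

    Assumption for this sketch: a token is either a run of word
    characters or a single punctuation mark; a word is a token that
    starts with a letter or digit.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    words = [tok for tok in tokens if tok[0].isalnum()]
    return len(tokens), len(words)


if __name__ == "__main__":
    sample = "Der Hund bellte laut, sehr laut!"
    n_tokens, n_words = count_tokens_and_words(sample)
    print(f"tokens={n_tokens} words={n_words}")
```

A dedicated tokenizer may split German text differently, so counts produced this way are only comparable with each other, not with the published metadata table.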

Evaluation of inter-annotator agreement

Postprocessing of the annotated corpus texts

  • One script covering the entire process, from automated annotation revision and annotation extraction to automated loudness level labeling via the dictionary approach Link
  • Postprocessing the sound annotations for extracting the annotations from XML to CSV Link
  • Postprocessing the sound annotations through automated revision of the XML annotations Link
  • Loudness annotation with dictionary approach Link
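The postprocessing steps listed above (extracting annotations from XML to CSV, then labeling loudness with a dictionary) could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the element name `sound_event`, the toy dictionary `LOUDNESS_DICT` with its numeric levels, and the rule of taking the highest matching level are hypothetical and not taken from the notebooks in this repository.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical element name; the annotated XML files may use a different
# (possibly namespaced) element, in which case the tag must be adapted.
ANNOTATION_TAG = "sound_event"

# Hypothetical toy dictionary: lowercase word form -> loudness level.
LOUDNESS_DICT = {"flüsterte": 1, "sprach": 3, "schrie": 5}


def extract_annotations(xml_string, tag=ANNOTATION_TAG):
    """Return the text content of every annotation element, in document order."""
    root = ET.fromstring(xml_string)
    return ["".join(el.itertext()).strip() for el in root.iter(tag)]


def label_loudness(annotation, dictionary=LOUDNESS_DICT):
    """Look up each word of the annotation in the dictionary.

    Returns the highest level found (an arbitrary choice for this sketch)
    or None if no word is listed.
    """
    words = re.findall(r"\w+", annotation.lower())
    levels = [dictionary[w] for w in words if w in dictionary]
    return max(levels) if levels else None


def annotations_to_csv(xml_string, out):
    """Write one CSV row per annotation: its text and its loudness level."""
    writer = csv.writer(out)
    writer.writerow(["annotation", "loudness"])
    for annotation in extract_annotations(xml_string):
        level = label_loudness(annotation)
        writer.writerow([annotation, "" if level is None else level])


if __name__ == "__main__":
    sample = (
        "<text><p>Sie <sound_event>flüsterte leise</sound_event>, dann "
        "<sound_event>schrie er</sound_event>.</p></text>"
    )
    buffer = io.StringIO()
    annotations_to_csv(sample, buffer)
    print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch independent of any directory layout; for real files, `open(path, "w", newline="", encoding="utf-8")` can be passed as `out`.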

Building the Word2Vec Model

  • Word2Vec_model.bin
  • Jupyter Notebook for generating and using the Word2Vec model Link

Dataframes:

  • Corpus metadata table as CSV

Annotation Guidelines:

  • Guidelines for sound word and sound event annotation Link

License:

  • Information on the GPL license
