Repository for Svenja Guhr's Dissertation
See also the corpus publication of "theme-d-prose 1848-1920" on Zenodo, DOI: 10.5281/zenodo.12666499.
This repository contains:
- Plain text corpus as TXT files (UTF-8)
- Plain text corpus as XML files
- Manually annotated training data as XML files with sound word annotations
- Manually annotated training data as XML files with sound event annotations
- Manually annotated test set of 10 XML files with sound event annotations
- Manually sound word-annotated XML file of M. v. Ebner-Eschenbach's Die Resel
- Manually sound event-annotated XML file of M. v. Ebner-Eschenbach's Die Resel
- XML files of corpus texts automatically annotated with sound events
- XML files automatically annotated with sound events and semi-automatically enriched with loudness levels
- Trained NEISS NTEE models for automated sound word and sound event annotation: the models can be regenerated using the indicated training data [https://github.com/SvenjaGuhr/ link]. The models will be uploaded to Zenodo after the publication of my dissertation. For more information, see the wiki of NEISS NTEE (TEI Entity Enricher) [https://github.com/NEISSproject/tei_entity_enricher].
- Model trained on 20 manually sound word-annotated corpus texts
- Model trained on 55 manually sound event-annotated corpus texts
- Preprocessing of plain text files in preparation for the manual sound word annotation in CSV Link
- Preprocessing of plain text files in preparation for the automated sound annotation in XML Link
- Preprocessing of the corpus for generating metadata (including token and word counts) Link
- Metadata generation: Corpus Keyword Search and Extraction Link
- Evaluation of inter-annotator agreement with Krippendorff's alpha (Krippendorff 2018) and gamma (Mathet et al. 2015) using the Python package pygamma-agreement Link
- A single script covering the entire pipeline, from automated annotation revision and annotation extraction to automated loudness level labeling via the dictionary approach Link
- Postprocessing of the sound annotations: extraction of the annotations from XML to CSV Link
- Postprocessing of the sound annotations: automated revision of the XML annotations Link
- Loudness annotation with the dictionary approach Link
- Word2Vec_model.bin
- Jupyter Notebook for generating and using the Word2Vec model Link
- Corpus metadata table as CSV
- Guidelines for sound word and sound event annotation Link
- Information on the GPL license
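The inter-annotator agreement evaluation listed above relies, among other measures, on Krippendorff's alpha. For readers unfamiliar with the measure, the following is a minimal sketch of alpha for nominal labels (for example, which sound category each annotator assigned to a text span), written from the standard coincidence-matrix definition. It is not the repository's evaluation script: the function name `krippendorff_alpha_nominal` and the input format (one list of annotator labels per annotated unit, with `None` for a missing label) are assumptions for illustration only.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from itertools import permutations
from typing import Optional


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Hypothetical helper: Krippendorff's alpha for nominal data.

    Each element of `units` holds the labels that the annotators assigned
    to one unit; `None` marks a missing label.
    """
    # Coincidence matrix: every ordered pair of labels within a unit,
    # weighted by 1 / (m_u - 1), where m_u is the number of labels in the unit.
    coincidences: Counter = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two labels cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    n_c: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        n_c[a] += weight
    n = sum(n_c.values())

    # Nominal metric: any disagreement counts as distance 1.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Two annotators, four units, agreement on half of them.
    data = [["sound", "sound"], ["sound", "none"], ["none", "none"], ["none", "sound"]]
    print(krippendorff_alpha_nominal(data))
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level. The sketch only covers the nominal metric; agreement on span boundaries, which the gamma measure from pygamma-agreement addresses, is outside its scope.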