Raise-your-Voice

Repository for Svenja Guhr's dissertation

See also the corpus publication of "theme-d-prose 1848-1920" on Zenodo, DOI: 10.5281/zenodo.12666499.

This repository contains:

Corpus and annotated corpus texts:

Plain text corpus as TXT files (UTF-8)
Plain text corpus as XML files
Manually annotated training data as XML files with sound word annotations
Manually annotated training data as XML files with sound event annotations
Manually annotated test set of 10 XML files with sound event annotations
Manually sound word-annotated XML file of M. v. Ebner-Eschenbach's Die Resel
Manually sound event-annotated XML file of M. v. Ebner-Eschenbach's Die Resel
XML files of corpus texts with automated sound event annotations
XML files automatically sound event-annotated and semi-automatically enriched with loudness levels

Models:

Trained NEISS NTEE models for automated sound word and sound event annotation. The models can be regenerated from the indicated training data [https://github.com/SvenjaGuhr/ link]. They will be uploaded to Zenodo after the publication of my dissertation. For more information, see the wiki of NEISS NTEE (TEI Entity Enricher) [https://github.com/NEISSproject/tei_entity_enricher].
Model trained on 20 manually sound word-annotated corpus texts
Model trained on 55 manually sound event-annotated corpus texts

Jupyter Notebooks:

Preprocessing of the texts

Preprocessing of plain text files as preparation for the manual sound word annotation in CSV Link
Preprocessing of plain text files as preparation for the automated sound annotation in XML Link

Preprocessing of the corpus metadata

Preprocessing of the corpus for generating metadata (including token and word count) Link
Metadata generation: Corpus Keyword Search and Extraction Link
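The two metadata steps above (token and word count, keyword search) can be illustrated with a minimal sketch. It does not reproduce the notebooks in this repository: the regular-expression tokenizer, the function names, and the sample sentence are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Assumption: a simple regex tokenizer; the notebooks may tokenize differently.
# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens_and_words(text):
    """Return (token_count, word_count); words are tokens without punctuation."""
    tokens = TOKEN_RE.findall(text)
    words = [t for t in tokens if re.match(r"\w", t)]
    return len(tokens), len(words)


def keyword_hits(text, keywords):
    """Count exact, case-insensitive matches of each keyword among the words."""
    wanted = {k.lower() for k in keywords}
    words = [t.lower() for t in TOKEN_RE.findall(text) if re.match(r"\w", t)]
    return Counter(w for w in words if w in wanted)


# Hypothetical sample sentence, not taken from the corpus.
sample = "Die Glocke läutete. Laut rief sie: Resel!"
token_count, word_count = count_tokens_and_words(sample)
hits = keyword_hits(sample, ["Glocke", "laut"])
```

Exact matching means that inflected forms such as "läutete" are not counted as hits for "laut"; matching on lemmas would require a lemmatizer.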

Evaluation of inter-annotator agreement

Evaluation of inter-annotator agreement with Krippendorff's alpha (Krippendorff 2018) and gamma (Mathet et al. 2015) using the Python package pygamma-agreement Link
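To make the agreement measure concrete, here is a minimal sketch of Krippendorff's alpha for nominal labels, written in plain Python. It is not the evaluation code of this repository, which relies on pygamma-agreement, and it does not cover the gamma measure. The function name and the toy labels are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of units; each unit is the list of labels assigned by
    the annotators, with None marking a missing label.
    """
    # Coincidence matrix: each ordered pair of labels from different
    # annotators within a unit contributes 1 / (m - 1), m = labels in unit.
    coincidences = Counter()
    for labels in units:
        labels = [v for v in labels if v is not None]
        m = len(labels)
        if m < 2:
            continue  # units with fewer than two labels carry no information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable labels")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


# Hypothetical toy data: two annotators, four units.
toy_units = [("sound", "none"), ("sound", "sound"),
             ("none", "none"), ("sound", "sound")]
alpha = krippendorff_alpha_nominal(toy_units)
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level. For span annotations, where the annotators may also disagree on the boundaries of an annotation, a unitizing measure such as gamma is the more suitable choice.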

Postprocessing of the annotated corpus texts

One script covering the entire process, from automated annotation revision and annotation extraction to automated loudness level labeling with the dictionary approach Link
Postprocessing the sound annotations for extracting the annotations from XML to CSV Link
Postprocessing the sound annotations through automated revision of the XML annotations Link
Loudness annotation with dictionary approach Link
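The extraction and labeling steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the code of the notebooks: the element name sound_event, the sample XML, the dictionary entries, and the numeric loudness levels are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotated snippet; the real files may use other element names.
SAMPLE_XML = (
    "<text><p>Die Glocke <sound_event>läutete</sound_event>, und sie "
    "<sound_event>flüsterte</sound_event> leise.</p></text>"
)

# Hypothetical loudness dictionary (word form -> level); values are invented.
LOUDNESS = {"läutete": 4, "flüsterte": 1}


def extract_annotations(xml_string, tag="sound_event"):
    """Collect the text content of every element with the given tag."""
    root = ET.fromstring(xml_string)
    return ["".join(el.itertext()).strip() for el in root.iter(tag)]


def label_loudness(annotations, dictionary):
    """Assign the level of the first dictionary word found in each annotation."""
    rows = []
    for annotation in annotations:
        level = None  # stays None when no word is in the dictionary
        for token in annotation.lower().split():
            if token in dictionary:
                level = dictionary[token]
                break
        rows.append({"annotation": annotation, "loudness": level})
    return rows


def rows_to_csv(rows):
    """Serialize the labeled rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["annotation", "loudness"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = label_loudness(extract_annotations(SAMPLE_XML), LOUDNESS)
csv_text = rows_to_csv(rows)
```

Annotations without a dictionary match keep an empty loudness cell in the CSV, which leaves them visible for manual review.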

Building the Word2Vec Model

Word2Vec_model.bin
Jupyter Notebook for generating and using the Word2Vec model Link

Dataframes:

Corpus metadata table as CSV

Annotation Guidelines:

Guidelines for sound word and sound event annotation Link

License:

Information on the GPL license

Diss_extraction_of_annotations
Diss_text_preprocessing
IAA_manual_annotation
Loudness_Level_Dictionary
Sound_Analysis
Visualizations
Word2Vec_Model
20240421_output_keywords_in_corpus.csv
20240430_text_preprocessing_for_XML_preparation.ipynb
20240503_theme-d-Prose_Metadaten.csv
20240509_loudness_level_labeling_tidy_entire_process.ipynb
Corpus_Keyword_Search_and_Extraction.ipynb
Guidelines for Sound Annotation.md
JSON to list in txt.ipynb
LICENSE
README.md
diss_corpus_preprocessing_and_word_count.ipynb
diss_corpus_preprocessing_word_count_sentence_split_lemmatization_Word2Vec_model.ipynb
preprocess_multiple.py
tidy_Text-preprocessing_German_spacy.ipynb
