This repo contains the notebooks used for sourcing data for "A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts", which was submitted to SemDH 2024: First International Workshop of Semantic Digital Humanities.
.devcontainer/ Directory containing devcontainer Dockerfile and config file
data/ General directory for downloaded and generated data
|-- publish/ Directory of cleaned up lists (will be generated by 05_pub_prep.ipynb)
|-- tables/ Directory containing manually curated lists
| `-- names.csv List of manually curated names
|-- parsed/ Parsing data of TEI transcription files
`-- tmp/ Intermediate and temporary files
notebooks/ Directory of Jupyter notebooks
nt-manuscripts/ Directory of Python scripts used for downloading manuscript metadata from NTVMR
nt-transcripts/ Directory of Python scripts used for downloading transcription files from IGNTP and NTVMR
na28-crawler/ Directory containing a crawler to get all NA28 verses (no annotations)
ecm-crawler/ Directory containing a crawler to get all ECM verses (no annotations)
.python-version Python version indicator
README This README
requirements.txt Requirements for Python environment
run_notebooks.sh Script to run notebooks by selecting tasks
The recommended Python version for this repo is 3.12.1 (see .python-version). Docker images with Python preinstalled can be found on Docker Hub. Alternatively, you can set up and run a virtual Python environment. We also provide a devcontainer in this repository.
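To check whether the interpreter you are about to use matches the recommended version, something like the following sketch may help. It assumes that .python-version holds a bare version string such as 3.12.1 (the usual pyenv convention); the function names are illustrative and not part of this repo.

```python
import sys
from pathlib import Path


def read_pinned_version(path=".python-version"):
    """Return the version string stored in the file, stripped of whitespace.

    Assumption: the file contains only a version such as '3.12.1'.
    """
    return Path(path).read_text(encoding="utf-8").strip()


def running_version():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def matches_pinned(pinned, running=None):
    """Compare the pinned version with the running one (exact string match)."""
    if running is None:
        running = running_version()
    return pinned == running


# Usage, from the project's root directory:
#   matches_pinned(read_pinned_version())
```

An exact match is stricter than usually necessary; a different patch release of Python 3.12 will likely work as well, but that has not been verified here.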
In your Python environment, run pip install -r requirements.txt from the project's root directory to install Jupyter. This will enable you to run the notebooks. When using the devcontainer, this step is not needed.
For ease of use, run run_notebooks.sh from the project's root directory. On the initial run you must select all steps (one to eight). A full run always takes multiple hours.
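If you prefer to run individual notebooks without the script, the sketch below shows one possible way to list them in order and build a command for each. It is not necessarily what run_notebooks.sh does. It assumes that the notebooks in notebooks/ carry a numeric prefix (as 05_pub_prep.ipynb does) and that jupyter nbconvert is available in your environment; the helper names are hypothetical.

```python
import re
from pathlib import Path


def ordered_notebooks(directory="notebooks"):
    """Return *.ipynb paths sorted by their leading number (e.g. 05_...)."""
    numbered = []
    for path in Path(directory).glob("*.ipynb"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # notebooks without a numeric prefix are skipped
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]


def nbconvert_command(notebook):
    """Build a command that executes one notebook and saves it in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(notebook)]


# Usage, from the project's root directory:
#   import subprocess
#   for nb in ordered_notebooks():
#       subprocess.run(nbconvert_command(nb), check=True)
```

Sorting by the integer value of the prefix rather than by file name keeps the order correct even if prefixes are not zero-padded.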
We used a SPARQL query to retrieve an initial list of biblical names in the New Testament.
Endpoint: https://database.factgrid.de/query
SELECT ?Person ?PersonLabel ?noted ?notedLabel ?GenderLabel ?link ?book
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?Person wdt:P2 wd:Q8811.
?Person wdt:P143 ?noted.
?noted wdt:P8 ?book.
FILTER (?book IN (wd:Q74942, wd:Q74943, wd:Q74944, wd:Q74945, wd:Q74946, wd:Q74947, wd:Q74948, wd:Q74949, wd:Q74950, wd:Q74951, wd:Q74952, wd:Q74953, wd:Q74954, wd:Q74955, wd:Q74956, wd:Q74957, wd:Q74958, wd:Q74959, wd:Q74960, wd:Q74961, wd:Q74962, wd:Q74963, wd:Q74964, wd:Q74965, wd:Q74966, wd:Q74967, wd:Q74968))
OPTIONAL { ?Person wdt:P154 ?Gender. }
OPTIONAL { ?link schema:about ?Person ; schema:isPartOf <https://www.wikidata.org/> . }
}
ORDER BY (?PersonLabel)
This repo has been updated since its first release and may be updated again. Please have a look at the release tags for previous versions.
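The SPARQL query listed above can also be sent programmatically. The following is a minimal sketch using only the Python standard library. Note the assumptions: the URL https://database.factgrid.de/sparql is assumed to be the machine-readable counterpart of the query interface named above, the User-Agent string is a placeholder, and the function names are illustrative. The notebooks in this repo may retrieve the data differently.

```python
import json
import urllib.parse
import urllib.request

# Assumption: machine-readable SPARQL endpoint behind the query interface
# at https://database.factgrid.de/query.
SPARQL_ENDPOINT = "https://database.factgrid.de/sparql"


def build_request(query, endpoint=SPARQL_ENDPOINT):
    """Create a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "biblical-names-example/0.1",  # hypothetical identifier
    }
    return urllib.request.Request(url, headers=headers)


def rows_from_results(results):
    """Flatten SPARQL JSON results into one dict per row (variable -> value)."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """Send the query and return the flattened rows (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        return rows_from_results(json.load(response))
```

Variables that are unbound for a given row, such as ?GenderLabel or ?link from the OPTIONAL clauses, are simply absent from that row's dict, so use row.get("GenderLabel") rather than indexing directly.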
If you use this code or data in your research, please cite:
@inproceedings{Werner2024,
title = {A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts},
author = {Christoph Werner and Zacharias Shoukry and Soham Al-Suadi and Frank Krüger},
url = {https://ceur-ws.org/Vol-3724/paper6.pdf},
crossref = {SemDH2024},
year = {2024},
abstract = {The analysis of textual variants of verses in the New Testament across different manuscripts has mainly been done by close reading with manual effort. With the increasing number of transcriptions of the different manuscripts, quantitative analyses (so-called distant reading) can be used to search for patterns of omission, addition, or other variations, to formulate novel hypotheses to be investigated by close reading. In this work, we present a corpus of biblical names including spelling variation and inflections and their mentions in the transcriptions of the New Testament. By integrating and semantically enriching the data collected from different sources, we established a corpus that can be used for the quantitative study of omission, addition, and variation of such biblical names. To illustrate the corpus, we implement some use cases and show that well-known cases can be quantitatively reproduced. The corpus and all code are published under open licenses to enable reproduction, update, and maintenance.},
keywords = {New Testament,Biblical Names,Textual Variation Units},
}
@proceedings{SemDH2024,
booktitle = {Semantic Digital Humanities 2024},
year = {2024},
editor = {Oleksandra Bruns and Andrea Poltronieri and Lise Stork and Tabea Tietz},
series = {CEUR Workshop Proceedings},
address = {Aachen},
issn = {1613-0073},
url = {https://ceur-ws.org/Vol-3724/},
venue = {Hersonissos, Greece},
eventdate = {2024-05-27},
title = {Proceedings of the First International Workshop of Semantic Digital Humanities (SemDH 2024)}
}
- Version v1 from Mar 15, 2024
- Version v2 from May 17, 2024
- Version v3 from Jul 10, 2024
- Version v4 from Jul 02, 2025