GitHub - centre-for-humanities-computing/factfiction_newspapers

Fact from Fiction

This repo accompanies our paper on distinguishing feuilleton fiction in Danish newspapers.

📝 In notes you will find the annotation scheme for the fiction/nonfiction categorization.

In scripts you'll find the code, including:

get_features.py, which extracts MFWs, TF-IDF, and stylistic/syntactic/affective features; the functions it uses are defined in scripts/feature_utils.py.
classify.py, which applies a random forest model to each of our 4 feature sets (MFW100, TF-IDF, selected features, and embeddings).
descriptives.py, which visualizes and tests differences between the fiction and nonfiction classes.
clustering_task.py, which tests embeddings for clustering feuilleton series (note that the embeddings need to be precomputed and are not available here because of their size).
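
To give a rough idea of what an MFW (most frequent words) feature set looks like, here is a minimal, standard-library-only sketch. It is not the code in scripts/get_features.py: the function names (tokenize, top_mfws, mfw_features), the simple regex tokenizer, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase and keep runs of letters, including Danish æ, ø, å (assumed tokenizer)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def top_mfws(documents, n=100):
    """Return the n most frequent words across the whole corpus (hypothetical helper)."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(n)]


def mfw_features(document, vocabulary):
    """Relative frequency of each vocabulary word within one document."""
    tokens = tokenize(document)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty texts
    return [counts[word] / total for word in vocabulary]


# Toy corpus: made-up sentences, not taken from the repo's data.
docs = [
    "Hun gik ud i natten og hun så ham",
    "Prisen på korn steg i går i København",
]
vocab = top_mfws(docs, n=3)  # n=100 would correspond to an MFW100-style set
matrix = [mfw_features(d, vocab) for d in docs]
```

Each row of matrix is one document's feature vector; a matrix of this shape is the kind of input a classifier such as the random forest in classify.py would take.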

Note that the script for creating the various embeddings is in this anonymized repo.

The script to benchmark SA models on the Fiction4 corpus is in this anonymized repo.

Repository contents:

data
figs
logs
notes
scripts
.DS_Store
.gitignore
README.md
requirements.txt