word-associations

Word associations in Indre Mission and Kirkeligt Samfund.

Usage

Folder structure:

- dataset/
    - data/
        - pr983_204.txt
        ...
    - Stopord.txt
    - metadata_nordveck.csv

Install requirements:

pip install -r requirements.txt

Preprocessing

Preprocess the corpus (lemmatization, stop word removal, normalization):

python3 src/clean_texts.py

This writes the cleaned corpus to a CSV file with id, text, and clean_text columns:

- dataset/
    - clean_data.csv
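
As a rough illustration of the normalization and stop word removal steps, here is a minimal sketch. It is not the code in src/clean_texts.py: the function name `clean_text` and the tiny stop word set are assumptions made for the example, and lemmatization is left out entirely because it depends on an external language model.

```python
import re

# Hypothetical stop word set for illustration only;
# the project keeps its real list in dataset/Stopord.txt.
STOP_WORDS = {"og", "i", "det", "at", "en"}


def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep only alphabetic tokens, and drop stop words."""
    # Normalization: lowercase and keep runs of letters (including æ, ø, å),
    # which discards punctuation and digits.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Stop word removal. Lemmatization is deliberately omitted in this sketch.
    return " ".join(tok for tok in tokens if tok not in stop_words)


print(clean_text("Det er en god Dag, og Solen skinner i 1890!"))
```

The result of a function like this would correspond to the clean_text column, with the untouched input kept in the text column.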

Word count

You can use the src/word_count.py run CLI to extract the most common words.
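
The counting itself can be sketched with the standard library. This is an assumption about the general approach, not the contents of src/word_count.py; `most_common_words` is a hypothetical helper and the three documents are made-up toy data standing in for the clean_text column.

```python
from collections import Counter


def most_common_words(clean_texts, top_k=10):
    """Count whitespace-separated tokens across all documents."""
    counts = Counter()
    for text in clean_texts:
        counts.update(text.split())
    # most_common returns (word, count) pairs sorted by descending count.
    return counts.most_common(top_k)


# Toy input, invented for the example.
docs = ["gud nåde tro", "tro håb kærlighed", "tro nåde"]
print(most_common_words(docs, top_k=2))
```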

Collect collocations

You can use the src/cooccurrences.py run CLI to extract the highest-scoring collocations of a target word, ranked by pointwise mutual information (PMI).
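
To make the scoring concrete, the sketch below computes window-based PMI for a seed word. It is one common formulation, PMI(w, seed) = log2(p(w | seed window) / p(w)), and may differ from what src/cooccurrences.py actually does. The function name `pmi_collocations` is hypothetical; its `n_context` and `top_k` parameters are named after the CLI options only for readability.

```python
import math
from collections import Counter


def pmi_collocations(docs, seed_word, n_context=5, top_k=50):
    """Rank words co-occurring with seed_word by PMI (illustrative sketch)."""
    word_counts = Counter()   # corpus-wide token frequencies
    pair_counts = Counter()   # frequencies inside windows around the seed
    total_tokens = 0
    for doc in docs:
        tokens = doc.split()
        word_counts.update(tokens)
        total_tokens += len(tokens)
        for i, tok in enumerate(tokens):
            if tok != seed_word:
                continue
            # Up to n_context tokens on each side of the seed occurrence.
            window = tokens[max(0, i - n_context):i] + tokens[i + 1:i + 1 + n_context]
            pair_counts.update(w for w in window if w != seed_word)

    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        return []

    scores = {}
    for word, joint in pair_counts.items():
        p_in_window = joint / total_pairs             # p(word | seed window)
        p_word = word_counts[word] / total_tokens     # p(word)
        scores[word] = math.log2(p_in_window / p_word)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy input, invented for the example.
docs = ["gud er nåde", "tro er håb", "gud giver nåde"]
for word, score in pmi_collocations(docs, "gud", n_context=2):
    print(f"{word}\t{score:.3f}")
```

Raw PMI tends to favour rare words, so a practical implementation would typically add a minimum frequency cutoff; that refinement is not shown here.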

Arguments

Argument	Description	Type	Default
`seed_word`	Seed word to start off from.	str	-
`-h`, `--help`	Show help message and exit.	-	-
`--group_by GROUP_BY`, `-g GROUP_BY`	Metadata column to group results by.	str	None
`--out_file OUT_FILE`, `-o OUT_FILE`	JSON file to output results to.	str	results/coocurrences.json
`--top_k TOP_K`, `-k TOP_K`	Top K ranking cooccurring words to output.	int	50
`--n_context N_CONTEXT`, `-n N_CONTEXT`	Number of context words to consider in each direction.	int	5
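
For readers who want to see how the options in the table fit together, the following is a hypothetical `argparse` reconstruction of that interface. It mirrors the documented names, types, and defaults, but it is not taken from src/cooccurrences.py, and it ignores the `run` subcommand.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(
        description="Collect PMI-ranked collocations of a seed word."
    )
    parser.add_argument("seed_word", type=str, help="Seed word to start off from.")
    parser.add_argument("--group_by", "-g", type=str, default=None,
                        help="Metadata column to group results by.")
    parser.add_argument("--out_file", "-o", type=str,
                        default="results/coocurrences.json",
                        help="JSON file to output results to.")
    parser.add_argument("--top_k", "-k", type=int, default=50,
                        help="Top K ranking cooccurring words to output.")
    parser.add_argument("--n_context", "-n", type=int, default=5,
                        help="Number of context words to consider in each direction.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["gud", "-k", "10"])
print(args.seed_word, args.top_k, args.n_context, args.group_by, args.out_file)
```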

centre-for-humanities-computing/word-associations

Folders and files

Latest commit

History

Repository files navigation

word-associations

Usage

Preprocessing

Word count

Collect collocations

Arguments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages