Skip to content

Latest commit

 

History

History
34 lines (22 loc) · 1.36 KB

README.md

File metadata and controls

34 lines (22 loc) · 1.36 KB

README

This repo bundles experimental code for the EventDNA LRE paper.

Usage

The code has been verified to work only on Ubuntu.

Data

For information on obtaining the corpus, see https://github.com/NewsDNA-LT3/.github.

The dataset should be extracted to ./extracted. This should result in a dir with one dir per EventDNA document, such that each document dir has the structure:

eventdna_1/
    dnaf.json
    lets.csv
    alpino/
        1.xml
        2.xml
        3.xml
        4.xml

dnaf.json is annotated data in a custom format. lets.csv contains syntactic information derived from the LETS analysis tool. The alpino folder contains parse trees for each sentence in the document, produced by the Alpino parser for dutch.

Training and evaluation

main.py is a CLI file. The command has two main options: train and eval.

python main.py train will train CRF models in crossvalidation fashion. The resulting models as well as the dev datasets are saved to a timestamped dir under ./output. Training parameters can be changed on the CLI, and there is a --test option to run a limited training routine quickly. Use --help to check the options.

To evaluate the resulting models, use python main.py eval <timestamped_output_dir>. A directory called eval will be created in the same dir, containing PRF evaluations per fold and averaged.