Code for our NAACL 2022 paper "DEGREE: A Data-Efficient Generation-Based Event Extraction Model".
- Python==3.8
- PyTorch==1.8.0
- transformers==3.1.0
- protobuf==3.17.3
- tensorboardx==2.4
- lxml==4.6.3
- beautifulsoup4==4.9.3
- bs4==0.0.1
- stanza==1.2
- sentencepiece==0.1.95
- ipdb==0.13.9
Note:
- If you meet issues related to Rust when installing transformers through pip, this website might be helpful.
- Alternatively, you can reference `env_reference.yml` for a clearer installation.
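As a quick sanity check after installation, a small script like the one below (a sketch, not part of the repo) prints the versions that were actually installed so you can compare them against the pinned list above:

```python
# Sanity-check sketch (not part of the repo): confirm the pinned
# dependencies import correctly and report their installed versions.
import torch
import transformers
import stanza
import sentencepiece

print("PyTorch:       ", torch.__version__)          # expect 1.8.0
print("transformers:  ", transformers.__version__)   # expect 3.1.0
print("stanza:        ", stanza.__version__)         # expect 1.2
print("sentencepiece: ", sentencepiece.__version__)  # expect 0.1.95
```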
We support `ace05e`, `ace05ep`, and `ere`.
Our preprocessing mainly adapts OneIE's released scripts with minor modifications. We deeply thank the authors of OneIE for their contribution.
For `ace05e`:
- Prepare the data processed by DyGIE++.
- Put the processed data into the folder `processed_data/ace05e_dygieppformat`.
- Run `./scripts/process_ace05e.sh`.
For `ace05ep`:
- Download ACE data from LDC.
- Run `./scripts/process_ace05ep.sh`.
For `ere`:
- Download ERE English data from LDC, specifically "LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2", "LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2", and "LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2".
- Collect all these data under a directory with the following layout (a sanity-check sketch follows this list):

```
ERE
├── LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2
│   ├── data
│   ├── docs
│   └── ...
├── LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2
│   ├── data
│   ├── docs
│   └── ...
└── LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2
    ├── data
    ├── docs
    └── ...
```

- Run `./scripts/process_ere.sh`.
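A quick way to verify the layout above before running the script (a hypothetical helper, not part of the repo):

```python
# A small sanity check (not part of the repo) that the three LDC packages
# sit under ERE/ with a data/ subfolder each, matching the layout above.
from pathlib import Path

root = Path("ERE")  # adjust to wherever you collected the packages
packages = [
    "LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2",
    "LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2",
    "LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2",
]
for name in packages:
    status = "ok" if (root / name / "data").is_dir() else "MISSING"
    print(f"{status:7s} {root / name}")
```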
The above scripts will generate the processed data (including the full training set and the low-resource sets) in `./processed_data`.
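Once preprocessing finishes, you can peek at the generated files. The sketch below assumes the OneIE-style JSON-lines format that the scripts adapt; the file path and the field names (`tokens`, `event_mentions`) are assumptions here, so check the generated files for the exact schema.

```python
# Peek at a few preprocessed instances. Assumes an OneIE-style JSON-lines
# file; the path and field names below are assumptions, not guaranteed
# by the repo.
import json

path = "./processed_data/ace05e/train.json"  # hypothetical path
with open(path) as f:
    for i, line in enumerate(f):
        if i >= 3:  # only peek at the first few instances
            break
        inst = json.loads(line)
        print("tokens:", inst.get("tokens", [])[:10])
        print("events:", [e.get("event_type") for e in inst.get("event_mentions", [])])
```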
Run `./scripts/train_degree_e2e.sh` or use the following commands:

Generate data for DEGREE (End2end):
```bash
python degree/generate_data_degree_e2e.py -c config/config_degree_e2e_ace05e.json
```

Train DEGREE (End2end):
```bash
python degree/train_degree_e2e.py -c config/config_degree_e2e_ace05e.json
```

The model will be stored at `./output/degree_e2e_ace05e/[timestamp]/best_model.mdl` by default.
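To inspect a finished checkpoint, a standard `torch.load` works if the file was written with `torch.save` (the usual convention for `.mdl` files in PyTorch projects; exactly what is stored inside depends on `train_degree_e2e.py`):

```python
# Inspect a saved checkpoint (a sketch, not part of the repo). Assumes
# best_model.mdl was written with torch.save(...); the keys inside
# depend on the training script.
import torch

ckpt = torch.load(
    "./output/degree_e2e_ace05e/[timestamp]/best_model.mdl",  # fill in the real timestamp
    map_location="cpu",  # no GPU needed just to look inside
)
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g., model weights, config, tokenizer info
else:
    print(type(ckpt))
```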
Run `./scripts/train_degree_ed.sh` or use the following commands:

Generate data for DEGREE (ED):
```bash
python degree/generate_data_degree_ed.py -c config/config_degree_ed_ace05e.json
```

Train DEGREE (ED):
```bash
python degree/train_degree_ed.py -c config/config_degree_ed_ace05e.json
```

The model will be stored at `./output/degree_ed_ace05e/[timestamp]/best_model.mdl` by default.
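The training hyperparameters live in the JSON config passed with `-c`. To see which knobs a config exposes before editing it, a trivial sketch (the file path is from the commands above; its contents are defined by the repo, so this just prints them):

```python
# List the options exposed by a training config.
import json

with open("config/config_degree_ed_ace05e.json") as f:
    cfg = json.load(f)
for key in sorted(cfg):
    print(f"{key}: {cfg[key]!r}")
```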
Run `./scripts/train_degree_eae.sh` or use the following commands:

Generate data for DEGREE (EAE):
```bash
python degree/generate_data_degree_eae.py -c config/config_degree_eae_ace05e.json
```

Train DEGREE (EAE):
```bash
python degree/train_degree_eae.py -c config/config_degree_eae_ace05e.json
```

The model will be stored at `./output/degree_eae_ace05e/[timestamp]/best_model.mdl` by default.
Evaluate DEGREE (End2end) on the event extraction task:
```bash
python degree/eval_end2endEE.py -c config/config_degree_e2e_ace05e.json -e [e2e_model]
```

Evaluate DEGREE (Pipe) on the event extraction task:
```bash
python degree/eval_pipelineEE.py -ced config/config_degree_ed_ace05e.json -ceae config/config_degree_eae_ace05e.json -ed [ed_model] -eae [eae_model]
```

Evaluate DEGREE (EAE) on the event argument extraction task (given gold triggers):
```bash
python degree/eval_pipelineEE.py -ceae config/config_degree_eae_ace05e.json -eae [eae_model] -g
```
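For context, event extraction papers typically score a predicted trigger as correct when both its span and event type match gold, then report precision/recall/F1 over those tuples. The sketch below illustrates that convention generically; it is not the repo's scorer (see the `degree/eval_*.py` scripts for the official evaluation).

```python
# Generic illustration of trigger-classification F1 (not the repo's scorer):
# a prediction counts as correct iff span and event type both match gold.
def f1(gold, pred):
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Each tuple: (sentence_id, trigger_start, trigger_end, event_type)
gold = [(0, 3, 4, "Conflict:Attack"), (1, 0, 1, "Movement:Transport")]
pred = [(0, 3, 4, "Conflict:Attack"), (1, 0, 1, "Life:Die")]
print(round(f1(gold, pred), 3))  # 0.5
```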
Pre-trained models:

| Dataset | Model | Model | Model |
| --- | --- | --- | --- |
| ace05e | DEGREE (EAE) | DEGREE (ED) | DEGREE (E2E) |
| ace05ep | DEGREE (EAE) | DEGREE (ED) | DEGREE (E2E) |
| ere | DEGREE (EAE) | DEGREE (ED) | DEGREE (E2E) |
If you find our code useful in your research, please consider citing our paper:
```bibtex
@inproceedings{naacl2022degree,
    author    = {I-Hung Hsu and Kuan-Hao Huang and Elizabeth Boschee and Scott Miller and Prem Natarajan and Kai-Wei Chang and Nanyun Peng},
    title     = {DEGREE: A Data-Efficient Generation-Based Event Extraction Model},
    booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
    year      = {2022},
}
```
If you have any issues, please contact I-Hung Hsu at [email protected] or Kuan-Hao Huang at [email protected].