Code for the NAACL2022 (Findings) paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction".
The overall architecture of our hierarchical modality fusion network.

To run the code, you need to install the requirements:
pip install -r requirements.txt
To extract visual object images, we first use the NLTK parser to extract noun phrases from the text and then apply the visual grounding toolkit to detect objects. Detailed steps are as follows:
- Use the NLTK parser (or spaCy, TextBlob) to extract noun phrases from the text.
- Apply the visual grounding toolkit to detect objects. Taking the twitter2015 dataset as an example, the extracted objects are stored in twitter2015_aux_images. The object images follow the naming format imgname_pred_yolo_crop_num.png, where imgname is the name of the raw image corresponding to the object and num is the index of the object predicted by the toolkit. (Note that in train/val/test.txt, text and raw images have a one-to-one relationship, so imgname can be used as a unique identifier for the raw images.)
- Establish the correspondence between the raw images and the objects. We construct a dictionary to record this correspondence. Taking twitter2015/twitter2015_train_dict.pth as an example, the dictionary has the format {imgname: ['imgname_pred_yolo_crop_num0.png', 'imgname_pred_yolo_crop_num1.png', ...]}, where each key is the name of a raw image and each value is a list of its object images.
The detected objects and the dictionary of the correspondence between the raw images and the objects are available in our data links.
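The naming format and dictionary layout described above can be illustrated with a minimal sketch. It is not the authors' preprocessing script: the function name `build_object_dict` is hypothetical, the example file names are made up, and it assumes that the dictionary key is the part of the file name before `_pred_yolo_crop_`.

```python
import re
from collections import defaultdict

# Pattern for the documented naming format: imgname_pred_yolo_crop_num.png
CROP_PATTERN = re.compile(r"^(?P<imgname>.+)_pred_yolo_crop_(?P<num>\d+)\.png$")


def build_object_dict(object_filenames):
    """Group object crops by the raw image they were cut from.

    Assumption: the key is the part of the file name that precedes
    "_pred_yolo_crop_"; the released dictionaries may store the key
    differently (for example with a file extension).
    """
    grouped = defaultdict(list)
    for name in object_filenames:
        match = CROP_PATTERN.match(name)
        if match is None:
            continue  # not an object crop, ignore it
        grouped[match.group("imgname")].append((int(match.group("num")), name))
    # Sort crops by their predicted object number for a stable order.
    return {img: [n for _, n in sorted(crops)] for img, crops in grouped.items()}


# Made-up file names, used only to show the grouping.
example = build_object_dict([
    "17_06_4705_pred_yolo_crop_1.png",
    "17_06_4705_pred_yolo_crop_0.png",
    "1015799_pred_yolo_crop_0.png",
    "notes.txt",
])
```

In practice the file names would come from listing a directory such as twitter2015_aux_images. Given the .pth extension, the released dictionaries were presumably saved with `torch.save`, but that is an assumption rather than something stated here.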
- Twitter2015 & Twitter2017
The text data follows the CoNLL format. You can download the Twitter2015 data via this link and download the Twitter2017 data via this link. Please place them in data/NER_data. You can also put them anywhere and modify the path configuration in run.py.
- MNRE
The MNRE dataset comes from MEGA, many thanks.
You can download the MNRE dataset with detected visual objects from Google Drive or use the following commands:
cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data
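For the Twitter text files mentioned above, a CoNLL-style file can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the repository's loader in processor/dataset.py: it assumes one token per line with the label in the last whitespace-separated column and a blank line between sentences. The function name `read_conll` and the sample lines are illustrative only.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into (tokens, labels) sentence pairs.

    Assumptions: one token per line with the label in the last
    whitespace-separated column, and a blank line between sentences.
    Lines with fewer than two columns are skipped.
    """
    sentences, tokens, labels = [], [], []
    for raw in lines:
        parts = raw.split()
        if not parts:  # blank line closes the current sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        if len(parts) < 2:
            continue  # no label column, skip the line
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:  # the input may not end with a blank line
        sentences.append((tokens, labels))
    return sentences


# Illustrative input, not taken from the datasets.
sample = ["Kevin\tB-PER", "Durant\tI-PER", "wins\tO", "", "Hello\tO"]
parsed = read_conll(sample)
```

To read a real file, pass an open file handle, for example `read_conll(open("data/NER_data/twitter2015/train.txt", encoding="utf-8"))`.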
The expected structure of files is:
HMNeT
|-- data
| |-- NER_data
| | |-- twitter2015 # text data
| | | |-- train.txt
| | | |-- valid.txt
| | | |-- test.txt
| | | |-- twitter2015_train_dict.pth # {imgname: [object-image]}
| | | |-- ...
| | |-- twitter2015_images # raw image data
| | |-- twitter2015_aux_images # object image data
| | |-- twitter2017
| | |-- twitter2017_images
| | |-- twitter2017_aux_images
| |-- RE_data
| | |-- img_org # raw image data
| | |-- img_vg # object image data
| | |-- txt # text data
| | |-- ours_rel2id.json # relation data
|-- models # models
| |-- bert_model.py
| |-- modeling_bert.py
|-- modules
| |-- metrics.py # metric
| |-- train.py # trainer
|-- processor
| |-- dataset.py # processor, dataset
|-- logs # code logs
|-- run.py # main
|-- run_ner_task.sh
|-- run_re_task.sh
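Before training, it can help to confirm that the data was unpacked into the layout shown in the tree above. The following is a small sketch, not part of the repository: `missing_paths` is a hypothetical helper, and the list covers only a subset of the tree (the twitter2015 and RE_data entries).

```python
from pathlib import Path

# Paths copied from the directory tree above (a subset, for brevity).
EXPECTED_PATHS = [
    "data/NER_data/twitter2015/train.txt",
    "data/NER_data/twitter2015/valid.txt",
    "data/NER_data/twitter2015/test.txt",
    "data/NER_data/twitter2015_images",
    "data/NER_data/twitter2015_aux_images",
    "data/RE_data/img_org",
    "data/RE_data/img_vg",
    "data/RE_data/txt",
    "data/RE_data/ours_rel2id.json",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when every listed file and directory is in place.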
The data path and GPU-related configuration are in run.py. To train the NER model, run the script for the corresponding dataset:
bash run_twitter15.sh
bash run_twitter17.sh
To train the RE model, run this script:
bash run_re_task.sh
To test the NER model, set load_path to the path of the trained model (and dataset_name to either twitter15 or twitter17), then run the following script:
python -u run.py \
--dataset_name="twitter15/twitter17" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_ner_ckpt_path'
To test the RE model, set load_path to the path of the trained model, then run the following script:
python -u run.py \
--dataset_name="MRE" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_re_ckpt_path'
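The NER and RE test commands differ only in dataset_name and load_path, so they can be assembled programmatically when evaluating several checkpoints. The sketch below copies the flags and values from the commands above; the helper `build_test_command` itself is hypothetical and not part of the repository.

```python
import sys


def build_test_command(dataset_name, load_path, python=sys.executable):
    """Assemble the run.py test command for one dataset.

    The flags and values are copied from the test commands in this
    README; this helper is an illustration, not repository code.
    """
    return [
        python, "-u", "run.py",
        f"--dataset_name={dataset_name}",
        "--bert_name=bert-base-uncased",
        "--seed=1234",
        "--only_test",
        "--max_seq=80",
        "--use_prompt",
        "--prompt_len=4",
        "--sample_ratio=1.0",
        f"--load_path={load_path}",
    ]


# Placeholder checkpoint path, as in the command above.
cmd = build_test_command("MRE", "your_re_ckpt_path")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing the arguments as a list avoids shell quoting issues in checkpoint paths.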
The acquisition of the Twitter15 and Twitter17 data refers to the code from UMT, many thanks.
The acquisition of the MNRE data for the multimodal relation extraction task refers to the code from MEGA, many thanks.
If you use or extend our work, please cite the paper as follows:
@inproceedings{DBLP:conf/naacl/ChenZLYDTHSC22,
author = {Xiang Chen and
Ningyu Zhang and
Lei Li and
Yunzhi Yao and
Shumin Deng and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
editor = {Marine Carpuat and
Marie{-}Catherine de Marneffe and
Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
title = {Good Visual Guidance Make {A} Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction},
booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
2022, Seattle, WA, United States, July 10-15, 2022},
pages = {1607--1618},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://doi.org/10.18653/v1/2022.findings-naacl.121},
doi = {10.18653/v1/2022.findings-naacl.121},
timestamp = {Tue, 23 Aug 2022 08:36:33 +0200},
biburl = {https://dblp.org/rec/conf/naacl/ChenZLYDTHSC22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}