Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter".
arXiv link to the paper: https://arxiv.org/abs/2105.07148
If you have any questions, please contact: [email protected]
- Python 3.7.0
- transformers 3.4.0
- Numpy 1.18.5
- Packaging 17.1
- scikit-learn 0.23.2
- torch 1.6.0+cu92
- tqdm 4.50.2
- multiprocess 0.70.10
- tensorflow 2.3.1
- tensorboardX 2.1
- seqeval 1.2.1
CoNLL format (BIOES tag scheme preferred), with one character and its label per line. Sentences are separated by a blank line.
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER

我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
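The format above can be read with a few lines of Python. The following is a minimal sketch, not the repository's own loader: the helper names `read_conll` and `bioes_spans` are hypothetical, and the span decoder assumes well-formed BIOES sequences (it does not validate that a `B-` and its closing `E-` share the same type).

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group "character label" lines into sentences; a blank line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is the character, last column is its label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bioes_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) spans with inclusive end indices."""
    spans: List[Tuple[str, int, int]] = []
    start = None
    for i, label in enumerate(labels):
        prefix, _, kind = label.partition("-")
        if prefix == "S":
            spans.append((kind, i, i))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((kind, start, i))
            start = None
        elif prefix == "O":
            start = None
        # "I" labels simply continue the currently open span.
    return spans


sample = """美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER

我 O
跟 O
"""
sentences = read_conll(sample.splitlines())
spans = bioes_spans([label for _, label in sentences[0]])
```

Here `sentences` holds two sentences, and `spans` describes the first one: a LOC span over characters 0 to 1 and a PER span over characters 3 to 5.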
Chinese BERT: https://huggingface.co/bert-base-chinese/tree/main
Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz
The original download link no longer works. The updated link is:
Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz
For more information, see: Tencent AI Lab Word Embedding
- Weibo NER
- OntoNotes 4 NER
- MSRA NER
- Resume NER
- CTB5 POS
- CTB6 POS
- UD1 POS
- UD2 POS
- CTB6 CWS
- MSR CWS
- PKU CWS
- berts
  - bert
    - config.json
    - vocab.txt
    - pytorch_model.bin
- dataset, you can download it from here
  - NER
    - note4
    - msra
    - resume
  - POS
    - ctb5
    - ctb6
    - ud1
    - ud2
  - CWS
    - ctb6
    - msr
    - pku
- vocab
  - tencent_vocab.txt, the vocabulary of the pre-trained word embedding table, download it from here.
- embedding
  - word_embedding.txt
- result
  - NER
    - note4
    - msra
    - resume
  - POS
    - ctb5
    - ctb6
    - ud1
    - ud2
  - CWS
    - ctb6
    - msr
    - pku
- log
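The vocab file listed above holds the vocabulary of the pre-trained word embedding table. If you would rather derive such a list from embedding/word_embedding.txt than download it, the sketch below shows one way. It rests on assumptions not stated in this README: that the embedding file is in the word2vec text format (an optional "count dimension" header line, then one word followed by its space-separated vector per line) and that the vocab file is simply one word per line. The function names are hypothetical.

```python
import io
from typing import Iterable, Iterator


def iter_embedding_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield the word at the start of each row of a text-format embedding table."""
    for i, raw in enumerate(lines):
        parts = raw.rstrip("\n").split(" ")
        # Assumption: an optional first line holds "<num_words> <dim>"; skip it.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if parts and parts[0]:
            yield parts[0]


def write_vocab(embedding_path: str, vocab_path: str) -> int:
    """Write one word per line to vocab_path and return the number of words."""
    count = 0
    with open(embedding_path, encoding="utf-8") as src, \
            open(vocab_path, "w", encoding="utf-8") as dst:
        for word in iter_embedding_words(src):
            dst.write(word + "\n")
            count += 1
    return count


# Tiny in-memory stand-in for the real embedding file.
demo = io.StringIO("2 3\n美国 0.1 0.2 0.3\n华莱士 0.4 0.5 0.6\n")
demo_words = list(iter_embedding_words(demo))
```

With the layout above, a call such as `write_vocab("embedding/word_embedding.txt", "vocab/tencent_vocab.txt")` would stream the file line by line, so the full table never has to fit in memory.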
1. Convert the .char.bmes files to .json files:
python3 to_json.py
2. Run the shell script:
sh run_demo.sh
My model was trained in distributed mode, so it cannot be loaded directly in single-GPU mode. Follow the steps below to revise the transformers source before loading my checkpoints.
- Enter the source code directory of transformers:
cd source/transformers-master
- Find modeling_utils.py and go to around line 995.
- Compile the revised source code and install it:
python3 setup.py install
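The exact edit to make around that line is not reproduced in this text. As a possible alternative that leaves the library untouched, the sketch below renames the checkpoint keys instead. It assumes the loading problem comes from the `module.` prefix that `torch.nn.DataParallel` and `DistributedDataParallel` add to parameter names when a wrapped model's state dict is saved. That is an assumption about these checkpoints, not something confirmed here, and `strip_module_prefix` is a hypothetical helper.

```python
from collections import OrderedDict
from typing import Any, Dict, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any],
                        prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of state_dict whose keys no longer start with prefix."""
    cleaned: Dict[str, Any] = OrderedDict()
    for key, value in state_dict.items():
        # Only a leading prefix is removed; other keys pass through unchanged.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a real checkpoint: plain values instead of tensors.
demo_state = OrderedDict([
    ("module.bert.embeddings.weight", 1),
    ("classifier.bias", 2),
])
cleaned_state = strip_module_prefix(demo_state)
```

With a real checkpoint, the mapping would typically come from `torch.load(path, map_location="cpu")`, and the cleaned result would be passed to the model's `load_state_dict`. Whether this is enough for these particular checkpoints has not been verified.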
@inproceedings{liu-etal-2021-lexicon,
title = "Lexicon Enhanced {C}hinese Sequence Labeling Using {BERT} Adapter",
author = "Liu, Wei and
Fu, Xiyan and
Zhang, Yue and
Xiao, Wenming",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.454",
doi = "10.18653/v1/2021.acl-long.454",
pages = "5847--5858"
}