uhh-lt/maverick-coref-de

German Maverick Coref

Python Package

The maverick-coref-de Python package provides a simple API for German Maverick models, enabling efficient and accurate coreference resolution with a few lines of code.

Install the library from PyPI

pip install maverick-coref-de

or from source

git clone https://github.com/uhh-lt/maverick-coref-de.git
cd maverick-coref-de
pip install -e .

Loading a Pretrained Model

Maverick models can be loaded using a Hugging Face model ID or a local checkpoint path:

from maverick_de import Maverick

model = Maverick(
    # Hugging Face model name or local checkpoint path (default: "fynnos/maverick-mes-de10")
    hf_name_or_path="fynnos/maverick-mes-de10",
    # "cpu" or "cuda" (default: "cuda:0")
    device="cuda:0",
)
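
Because the default device is "cuda:0", loading may fail on a machine without a GPU. The following is a minimal sketch of a fallback, not part of the package: choose_device is a hypothetical helper, and it assumes PyTorch is the backend that determines whether CUDA is usable.

```python
def choose_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if CUDA is usable, otherwise fall back to "cpu".

    Hypothetical helper for illustration; not part of maverick_de.
    """
    try:
        import torch  # assumption: PyTorch is installed alongside the package
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = choose_device()
print(device)

# The result can then be passed to the constructor shown above:
# model = Maverick(hf_name_or_path="fynnos/maverick-mes-de10", device=device)
```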

Inference

Predict

You can use model.predict() to obtain coreference predictions. For a sample input, the model will return a dictionary containing:

  • tokens, word tokenized version of the input.
  • clusters_token_offsets, a list of clusters containing mentions' token offsets.
  • clusters_text_mentions, a list of clusters containing mentions in plain text.
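
To illustrate how the three fields relate, here is a small sketch that works on a hand-written dictionary with the shape described above. The sample values are made up for illustration and are not real model output. The sketch also assumes that each token offset is an inclusive (start, end) pair, and the helper mentions_from_offsets is hypothetical.

```python
from typing import List, Tuple


def mentions_from_offsets(
    tokens: List[str],
    clusters: List[List[Tuple[int, int]]],
) -> List[List[str]]:
    """Rebuild the mention strings of each cluster from token offsets.

    Assumes every offset is an inclusive (start, end) pair of token indices.
    """
    return [
        [" ".join(tokens[start : end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


# Hand-written stand-in for the dictionary returned by model.predict().
# With a loaded model, something like the following would be used instead
# (assuming predict accepts a plain text string):
# sample_output = model.predict("Die Lehrerin sagte, dass sie müde ist.")
sample_output = {
    "tokens": ["Die", "Lehrerin", "sagte", ",", "dass", "sie", "müde", "ist", "."],
    "clusters_token_offsets": [[(0, 1), (5, 5)]],
    "clusters_text_mentions": [["Die Lehrerin", "sie"]],
}

recovered = mentions_from_offsets(
    sample_output["tokens"], sample_output["clusters_token_offsets"]
)

# Print each cluster with its mentions.
for index, cluster in enumerate(recovered):
    print(f"Cluster {index}: {cluster}")
```

Under these assumptions, the recovered mentions match clusters_text_mentions, so either field can be used depending on whether positions or surface strings are needed downstream.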

Training

Create a Python venv and install from source.

git clone https://github.com/uhh-lt/maverick-coref-de.git
cd maverick-coref-de
pip install -e .
  • Obtain data in .conll format split into train/dev/test
  • Run the minimize.py script from data for the correct dataset
  • Adjust conf/data/<your dataset>.yaml for your dataset
  • Adjust conf/model/mes/<your encoder model>.yaml to
  • Adjust conf/root.yaml to use your dataset and your encoder model
  • Run CUDA_VISIBLE_DEVICES=X python maverick_de/train.py

Citation

If you use this software, please consider citing our paper published at KONVENS 2025:

@inproceedings{petersenfrey-etal-2025-efficient,
    title = "Efficient and effective coreference resolution for German",
    author = "Petersen-Frey, Fynn and Hatzel, Hans Ole and Biemann, Chris",
    booktitle = "Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025). Volume 1: Long and Short Papers",
    month = "9",
    year = "2025",
    address = "Hildesheim, Germany",
    publisher = "KONVENS 2025 Organizers"
}

The software in this repository is based on the work "Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends" by Giuliano Martinelli, Edoardo Barba, and Roberto Navigli, published at the ACL 2024 main conference. It uses their implementation, forked from the original repository, with some adaptations to a) make it compatible with German and b) try additional model variants. For English, refer to the original Python package.

License

The data and software are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0.
