A Python package that uses the Brill tagging algorithm for part-of-speech tagging, available for several languages. It uses the NLTK library for tokenization and tagging.
The models were trained on Universal Dependencies datasets.
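For readers unfamiliar with Brill tagging, the idea can be sketched in a few lines: a baseline tagger assigns each word its most frequent tag, and learned transformation rules then correct tags based on context. The toy code below is only an illustration of that idea, not this package's implementation (which relies on NLTK). All function names, the single rule shape ("retag A as B when the previous tag is C"), and the tiny training set are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def baseline_tag(words, lexicon, default="NOUN"):
    """Tag each word with its most frequent tag (unknown words get `default`)."""
    return [(w, lexicon.get(w.lower(), default)) for w in words]

def apply_rule(tagged, rule):
    """Apply one rule (from_tag, to_tag, prev_tag), scanning left to right."""
    from_tag, to_tag, prev_tag = rule
    out = list(tagged)
    for i in range(1, len(out)):
        if out[i][1] == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (out[i][0], to_tag)
    return out

def count_errors(guess, gold):
    """Number of positions where the guessed tag differs from the gold tag."""
    return sum(g[1] != t[1] for g, t in zip(guess, gold))

def learn_best_rule(gold_sents, lexicon):
    """Pick the candidate rule that removes the most baseline errors overall."""
    guesses = [baseline_tag([w for w, _ in s], lexicon) for s in gold_sents]
    # Propose one candidate rule for every baseline mistake.
    candidates = set()
    for guess, gold in zip(guesses, gold_sents):
        for i in range(1, len(gold)):
            if guess[i][1] != gold[i][1]:
                candidates.add((guess[i][1], gold[i][1], guess[i - 1][1]))

    def gain(rule):
        # Net error reduction: fixes minus newly introduced mistakes.
        return sum(
            count_errors(g, s) - count_errors(apply_rule(g, rule), s)
            for g, s in zip(guesses, gold_sents)
        )

    return max(sorted(candidates), key=gain, default=None)

# Toy training data with Universal Dependencies style tags (made up for this example).
train = [
    [("the", "DET"), ("run", "NOUN"), ("ended", "VERB")],
    [("a", "DET"), ("run", "NOUN"), ("helps", "VERB")],
    [("they", "PRON"), ("want", "VERB"), ("to", "PART"), ("run", "VERB")],
]

lexicon = train_baseline(train)
rule = learn_best_rule(train, lexicon)
corrected = apply_rule(baseline_tag(["they", "want", "to", "run"], lexicon), rule)
print(rule)
print(corrected)
```

Here the baseline tags "run" as a noun everywhere, and the learned rule retags it as a verb after the particle "to". A full Brill tagger repeats this step, learning an ordered list of rules from a larger set of rule templates.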
To install the package, you can use pip:
pip install brill_postagger
To use the Brill Postagger, first download the corresponding pre-trained model, then use it to tag sentences in various languages.
Example usage:
from brill_postagger import BrillPostagger
# Initialize the tagger for Portuguese (pt)
tagger = BrillPostagger.from_pretrained("pt")
# Tag a sentence
result = tagger.tag("como está o tempo lá fora?")
print(result)
The following languages are supported, each corresponding to a pre-trained model:
- Catalan (ca)
- Danish (da)
- German (de)
- English (en)
- Spanish (es)
- Basque (eu)
- French (fr)
- Galician (gl)
- Italian (it)
- Dutch (nl)
- Portuguese (pt)
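When the language code comes from user input or configuration, it may help to check it against the list above before loading a model. The sketch below is one possible way to do that. `SUPPORTED_LANGUAGES` and `load_tagger` are hypothetical helpers written for this example and are not part of the package; only `BrillPostagger.from_pretrained` comes from the usage shown earlier.

```python
# Language codes taken from the supported-languages list in this README.
SUPPORTED_LANGUAGES = {
    "ca": "Catalan",
    "da": "Danish",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "eu": "Basque",
    "fr": "French",
    "gl": "Galician",
    "it": "Italian",
    "nl": "Dutch",
    "pt": "Portuguese",
}

def load_tagger(lang):
    """Hypothetical helper: validate `lang`, then load the pre-trained tagger."""
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {lang!r}; expected one of: {supported}")
    # Imported here so the validation above works even without the package installed.
    from brill_postagger import BrillPostagger
    return BrillPostagger.from_pretrained(code)
```

With this helper, `load_tagger("es")` would return a Spanish tagger, while `load_tagger("xx")` fails early with a message listing the valid codes.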
If you'd like to contribute to the project, please feel free to submit issues or pull requests. Contributions are always welcome!
This project is licensed under the MIT License - see the LICENSE file for details.