Icelandic NLP resources

This is a list of known tools and resources developed specifically for linguistic processing of Icelandic. It is intended to give readers an at-a-glance overview of the growing set of tools for working with Icelandic natural language data.

The list is organized by task for clarity. As a result, some multi-functional tools and toolkits may appear more than once. If you notice that a category or resource is missing, or have suggestions on how to improve this list, please open a pull request.

Contents

Notable papers and reports

Other resource collections

  • CLARIN-IS
    • The Icelandic branch of the CLARIN-ERIC language resource initiative. Contains information on and downloads for many tools and datasets.
  • malfong.is
    • List of language technology resources, maintained by Árnastofnun.

Toolkits

  • Java toolkit for tokenization, POS tagging, lemmatization, parsing, and NER
  • Developed by Hrafn Loftsson
  • TTS frontend designed to work with the Merlin speech synthesis system developed by CSTR
  • It contains a pronunciation dictionary, a Sequitur G2P model, a stress analysis component, and more. Unfortunately, it does not include any documentation.
  • Developed by Anna Björk Nikulásdóttir at LVL

Tokenization and text normalization

POS tagging

Syntactic parsing

Grapheme-to-phoneme

Stress analysis