
NorBERT

This repository contains the in-house code used for training and evaluating NorBERT-1 and NorBERT-2, large-scale Transformer-based language models for Norwegian. The models were trained by the Language Technology Group at the University of Oslo. The computations were performed on resources provided by UNINETT Sigma2, the National Infrastructure for High Performance Computing and Data Storage in Norway.

For most of the training, we used NVIDIA's BERT for TensorFlow implementation. We made minor changes to their code; see the patches_for_NVIDIA_BERT subdirectory.

Training of the NorBERT models was conducted as part of the NorLM project. See this paper for more details:

Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen. Large-Scale Contextualised Language Modelling for Norwegian, NoDaLiDa'21 (2021)

NorBERT-3

In 2023, we released NorBERT-3, a new family of language models for Norwegian. We now generally recommend using these models; a minimal loading example is sketched below.

NorBERT-3 is described in detail in this paper: NorBench – A Benchmark for Norwegian Language Models (Samuel et al., NoDaLiDa 2023)
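As a rough illustration of how one of these models might be used, here is a minimal sketch based on the Hugging Face transformers library. The model identifier ltg/norbert3-base, the use of trust_remote_code=True, and the assumption that the model returns a standard masked-LM output with a logits field are all assumptions not confirmed by this repository; adjust them to the actual published checkpoints.

```python
# Minimal sketch: loading a NorBERT-3 model with Hugging Face transformers.
# "ltg/norbert3-base" is an assumed Hub identifier; NorBERT-3 is assumed to
# use a custom architecture, hence trust_remote_code=True.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "ltg/norbert3-base"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)

# Fill in a masked token in a Norwegian sentence.
text = f"Oslo er hovedstaden i {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Assumes the custom model class returns an output with a .logits field.
    logits = model(**inputs).logits

# Locate the masked position and decode the most likely token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```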
