This repository contains in-house code used in training and evaluating NorBERT-1 and NorBERT-2: large-scale Transformer-based language models for Norwegian. The models were trained by the Language Technology Group at the University of Oslo. The computations were performed on resources provided by UNINETT Sigma2 - the National Infrastructure for High Performance Computing and Data Storage in Norway.
For most of the training, we used BERT For TensorFlow from NVIDIA, with minor changes to their code; see the `patches_for_NVIDIA_BERT` subdirectory.
Training of the NorBERT models was conducted as part of the NorLM project. See this paper for more details:
Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen. Large-Scale Contextualised Language Modelling for Norwegian, NoDaLiDa'21 (2021)
- Read about NorBERT
- Download NorBERT-1 from our repository or from HuggingFace
- Download NorBERT-2 from our repository or from HuggingFace
In 2023, we released NorBERT-3, a new family of language models for Norwegian. In general, we now recommend using these models (a minimal loading sketch is given below):
- NorBERT 3 xs (15M parameters)
- NorBERT 3 small (40M parameters)
- NorBERT 3 base (123M parameters)
- NorBERT 3 large (323M parameters)
NorBERT-3 is described in detail in this paper: NorBench – A Benchmark for Norwegian Language Models (Samuel et al., NoDaLiDa 2023)
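The models can be loaded through the HuggingFace `transformers` library. The sketch below shows one way to use NorBERT-3 base for masked language modelling; the Hub identifier `ltg/norbert3-base` and the need for `trust_remote_code=True` (NorBERT-3 uses a custom model architecture) are assumptions, so substitute the actual model name from the download links above if it differs.

```python
# Minimal sketch: loading a NorBERT-3 model from the HuggingFace Hub.
# The identifier "ltg/norbert3-base" is an assumed Hub name; adjust if needed.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "ltg/norbert3-base"  # assumed Hub identifier for NorBERT 3 base
tokenizer = AutoTokenizer.from_pretrained(model_name)
# NorBERT-3 ships a custom architecture, so remote code may need to be trusted.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)

# Fill a masked token in a Norwegian sentence.
text = f"Oslo er hovedstaden i {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position and decode it.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_index].argmax().item()
print(tokenizer.decode([predicted_id]))
```

The same pattern applies to NorBERT-1 and NorBERT-2 (under their respective Hub names), which use a standard BERT architecture and therefore should not require trusting remote code.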