This repository contains scripts and resources to replicate the training of Tweety Italian models.
The src folder contains python and bash script organized into:
continual_training: to run a small number of adaptation steps in Italian after the tokenizer swap;alignment: scripts and recipes to run SFT and DPO with HF's alignment-notebookdatasets: code to create dataset resources