My models are uploaded to HuggingFace/SurajP.
The aim is to train foundational models for Indian languages on quality data, so that they can later be fine-tuned for other purposes.
A repository for my experiments with language models in different languages. Currently, I am focusing on Indian languages, especially Sanskrit, Hindi and Gujarati.
Goals:
- Train a model on each language separately
  - Sanskrit
  - Hindi
  - Gujarati
- Fine-tune each model on a similar language for language modelling
  - Hindi -> Other regional languages
  - Sanskrit -> Hindi
  - Sanskrit -> Gujarati
- Train a multilingual model only with similar languages
  - Sanskrit + Hindi + Gujarati
- Experiment with transfer learning using Sanskrit as the base language. (Need to work on the tokenizer.)
- Generative models?
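
One possible angle on the tokenizer question in the transfer learning goal: Sanskrit and Hindi are written in Devanagari, while Gujarati has its own script, so a tokenizer trained on Sanskrit would not share a vocabulary with Gujarati text. The Unicode Devanagari block (U+0900 to U+097F) and Gujarati block (U+0A80 to U+0AFF) are laid out largely in parallel, so a rough script conversion can be done with a fixed code point offset. The sketch below is only an illustration of that idea, not something this repository implements; the function name `gujarati_to_devanagari` is hypothetical, and the mapping is approximate because a few code points have no counterpart in the other block.

```python
# Approximate Gujarati -> Devanagari conversion using the parallel
# layout of the two Unicode blocks. This is an assumption-laden sketch:
# some code points are unassigned or mean something different in the
# other block, so the output should be checked before real use.

DEVANAGARI_START = 0x0900
GUJARATI_START = 0x0A80
GUJARATI_END = 0x0AFF
OFFSET = GUJARATI_START - DEVANAGARI_START  # 0x180


def gujarati_to_devanagari(text: str) -> str:
    """Shift every Gujarati-block character into the Devanagari block."""
    converted = []
    for ch in text:
        code_point = ord(ch)
        if GUJARATI_START <= code_point <= GUJARATI_END:
            converted.append(chr(code_point - OFFSET))
        else:
            # Leave Latin text, digits, punctuation, and whitespace untouched.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(gujarati_to_devanagari("ગુજરાતી"))  # गुजराती
```

With Gujarati text converted this way, a Sanskrit or Hindi tokenizer could at least be tried on it as a baseline before training a dedicated or shared vocabulary.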