Experiments on language-models

My models uploaded to HuggingFace/SurajP

Transformers - परिवर्तक

The aim is to train foundational models for Indian languages on quality data, so that they can later be fine-tuned for other purposes.

A repository for my experiments with language models across different languages. Currently, I am focusing on Indian languages, especially Sanskrit, Hindi, and Gujarati.

Goals:

  • Train a model on each language separately
    • Sanskrit
    • Hindi
    • Gujarati
  • Fine-tune each model on a similar language for language modelling
    • Hindi -> Other regional languages
    • Sanskrit -> Hindi
    • Sanskrit -> Gujarati
  • Train a multilingual model only with similar languages
    • Sanskrit + Hindi + Gujarati
  • Experiment with transfer learning using Sanskrit as the base language. (Need to work on the tokenizer)
  • Generative models?
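
One possible starting point for the tokenizer work mentioned in the goals, sketched here as an assumption rather than as the approach this repository takes: Sanskrit and Hindi are written in Devanagari, while Gujarati has its own script, so a tokenizer trained on Sanskrit text shares almost no characters with Gujarati text. The Unicode Gujarati block (U+0A80 to U+0AFF) is laid out in parallel with the Devanagari block (U+0900 to U+097F), so a fixed code point offset gives a rough script conversion. The function name `gujarati_to_devanagari` is hypothetical, and a few code points (for example the Gujarati rupee sign) have no true counterpart, so a real pipeline would need an exceptions table.

```python
# Rough Gujarati -> Devanagari script conversion by Unicode block offset.
# Assumption: the two blocks are aligned closely enough for ordinary letters,
# vowel signs, and digits; this is an illustrative sketch, not a full
# transliteration scheme.

GUJARATI_START = 0x0A80
GUJARATI_END = 0x0AFF
DEVANAGARI_START = 0x0900
OFFSET = GUJARATI_START - DEVANAGARI_START  # 0x180


def gujarati_to_devanagari(text: str) -> str:
    """Shift every Gujarati-block character to the matching Devanagari code point."""
    converted = []
    for ch in text:
        cp = ord(ch)
        if GUJARATI_START <= cp <= GUJARATI_END:
            converted.append(chr(cp - OFFSET))
        else:
            # Leave Latin text, punctuation, whitespace, etc. untouched.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(gujarati_to_devanagari("ગુજરાતી"))
```

With text normalised to a single script in this way, a Sanskrit-trained vocabulary could in principle be reused for Gujarati input; whether that actually helps transfer learning is an open question for the experiments listed above.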