parmarsuraj99/parivartak-indic-nlp

Experiments on language-models

My models are uploaded to HuggingFace/SurajP

Transformers - परिवर्तक

The aim is to train foundational models for Indian languages on quality data, so that they can later be fine-tuned for other purposes.

This repository holds my experiments with language models across different languages. Currently, I am focusing on Indian languages, especially Sanskrit, Hindi and Gujarati.

Goals:

  • Train a separate model for each language
    • Sanskrit
    • Hindi
    • Gujarati
  • Fine-tune a trained model on a similar language for language modelling
    • Hindi -> Other regional languages
    • Sanskrit -> Hindi
    • Sanskrit -> Gujarati
  • Train a multilingual model only with similar languages
    • Sanskrit + Hindi + Gujarati
  • Experiment with transfer learning using Sanskrit as the base language (the tokenizer needs work)
  • Generative models?
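
One likely reason the tokenizer needs work for the Sanskrit-based transfer goals (an assumption, not something the goals state) is script mismatch: Sanskrit and Hindi are normally written in Devanagari, while Gujarati has its own Unicode script, so a vocabulary learned on Sanskrit text would contain few or no Gujarati characters. The sketch below is only an illustration using the Python standard library; `script_counts` and the sample sentences are hypothetical and are not taken from this repository.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count the letters and combining marks in `text` by Unicode script.

    The script is read from the first word of each character's Unicode
    name, e.g. "DEVANAGARI LETTER KA" -> "DEVANAGARI".
    """
    counts = Counter()
    for ch in text:
        # Keep letters (L*) and marks (M*); skip spaces, digits, punctuation.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        counts[unicodedata.name(ch, "UNKNOWN").split(" ")[0]] += 1
    return counts


# Illustrative sample sentences (assumed for this sketch, not repository data).
samples = {
    "Sanskrit": "धर्मो रक्षति रक्षितः",
    "Hindi": "यह एक वाक्य है",
    "Gujarati": "આ એક વાક્ય છે",
}

for lang, text in samples.items():
    print(lang, dict(script_counts(text)))
```

If this assumption holds, Sanskrit -> Hindi transfer can reuse most of a Sanskrit vocabulary, whereas Sanskrit -> Gujarati would probably need either a shared multilingual vocabulary or transliteration into a common script before tokenization.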
