This repository has been archived by the owner on Sep 15, 2022. It is now read-only.

attributions.md

  1. Dakshina Dataset
    Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset. Roark, B., Wolf-Sonkin, L., Kirov, C., Mielke, S. J., Johny, C., Demirsahin, I., & Hall, K. (2020, May). In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 2413-2423).
    https://github.com/google-research-datasets/dakshina

  2. AI4Bharat-IndicNLP Dataset
    IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Kakwani, D., Kunchukuttan, A., Golla, S., Bhattacharyya, A., Khapra, M. M., & Kumar, P. (2020). Findings of EMNLP.
    https://github.com/AI4Bharat/indicnlp_corpus

  3. OSCAR Corpus
    A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. Ortiz Suárez, P., Romary, L., & Sagot, B. (2020). arXiv, arXiv-2006.
    https://oscar-corpus.com/