Add MediBeng: Synthetic Bengali-English Code-Switched Healthcare Dataset #34

@pr0mila

Description:

We propose adding the MediBeng dataset to the repository. MediBeng is a synthetic Bengali-English code-switched dataset designed for automatic speech recognition (ASR), text-to-speech (TTS), and machine translation tasks in healthcare. It focuses on bilingual code-switching, which is common in healthcare settings, and it is freely available.

Key Features:

  • Language: Bengali and English (Code-Switched)
  • Primary Use Cases: ASR, TTS, Machine Translation for Healthcare
  • Free to Use

Links:

This dataset could help improve models for bilingual speech recognition and language processing in healthcare contexts.

Request: Please review and add MediBeng to the dataset collection for use by researchers and developers working on multilingual and healthcare-specific models.
