[DMP 2024]: Create offline audio-phonetic matching model #313

Open
@Gautam-Rajeev

Offline Alternative to Google's Read Along App in Hindi

Description

Develop an offline application (a web proof of concept) that displays a set of Hindi words and accurately determines whether the user has pronounced each word correctly. The app aims to be an educational tool for Hindi language learners, providing instant feedback on their pronunciation.

The application is envisioned as an offline tool similar to Google's Read Along app but specifically for the Hindi language. It should present users with Hindi words and listen to the user's attempt to pronounce these words, providing feedback on the accuracy of their pronunciation.

Approaches for Consideration:

  • Vector Representation of Words: Explore the possibility of maintaining vector representations of the required set of Hindi words. These vectors will be matched against vector encodings of the user's spoken recordings.
  • Acoustic Word Encodings: Utilize acoustic word encodings to convert the list of Hindi words into a vector form. This encoding will then be used to match against the encoded recordings from users, determining the accuracy of pronunciation.
  • Feedback Mechanism: Implement a feedback system that informs users of the correctness of their pronunciation and offers suggestions or corrections as needed.
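The matching and feedback ideas above can be sketched as follows. This is a minimal illustration, not a proposed implementation: it assumes some encoder (not specified in this issue) has already mapped both the target words and the user's recording into the same vector space. The names `check_pronunciation` and `WORD_VECTORS`, the toy 3-dimensional vectors, and the `0.8` threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def check_pronunciation(target_word, audio_vector, word_vectors, threshold=0.8):
    """Compare an encoded recording with the stored vector for target_word.

    The threshold is an assumed value; a real one would have to be tuned
    on labelled recordings.
    """
    score = cosine_similarity(audio_vector, word_vectors[target_word])
    # The closest vocabulary word can drive feedback ("it sounded like ...").
    closest = max(
        word_vectors,
        key=lambda w: cosine_similarity(audio_vector, word_vectors[w]),
    )
    return {
        "word": target_word,
        "score": score,
        "correct": score >= threshold,
        "closest_word": closest,
    }


# Toy vectors standing in for real acoustic word encodings (assumption).
WORD_VECTORS = {
    "नमस्ते": [0.9, 0.1, 0.0],
    "पानी": [0.0, 0.8, 0.6],
}

if __name__ == "__main__":
    recording = [0.88, 0.15, 0.05]  # pretend output of an audio encoder
    print(check_pronunciation("नमस्ते", recording, WORD_VECTORS))
```

Because the stored word vectors are just a lookup table, this part of the pipeline needs no network access; the open question is the encoder that produces the vectors.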

Implementation Details:

  • The project requires the creation of a robust and efficient algorithm for converting Hindi words and spoken recordings into vector representations that can be accurately compared.
  • The app should run fully offline, so all required data and models must be stored locally on the device.
  • User interface design should be intuitive, encouraging users to engage with the app and improve their Hindi pronunciation skills.
  • Consideration should be given to privacy and data security, especially concerning user recordings.

This is an open invitation for contributors to suggest ideas, approaches, and potential technologies that could be utilized to achieve the project goals. Contributions at all stages of development are welcome, from conceptualization to implementation.

Goals & Mid-Point Milestone

  • A small repository that can infer whether a WAV file contains any of a predefined set of words (around 2,000)

Sample audio files:

Acceptance Criteria

A lightweight model that can detect the subset of words that a child has pronounced correctly.
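One possible way to check this criterion is to compare the subset the model reports against a human-annotated subset for the same recording. The sketch below assumes such annotations exist. The function name `evaluate_detection` and the choice of precision and recall as metrics are illustrative assumptions, not part of the issue.

```python
def evaluate_detection(predicted, reference):
    """Return (precision, recall) of the predicted correct-word subset.

    predicted: words the model marked as correctly pronounced
    reference: words a human annotator marked as correctly pronounced
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    model_output = ["घर", "पानी", "किताब"]
    annotator = ["पानी", "किताब", "स्कूल", "माँ"]
    print(evaluate_detection(model_output, annotator))
```

Low precision would mean the model praises mispronounced words, while low recall would mean it rejects words the child actually said correctly; which error matters more is a product decision.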

Mockups/Wireframes

Product Name

Nipun Lakshya App

Organisation Name

SamagraX

Domain

Education

Tech Skills Needed

Machine Learning, Natural Language Processing, Python

Mentor(s)

@GautamR-Samagra

Category

Machine Learning
