Adding language identification from text and speech #207

Draft
wants to merge 1 commit into
base: main
from

fabiocat93
Collaborator

Description

This PR introduces language identification functionality. The new feature processes either an Audio object or a ScriptLine object and outputs a Language object.

Key additions:

  • General APIs for language identification for both audio and text inputs.
  • Tutorials and documentation for ease of use and understanding.
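
To illustrate the shape of such an API, here is a minimal sketch of a single entry point that accepts either input type and returns a language. It is not the code from this PR: the `Audio`, `ScriptLine`, and `Language` classes below are simplified stand-ins, and `identify_language`, `toy_text_backend`, and `toy_audio_backend` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Audio:
    """Hypothetical stand-in for the project's Audio object."""
    waveform: List[float]
    sampling_rate: int


@dataclass
class ScriptLine:
    """Hypothetical stand-in for the project's ScriptLine object."""
    text: str


@dataclass
class Language:
    """Hypothetical stand-in for the project's Language object."""
    code: str  # a short language code such as "en" or "it"


# A backend maps one input type to a Language. A real backend would wrap a model.
AudioBackend = Callable[[Audio], Language]
TextBackend = Callable[[ScriptLine], Language]


def toy_text_backend(line: ScriptLine) -> Language:
    """Toy heuristic for illustration only: checks for a few Italian function words."""
    italian_markers = {"il", "la", "che", "non", "per"}
    words = set(line.text.lower().split())
    return Language(code="it" if words & italian_markers else "en")


def toy_audio_backend(audio: Audio) -> Language:
    """Placeholder: a real backend would run a speech model on the waveform."""
    return Language(code="en")


def identify_language(
    item: Union[Audio, ScriptLine],
    audio_backend: AudioBackend = toy_audio_backend,
    text_backend: TextBackend = toy_text_backend,
) -> Language:
    """Dispatch on the input type and return the backend's prediction."""
    if isinstance(item, Audio):
        return audio_backend(item)
    if isinstance(item, ScriptLine):
        return text_backend(item)
    raise TypeError(f"Unsupported input type: {type(item).__name__}")
```

Calling `identify_language(ScriptLine(text="la casa non è grande"))` goes through the text backend, while an `Audio` instance goes through the audio backend. Passing the backends as parameters keeps the dispatch logic testable without loading a model.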

Related Issue(s)

Closes #85.

How Has This Been Tested?

  • Unit tests for the API functionality (both audio and text inputs).

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • [] I have added tests to cover my changes.
  • [] All new and existing tests passed.
  • [] My code follows the code style of this project.
  • [] Documentation and tutorials for the new feature have been added.

fabiocat93 added the enhancement (New feature or request), release, minor (Minor release), and to-test labels on Nov 20, 2024
fabiocat93 linked an issue on Nov 20, 2024 that may be closed by this pull request
10 tasks

codecov-commenter commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 97.40260% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.18%. Comparing base (113721a) to head (39eaa8d).
Report is 54 commits behind head on main.

Files with missing lines                               Patch %  Lines
...enselab/audio/tasks/language_identification/api.py  88.88%   1 Missing ⚠️
...audio/tasks/language_identification/speechbrain.py  97.05%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   60.24%   65.18%   +4.93%     
==========================================
  Files         113      123      +10     
  Lines        4017     4265     +248     
==========================================
+ Hits         2420     2780     +360     
+ Misses       1597     1485     -112     

@fabiocat93
Collaborator Author

@ibevers did you have any time to work on this?

fabiocat93 marked this pull request as draft on December 23, 2024, 15:23
@ibevers
Collaborator

ibevers commented Jan 3, 2025

> @ibevers did you have any time to work on this?

@fabiocat93 No, I did not

Labels
enhancement (New feature or request), minor (Minor release), release, to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task: Language Identification
3 participants