Task: Language Identification #85
Comments
Correct. We may want to have 2 modules, one in
For now, I have implemented audio-based language identification using SpeechBrain's models. These models assume that each clip contains only one language. In the future, we may want to integrate the Whisper model. This should be easy to implement, since we already use the same model for speech-to-text, and it would allow identifying multiple languages in the same clip. Here is a first draft of how this works with Whisper:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-tiny"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# return_language=True adds a "language" field to each returned chunk.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
    return_language=True,
)

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
result = pipe(sample)

# Language detected for the first chunk of the clip.
print(result["chunks"][0]["language"])
```
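The draft above only reads the language of the first chunk. Below is a minimal sketch of how the per-chunk labels could be tallied for a whole clip, assuming `result` is a dict with a `chunks` list whose items carry a `language` key, as in the draft. The helper name `count_chunk_languages` and the `example` dict are hypothetical and written by hand, not real model output.

```python
from collections import Counter


def count_chunk_languages(result: dict) -> Counter:
    """Count how many chunks were tagged with each language."""
    chunks = result.get("chunks", [])
    # Skip chunks for which no language was reported.
    return Counter(c["language"] for c in chunks if c.get("language"))


# Hand-written stand-in for a pipeline result (assumed shape, not model output).
example = {
    "text": "hello world bonjour",
    "chunks": [
        {"language": "english", "text": "hello"},
        {"language": "english", "text": "world"},
        {"language": "french", "text": "bonjour"},
    ],
}
counts = count_chunk_languages(example)
print(counts.most_common())
```

A clip whose counter holds more than one key would be a candidate for a multi-language label.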
@ibevers I have implemented the speech-based version of this. Can you do the same for text-based language identification? Simply integrating Hugging Face models should be more than fine for now (https://huggingface.co/models?search=language%20detection).
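A rough sketch of what the text-based version might look like with a Hugging Face `text-classification` pipeline. The checkpoint `papluca/xlm-roberta-base-language-detection` is only one possible choice from the search above and is an assumption here, as are the helper names `load_text_classifier` and `identify_text_languages`. The sketch also assumes that calling the pipeline on a list of texts with `top_k=None` returns one list of `{"label", "score"}` dicts per text.

```python
from typing import Callable, Dict, List

# Assumption: one of several language-detection checkpoints on the Hub.
DEFAULT_MODEL_ID = "papluca/xlm-roberta-base-language-detection"


def load_text_classifier(model_id: str = DEFAULT_MODEL_ID) -> Callable:
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    return pipeline("text-classification", model=model_id)


def identify_text_languages(texts: List[str], classifier: Callable) -> List[Dict]:
    """Return the top-scoring {"label", "score"} prediction for each text."""
    # top_k=None asks the pipeline for scores over all labels, per input text.
    all_scores = classifier(texts, top_k=None)
    return [max(scores, key=lambda s: s["score"]) for scores in all_scores]
```

Usage would be along the lines of `identify_text_languages(["Bonjour tout le monde"], load_text_classifier())`. Passing the classifier in as an argument keeps the function easy to test with a stub and lets the model be swapped later.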
@fabiocat93 Thank you for your patience. I can take a look at this after next week.
I will extend your deadline one more time (from Dec 16 to Jan 14). Please let me know if you face any blockers.
Description
As far as I understand, this should take in an Audio or a ScriptLine and output a Language object.
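To make the intended interface concrete, here is a minimal sketch under stated assumptions. The `Audio`, `ScriptLine`, and `Language` classes below are simplified, hypothetical stand-ins rather than the project's real types, and `identify_language`, `audio_model`, and `text_model` are illustrative names for a dispatcher and two injected callables that each return a language code.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Audio:  # hypothetical stand-in for the project's audio type
    waveform: List[float]
    sampling_rate: int


@dataclass
class ScriptLine:  # hypothetical stand-in for the project's text type
    text: str


@dataclass
class Language:  # hypothetical stand-in for the project's language type
    code: str


def identify_language(
    item: Union[Audio, ScriptLine],
    audio_model: Callable[[Audio], str],
    text_model: Callable[[str], str],
) -> Language:
    """Route the input to the audio-based or text-based identifier."""
    if isinstance(item, Audio):
        return Language(code=audio_model(item))
    if isinstance(item, ScriptLine):
        return Language(code=text_model(item.text))
    raise TypeError(f"Unsupported input type: {type(item).__name__}")
```

Keeping the two models behind one entry point would let the audio and text tasks be implemented and tested independently.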
Tasks
Audio
Text
Freeform Notes
Might want to have examples that cover a wide range of languages, or we could just trust the model developer. Ideally, we should have multi-class output, so if a given input includes more than one language, the output will reflect that.
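One way the multi-class output could be sketched is to keep every language whose score clears a cutoff, instead of only the top one. The function name `select_languages`, the score dictionary, and the `0.2` threshold are all illustrative assumptions, not values taken from any model.

```python
from typing import Dict, List


def select_languages(scores: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Return every language whose score reaches the threshold, best first."""
    kept = [(lang, score) for lang, score in scores.items() if score >= threshold]
    # Sort by score, highest first, so the dominant language comes first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [lang for lang, _ in kept]


# Made-up scores for an input that mixes two languages.
mixed = select_languages({"en": 0.55, "es": 0.40, "de": 0.05})
print(mixed)
```

Whether a single-label classifier produces usable scores on mixed-language input would need to be checked per model; the threshold is a knob to tune, not a recommendation.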