Nova-2 transcriptions often have the last 10-20% of the transcript cut off #474
Replies: 17 comments 14 replies
-
Thanks for asking your question about Deepgram! If you didn't already include it in your post, please be sure to add as much detail as possible so we can assist you efficiently.
-
Hey @venusatuluri we're looking into this. Can you upload the audio files you found the issue with? GitHub only allows certain file types to be uploaded so you'll have to zip the audio files before uploading them.
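For anyone unsure how to do that, here is a minimal sketch of bundling recordings into a zip before attaching them; the file names and the helper function are placeholders of my own, it just uses Python's standard zipfile module.

```python
# Minimal sketch: bundle audio files into a zip so GitHub will accept the upload.
# File names and the helper name are placeholders.
import zipfile
from pathlib import Path

def zip_audio(paths, out_path="audio_samples.zip"):
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=Path(p).name)  # store only the file name, not the full path
    return out_path

zip_audio(["sample1.wav", "sample2.wav"])
```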
-
Yes, our audio files are usually less than 15 seconds.
On Wed, Dec 6, 2023 at 7:31 AM Jason Maldonis wrote:
Thanks @venusatuluri <https://github.com/venusatuluri> very helpful. We currently think this is an issue for audio files less than 15 seconds and we're working on it. Are all your audio files very short?
-
We are experiencing very similar issues. For instance, for this audio (7pk6hra.mp3.zip) we got the following response: {
"metadata": {
"transaction_key": "deprecated",
"request_id": "c043749a-2942-4647-a383-52a3559cc5f2",
"sha256": "bd9417ff8d9c8bb0607d84b23ccafb8c72fa3cf4cba3aa26e740eab3ebce112f",
"created": "2023-12-08T11:53:31.852Z",
"duration": 25.8,
"channels": 1,
"models": [
"a375937a-9156-40cb-940d-6a1103040ef1"
],
"model_info": {
"a375937a-9156-40cb-940d-6a1103040ef1": {
"name": "2-general-nova",
"version": "2023-11-14.3290",
"arch": "nova-2"
}
}
},
"results": {
"channels": [
{
"alternatives": [
{
"transcript": "Next time around, I wonder if I should break out the grades taught, in more detail. I think it'd be interesting to find out more about which grades in high school, perhaps even the subjects taught. So breaking out the other category into, other, you know, interspecific areas, or making sure that's the the respondents can",
"confidence": 0.9970011,
"words": [
{
"word": "next",
"start": 0.96,
"end": 1.28,
"confidence": 0.99588674,
"punctuated_word": "Next"
},
{
"word": "time",
"start": 1.28,
"end": 1.52,
"confidence": 0.9997961,
"punctuated_word": "time"
},
{
"word": "around",
"start": 1.52,
"end": 2.0,
"confidence": 0.99558043,
"punctuated_word": "around,"
},
{
"word": "i",
"start": 2.0,
"end": 2.1599998,
"confidence": 0.9997038,
"punctuated_word": "I"
},
{
"word": "wonder",
"start": 2.1599998,
"end": 2.56,
"confidence": 0.9999014,
"punctuated_word": "wonder"
},
{
"word": "if",
"start": 2.56,
"end": 2.8,
"confidence": 0.99989545,
"punctuated_word": "if"
},
{
"word": "i",
"start": 2.8,
"end": 3.04,
"confidence": 0.99983394,
"punctuated_word": "I"
},
{
"word": "should",
"start": 3.04,
"end": 3.52,
"confidence": 0.9997979,
"punctuated_word": "should"
},
{
"word": "break",
"start": 3.52,
"end": 3.84,
"confidence": 0.9970011,
"punctuated_word": "break"
},
{
"word": "out",
"start": 3.84,
"end": 4.34,
"confidence": 0.99784386,
"punctuated_word": "out"
},
{
"word": "the",
"start": 4.56,
"end": 5.04,
"confidence": 0.999401,
"punctuated_word": "the"
},
{
"word": "grades",
"start": 5.04,
"end": 5.54,
"confidence": 0.9980287,
"punctuated_word": "grades"
},
{
"word": "taught",
"start": 5.6,
"end": 6.1,
"confidence": 0.885115,
"punctuated_word": "taught,"
},
{
"word": "in",
"start": 6.72,
"end": 7.04,
"confidence": 0.9998895,
"punctuated_word": "in"
},
{
"word": "more",
"start": 7.04,
"end": 7.2799997,
"confidence": 0.99985766,
"punctuated_word": "more"
},
{
"word": "detail",
"start": 7.2799997,
"end": 7.7599998,
"confidence": 0.98655343,
"punctuated_word": "detail."
},
{
"word": "i",
"start": 7.7599998,
"end": 7.9199996,
"confidence": 0.9998547,
"punctuated_word": "I"
},
{
"word": "think",
"start": 7.9199996,
"end": 8.08,
"confidence": 0.99987996,
"punctuated_word": "think"
},
{
"word": "it'd",
"start": 8.08,
"end": 8.24,
"confidence": 0.99879843,
"punctuated_word": "it'd"
},
{
"word": "be",
"start": 8.24,
"end": 8.4,
"confidence": 0.99978024,
"punctuated_word": "be"
},
{
"word": "interesting",
"start": 8.4,
"end": 8.8,
"confidence": 0.99914205,
"punctuated_word": "interesting"
},
{
"word": "to",
"start": 8.8,
"end": 8.96,
"confidence": 0.9997229,
"punctuated_word": "to"
},
{
"word": "find",
"start": 8.96,
"end": 9.2,
"confidence": 0.9999037,
"punctuated_word": "find"
},
{
"word": "out",
"start": 9.2,
"end": 9.36,
"confidence": 0.9998086,
"punctuated_word": "out"
},
{
"word": "more",
"start": 9.36,
"end": 9.86,
"confidence": 0.9994623,
"punctuated_word": "more"
},
{
"word": "about",
"start": 10.304999,
"end": 10.705,
"confidence": 0.9397779,
"punctuated_word": "about"
},
{
"word": "which",
"start": 10.705,
"end": 11.184999,
"confidence": 0.999833,
"punctuated_word": "which"
},
{
"word": "grades",
"start": 11.184999,
"end": 11.584999,
"confidence": 0.9696138,
"punctuated_word": "grades"
},
{
"word": "in",
"start": 11.584999,
"end": 11.745,
"confidence": 0.9996402,
"punctuated_word": "in"
},
{
"word": "high",
"start": 11.745,
"end": 11.905,
"confidence": 0.9995679,
"punctuated_word": "high"
},
{
"word": "school",
"start": 11.905,
"end": 12.405,
"confidence": 0.96477866,
"punctuated_word": "school,"
},
{
"word": "perhaps",
"start": 13.025,
"end": 13.424999,
"confidence": 0.58096147,
"punctuated_word": "perhaps"
},
{
"word": "even",
"start": 13.424999,
"end": 13.664999,
"confidence": 0.99357784,
"punctuated_word": "even"
},
{
"word": "the",
"start": 13.664999,
"end": 13.825,
"confidence": 0.9576805,
"punctuated_word": "the"
},
{
"word": "subjects",
"start": 13.825,
"end": 14.304999,
"confidence": 0.99692434,
"punctuated_word": "subjects"
},
{
"word": "taught",
"start": 14.304999,
"end": 14.705,
"confidence": 0.9293262,
"punctuated_word": "taught."
},
{
"word": "so",
"start": 14.705,
"end": 15.205,
"confidence": 0.9992028,
"punctuated_word": "So"
},
{
"word": "breaking",
"start": 15.344999,
"end": 15.664999,
"confidence": 0.8038731,
"punctuated_word": "breaking"
},
{
"word": "out",
"start": 15.664999,
"end": 15.905,
"confidence": 0.9981931,
"punctuated_word": "out"
},
{
"word": "the",
"start": 15.905,
"end": 16.145,
"confidence": 0.72913694,
"punctuated_word": "the"
},
{
"word": "other",
"start": 16.145,
"end": 16.465,
"confidence": 0.9968208,
"punctuated_word": "other"
},
{
"word": "category",
"start": 16.465,
"end": 16.965,
"confidence": 0.998252,
"punctuated_word": "category"
},
{
"word": "into",
"start": 17.025,
"end": 17.525,
"confidence": 0.7878603,
"punctuated_word": "into,"
},
{
"word": "other",
"start": 18.32,
"end": 18.72,
"confidence": 0.8469903,
"punctuated_word": "other,"
},
{
"word": "you",
"start": 18.72,
"end": 18.880001,
"confidence": 0.9902318,
"punctuated_word": "you"
},
{
"word": "know",
"start": 18.880001,
"end": 19.12,
"confidence": 0.99799347,
"punctuated_word": "know,"
},
{
"word": "interspecific",
"start": 19.12,
"end": 19.62,
"confidence": 0.931517,
"punctuated_word": "interspecific"
},
{
"word": "areas",
"start": 19.92,
"end": 20.42,
"confidence": 0.63309896,
"punctuated_word": "areas,"
},
{
"word": "or",
"start": 21.04,
"end": 21.12,
"confidence": 0.5008061,
"punctuated_word": "or"
},
{
"word": "making",
"start": 21.12,
"end": 21.44,
"confidence": 0.9887378,
"punctuated_word": "making"
},
{
"word": "sure",
"start": 21.44,
"end": 21.6,
"confidence": 0.99133146,
"punctuated_word": "sure"
},
{
"word": "that's",
"start": 21.6,
"end": 21.92,
"confidence": 0.6207464,
"punctuated_word": "that's"
},
{
"word": "the",
"start": 21.92,
"end": 22.42,
"confidence": 0.50518316,
"punctuated_word": "the"
},
{
"word": "the",
"start": 22.48,
"end": 22.56,
"confidence": 0.6938833,
"punctuated_word": "the"
},
{
"word": "respondents",
"start": 22.56,
"end": 23.06,
"confidence": 0.988893,
"punctuated_word": "respondents"
},
{
"word": "can",
"start": 24.16,
"end": 24.66,
"confidence": 0.8660644,
"punctuated_word": "can"
}
],
"paragraphs": {
"transcript": "Next time around, I wonder if I should break out the grades taught, in more detail. I think it'd be interesting to find out more about which grades in high school, perhaps even the subjects taught. So breaking out the other category into, other, you know, interspecific areas, or making sure that's the the respondents can",
"paragraphs": [
{
"sentences": [
{
"text": "Next time around, I wonder if I should break out the grades taught, in more detail.",
"start": 0.96,
"end": 7.7599998
},
{
"text": "I think it'd be interesting to find out more about which grades in high school, perhaps even the subjects taught.",
"start": 7.7599998,
"end": 14.705
},
{
"text": "So breaking out the other category into, other, you know, interspecific areas, or making sure that's the the respondents can",
"start": 14.705,
"end": 24.66
}
],
"num_words": 56,
"start": 0.96,
"end": 24.66
}
]
}
}
]
}
]
}
}
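For what it's worth, the truncation is visible in the JSON itself: the last word ("can") ends at 24.66 s while metadata.duration is 25.8 s, and the final sentence has no closing punctuation. Here is a rough sketch of how one might flag such responses automatically; the file name and the 1-second threshold are my own assumptions, not anything official from Deepgram.

```python
import json

# Load a saved /v1/listen response like the one above (file name is a placeholder).
with open("response.json") as f:
    response = json.load(f)

duration = response["metadata"]["duration"]  # 25.8 s in the example above
words = response["results"]["channels"][0]["alternatives"][0]["words"]
last_end = words[-1]["end"] if words else 0.0  # 24.66 s in the example above

# Flag transcripts whose last word ends well before the audio does.
if duration - last_end > 1.0:
    print(f"Possible truncation: last word ends at {last_end}s, audio is {duration}s")
```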
-
Hey all (@venusatuluri @rliffredo @danielclas), we released an update to our Nova 2 model today that prevents missing words at the end of transcripts. We are still applying this fix to the Nova 2 phonecall, meeting, and other specialized models, and the updated models will be released as they are ready.
-
We are seeing a similar problem very intermittently while using the nova-2 phonecall model, and I have just reverted us back to the nova phonecall model. Sometimes text is missing from the start, and not just a word or two but an entire sentence. In another example it was missing a bunch of text from the middle, which makes even less sense.

If you want to look at the issue, here is the log data for you: this is a 1m30s piece of audio, and the whole first 15 seconds of audio is not transcribed. This is what we got back from you:

"We can work up a no obligation quote for you. Does that sound good? Sure. I apologize for not providing that information upfront. Our company name is Tribeca Group located in New York. Our callback number is x x x x x x, and our website is ww.tribeccagroup.com. Is there anything else I can assist you with? Absolutely. I apologize for not providing that information earlier. Our company name is Tribeca Group located in New York. You can reach us at x x x x x, and our website is ww.tribeccagroup.com. Is there anything else I can help you with? I apologize for any confusion. Tribeca Group is a reputable company based in New York. We specialize in providing capital for businesses like yours. Our callback number is x x x x x x, and you can find more information about us on our website, wwtribeccagroup.com. We're here to assist you with any questions or concern"
-
However, I ran into the same or similar issue with a different file today: the transcription is empty with nova-2, while nova returns the transcription correctly. It's a rather short audio, for sure. I have attached the zip file (feb5_update.zip).
-
feb6_update.zip
-
It just happened to me as well. We are transcribing big podcast episodes and some sentences are missing!

Request ID:
File: https://storage.googleapis.com/test-reallife-app/rev/RLEP%20378%20(Transcript%20v3).flac.zip
Missing sentences (you can see the "paragraph" timestamp in the top-left corner of each picture):

Having a better model like Nova-2 that reduces the WER is amazing, but if it keeps missing a few sentences, it will increase our "review process" time much more than having some words wrong. Not worth migrating.
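In case it helps others catch this before the review stage, here is a rough sketch for flagging suspiciously long silent gaps between consecutive words, which is how a dropped sentence shows up in the word timings; the file name and the 5-second threshold are assumptions of mine and would need tuning for podcast audio.

```python
import json

# Load a saved /v1/listen response (file name is a placeholder).
with open("response.json") as f:
    response = json.load(f)

words = response["results"]["channels"][0]["alternatives"][0]["words"]

# A long gap between one word's end and the next word's start can indicate
# a span of speech that was silently dropped from the transcript.
MAX_GAP_SECONDS = 5.0
for prev, cur in zip(words, words[1:]):
    gap = cur["start"] - prev["end"]
    if gap > MAX_GAP_SECONDS:
        print(f"Suspicious {gap:.1f}s gap between '{prev['word']}' ({prev['end']}s) "
              f"and '{cur['word']}' ({cur['start']}s)")
```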
-
Hello @team-deepgram, we are also experiencing issues with missing text in the transcripts. We noticed that recordings with somewhat lower audio quality are missing transcription at the beginning and also at the end. This issue happens only with the Nova 2 model; we transcribed the same audio with Deepgram's Whisper model and no text was missing. Such issues make the Nova 2 model totally unreliable. The issue impacts not only English but also other languages. We need a permanent solution, as this seems to be a problem for many of your customers.
-
What is the status of this concerning issue? We are now at a decision point between Deepgram and Whisper 3.
-
Thanks for the heads up. Deepgram is scratched; poor support.
-
Hi all who are on this thread (cc @NorthbridgeBB @vpalenik1 @rwrz). My apologies that our models are not meeting your expectations. Models have their relative strengths and weaknesses depending on various aspects of the audio, such as its length, format, quality, background noise, and streaming versus pre-recorded delivery, and also on your business use case. We offer several different models and encourage users to do their own testing to see what works best for them, particularly customers such as yourselves who don't see results they're satisfied with using Nova-2.

I have also seen that file format specifically makes a difference with pre-recorded audio for Nova-2, where converting mp3 to wav reduces missed words. For further suggestions on how you can test different conditions with your own audio, also see this thread: https://github.com/orgs/deepgram/discussions/623

While some patterns of errors may appear similar on the surface, there are often various underlying factors due to ML model training, the audio itself, and the application's needs, rather than a single bug that can be trivially located and fixed. We are continually iterating and training new models in response to customer feedback as well as innovating on our model architectures. I understand that Nova-2 does have cases where it is "shyer" and may miss words in a transcript. We have researched and improved several known causes of this pattern of results, and based on your feedback, we have more work to do.

While we may not be able to triage and root-cause each unsatisfactory request, I invite you to share Deepgram request IDs, which is the most effective way for us to internally examine and investigate any poorly performing requests in our ongoing efforts to improve our models.
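On the mp3-to-wav point above, here is a minimal sketch of how that conversion might be scripted before sending audio to the API; it assumes the ffmpeg CLI is installed, and the file names are placeholders.

```python
import subprocess

# Convert mp3 to wav before transcription, as suggested above.
# Requires the ffmpeg CLI; input/output names are placeholders.
subprocess.run(["ffmpeg", "-y", "-i", "input.mp3", "output.wav"], check=True)
```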
-
I've experimented with a few things, turning options on and off, and I hope it can help anyone who is experiencing these issues as well.

Multichannel issue [multichannel]
During my tests, this option increases the probability of getting fully missing sentences.
With multichannel:

With DIARIZATION instead [docs]:

Smart Format issue [docs]
During my tests, I've experienced some sentences repeating, and sometimes even unexpected words appearing instead. You can replace it with PUNCTUATION if you need to; it works much better. Yeah, it is not the same, but it produces more reliable outputs.
With Smart Format:

Correct (with punctuation) [docs]:

--
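To make the comparison above concrete, here is a rough sketch of the two request variants against the pre-recorded /v1/listen endpoint; the helper function, file name, API key, and use of the Python requests library are my own assumptions, while multichannel, diarize, smart_format, and punctuate are the standard Deepgram query parameters discussed above.

```python
import requests

API_KEY = "YOUR_DEEPGRAM_API_KEY"  # placeholder
URL = "https://api.deepgram.com/v1/listen"

def transcribe(path, **params):
    # Send raw audio to the pre-recorded endpoint with the given query parameters.
    params.setdefault("model", "nova-2")
    with open(path, "rb") as f:
        resp = requests.post(
            URL,
            params=params,
            headers={"Authorization": f"Token {API_KEY}", "Content-Type": "audio/wav"},
            data=f.read(),
        )
    resp.raise_for_status()
    return resp.json()

# Variant that dropped whole sentences in my tests: multichannel + smart_format.
with_multichannel = transcribe("recording.wav", multichannel="true", smart_format="true")

# Variant that behaved better for me: diarization + plain punctuation.
with_diarize = transcribe("recording.wav", diarize="true", punctuate="true")
```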
-
Hi, I tried nova-2 again after a few months today to see if this problem has been resolved. Unfortunately the pattern persists: nova returns the correct transcription while nova-2 returns an empty string. The request ID using nova-2 is 30f7ce1c-bc91-411e-b7d2-7ae7814d04ca, while the request ID using nova is 4fa00dd3-a50b-4ca1-b831-d37467d4db8f.
-
The problems I was facing were better for a while, but today we had a big failure on this sample:

It is crazy how it can hallucinate like this. I hope that having a sample of the issue will help you fix it.
-
Which Deepgram product are you using?
Deepgram API
Details
Nova-2 transcriptions are frequently missing the last 10-20% of the transcription. The pattern of error is very consistent - it's always the last few words that go missing. I have wav example files, but weirdly this bug report tool doesn't let me upload any standard audio formats like wav or mp3.
If you are making a request to the Deepgram API, what is the full Deepgram URL you are making a request to?
https://api.deepgram.com/v1/listen?language=en&model=nova-2&smart_format=true
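For reference, a minimal sketch of a request against that URL; the file name, API key, and use of the Python requests library are assumptions on my part.

```python
import requests

API_KEY = "YOUR_DEEPGRAM_API_KEY"  # placeholder
URL = "https://api.deepgram.com/v1/listen"

with open("example.wav", "rb") as f:  # placeholder file name
    audio = f.read()

resp = requests.post(
    URL,
    params={"language": "en", "model": "nova-2", "smart_format": "true"},
    headers={"Authorization": f"Token {API_KEY}", "Content-Type": "audio/wav"},
    data=audio,
)
resp.raise_for_status()

# The truncation shows up here as missing trailing words in the transcript.
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```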
If you are making a request to the Deepgram API and have a request ID, please paste it below:
No response
If possible, please attach your code or paste it into the text box.
No response
If possible, please attach an example audio file to reproduce the issue.
No response