Update Chapter 5 ASR content with latest datasets and models #217

Deep-unlearning · 2025-07-14T13:15:14Z

Summary

Updated Common Voice dataset from v13 to v17 (latest available version)
Updated language count from 108 to 124 languages in Common Voice 17
Updated Whisper model reference from whisper-large-v2 to whisper-large-v3
NEW: Added comprehensive coverage of modern ASR architectures beyond Whisper
NEW: Added Moonshine ASR (edge-optimized, 5x faster than Whisper for short audio)
NEW: Added Kyutai STT (real-time streaming capabilities)

Changes Made

Dataset and Model Updates

Dataset Version: Common Voice 13 → Common Voice 17
Language Support: 108 → 124 languages
Model Reference: whisper-large-v2 → whisper-large-v3
URLs Updated: All common_voice_13_0 → common_voice_17_0
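The identifier changes listed above amount to a mechanical find-and-replace. The snippet below is a minimal sketch of how such a rewrite could be scripted, not a record of how this PR was produced; the `update_references` helper and the sample `load_dataset` line are illustrative assumptions.

```python
import re

# Old -> new identifiers, taken from the list above.
REPLACEMENTS = {
    "common_voice_13_0": "common_voice_17_0",
    "whisper-large-v2": "whisper-large-v3",
}

# One alternation pattern so each match is rewritten exactly once.
_PATTERN = re.compile("|".join(re.escape(old) for old in REPLACEMENTS))


def update_references(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of substitutions made.

    Hypothetical helper for illustration; it is not part of the course repo.
    """
    return _PATTERN.subn(lambda m: REPLACEMENTS[m.group(0)], text)


# Illustrative input line (an assumption, not quoted from the changed files).
sample = 'load_dataset("mozilla-foundation/common_voice_13_0", "dv")'
updated, count = update_references(sample)
print(updated)
```

Returning the substitution count alongside the text makes it easy to confirm that a file was actually touched before writing it back.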

New ASR Architecture Coverage

Moonshine ASR: Edge computing focus, 5x faster than Whisper for short audio
Kyutai STT: Real-time streaming with low latency (roughly 0.5-2.5 s)
Architecture Comparison: Detailed comparison table with performance metrics
Code Examples: Working examples for all three model types
Model Selection Guide: When to choose each architecture

Files Modified

chapters/en/chapter5/asr_models.mdx - Added modern ASR section, comparison table, code examples
chapters/en/chapter5/choosing_dataset.mdx - Added model-specific dataset recommendations
chapters/en/chapter5/evaluation.mdx - Updated dataset references
chapters/en/chapter5/fine-tuning.mdx - Updated training examples
chapters/en/_toctree.yml - Minor formatting fix

Key Features Added

Architecture Comparison Table

Feature	Whisper	Moonshine	Kyutai STT
Processing	Fixed 30s chunks	Variable-length	Streaming
Best Use Case	General-purpose ASR	Edge/Mobile devices	Real-time applications
Speed	Baseline	5x faster (short audio)	Ultra-low latency
Languages	96+ languages	English only	English (+French)
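The "Fixed 30s chunks" versus "Variable-length" row is easiest to see with a little arithmetic. The sketch below assumes non-overlapping 30 s windows with the last one padded; real pipelines may overlap windows, and the function names are hypothetical. It only counts padding and does not measure speed.

```python
import math

WHISPER_WINDOW_S = 30.0  # fixed input window, per the table above


def whisper_windows(duration_s: float) -> int:
    """Number of 30 s windows needed to cover a clip (at least one)."""
    return max(1, math.ceil(duration_s / WHISPER_WINDOW_S))


def padded_fraction(duration_s: float) -> float:
    """Fraction of the processed input that is padding rather than audio."""
    total = whisper_windows(duration_s) * WHISPER_WINDOW_S
    return (total - duration_s) / total


for seconds in (5, 30, 45):
    print(seconds, whisper_windows(seconds), round(padded_fraction(seconds), 2))
```

Under these assumptions a 5 s clip fills only one sixth of its window, which is the regime where a variable-length model has the most room to save compute.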

Model Selection Guidelines

Whisper: Multilingual support, high accuracy, translation capabilities
Moonshine: Edge deployment, memory efficiency, fast processing
Kyutai STT: Real-time streaming, low latency, robust audio handling
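The guidelines above can be read as a small decision rule. The following is a rough sketch of one possible encoding, assuming the language coverage from the comparison table; the `Requirements` fields and the priority order are illustrative choices, not something the chapter prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a deployment; all fields are assumptions."""
    language: str = "en"       # ISO 639-1 code of the audio
    streaming: bool = False    # partial transcripts needed as audio arrives
    edge_device: bool = False  # constrained memory or compute
    translation: bool = False  # speech translation needed


def choose_asr_family(req: Requirements) -> str:
    """Map requirements to one of the three model families in the table."""
    # Translation is listed only under Whisper.
    if req.translation:
        return "Whisper"
    # Kyutai STT is listed for English (+French) streaming.
    if req.streaming and req.language in {"en", "fr"}:
        return "Kyutai STT"
    # Moonshine is listed as English only, aimed at edge devices.
    if req.edge_device and req.language == "en":
        return "Moonshine"
    # Everything else falls back to the general-purpose option.
    return "Whisper"


print(choose_asr_family(Requirements(streaming=True)))
```

Streaming is checked before edge deployment here, so an English streaming workload on a small device resolves to Kyutai STT; swapping the two checks is equally defensible.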

Test Plan

Verified Common Voice 17 dataset is available on Hugging Face Hub
Confirmed Dhivehi language is supported in Common Voice 17
Checked that all URLs and references are valid
Ensured code examples maintain compatibility
Verified Moonshine and Kyutai models are available on Hugging Face Hub
Tested code examples for syntax and API compatibility
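A reference check like the one in this plan can be partly automated. The snippet below is a minimal sketch, assuming the stale identifiers are the two old names from the summary; `find_stale_references` is a hypothetical helper and not part of the repository's tooling.

```python
from pathlib import Path

# Identifiers that should no longer appear after this change.
STALE = ("common_voice_13_0", "whisper-large-v2")


def find_stale_references(root: str, suffix: str = ".mdx") -> list[tuple[str, int, str]]:
    """List (relative path, line number, stale identifier) for every hit under root."""
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for old in STALE:
                if old in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, old))
    return hits
```

Running it against `chapters/en/chapter5` and expecting an empty list would be one way to back the "all URLs and references" item; it does not verify that the new URLs resolve.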

- Update Common Voice dataset from v13 to v17 (latest available)
- Update language count from 108 to 124 languages in Common Voice 17
- Update all dataset URLs and references throughout chapter5 files
- Update Whisper model reference from whisper-large-v2 to whisper-large-v3
- Update training examples and code snippets to use latest dataset version
- Maintain educational content structure while using current resources

Files updated:
- chapters/en/chapter5/choosing_dataset.mdx
- chapters/en/chapter5/evaluation.mdx
- chapters/en/chapter5/fine-tuning.mdx
- chapters/en/chapter5/asr_models.mdx
- chapters/en/_toctree.yml (minor formatting fix)

HuggingFaceDocBuilderDev · 2025-07-14T13:19:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- Add detailed section on Moonshine ASR: edge-optimized, 5x faster for short audio
- Add detailed section on Kyutai STT: real-time streaming capabilities
- Include architecture comparison table with performance characteristics
- Add code examples for using Moonshine and Kyutai models
- Update model selection table with new ASR alternatives
- Add model-specific dataset recommendations in choosing_dataset.mdx
- Provide guidance on when to choose each model architecture
- Update summary to reflect expanded ASR landscape

This addresses the Whisper-centric nature of Chapter 5 by providing comprehensive coverage of modern ASR alternatives with different optimization focuses.

Deep-unlearning added 2 commits July 14, 2025 13:26

nit

0f7984f
