
Deep-unlearning commented Jul 14, 2025

Summary

  • Updated Common Voice dataset from v13 to v17 (latest version available on the Hugging Face Hub)
  • Updated language count from 108 to 124 languages in Common Voice 17
  • Updated Whisper model reference from whisper-large-v2 to whisper-large-v3
  • NEW: Added comprehensive coverage of modern ASR architectures beyond Whisper
  • NEW: Added Moonshine ASR (edge-optimized, 5x faster than Whisper on short audio)
  • NEW: Added Kyutai STT (real-time streaming capabilities)

Changes Made

Dataset and Model Updates

  • Dataset Version: Common Voice 13 → Common Voice 17
  • Language Support: 108 → 124 languages
  • Model Reference: whisper-large-v2 → whisper-large-v3
  • URLs Updated: All common_voice_13_0 → common_voice_17_0
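Because the dataset and model bumps above are plain identifier substitutions, they can be applied mechanically. A minimal sketch, assuming the only identifiers to change are `common_voice_13_0` and `whisper-large-v2`; `bump_references` and `REPLACEMENTS` are hypothetical names, not part of the course tooling:

```python
import re

# Hypothetical old -> new identifier map, taken from the version bumps listed above.
REPLACEMENTS = {
    "common_voice_13_0": "common_voice_17_0",
    "whisper-large-v2": "whisper-large-v3",
}

# A single alternation pattern so the text is scanned once for all identifiers.
_PATTERN = re.compile("|".join(re.escape(old) for old in REPLACEMENTS))


def bump_references(text: str) -> tuple[str, int]:
    """Return (updated_text, number_of_replacements) for a chunk of .mdx text."""
    return _PATTERN.subn(lambda m: REPLACEMENTS[m.group(0)], text)


if __name__ == "__main__":
    snippet = (
        'common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "dv")\n'
        'model_id = "openai/whisper-large-v2"\n'
    )
    updated, count = bump_references(snippet)
    print(updated, end="")
    print(f"{count} replacement(s)")
```

Returning the replacement count alongside the text makes it easy to spot files where nothing changed and a reference may have been missed.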

New ASR Architecture Coverage

  • Moonshine ASR: Edge computing focus, 5x faster processing than Whisper on short audio
  • Kyutai STT: Real-time streaming with ultra-low latency (0.5-2.5 s, depending on the model variant)
  • Architecture Comparison: Detailed comparison table with performance metrics
  • Code Examples: Working examples for all three model types
  • Model Selection Guide: When to choose each architecture
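The practical difference between Whisper's fixed-window processing and Kyutai STT's streaming is when audio reaches the model: a streaming model consumes small chunks as they arrive instead of waiting for the whole clip. The sketch below only simulates that arrival pattern; `stream_chunks` is a hypothetical helper, and the 16 kHz rate and 80 ms chunk size are illustrative assumptions, not Kyutai's actual configuration:

```python
from typing import Iterator

import numpy as np


def stream_chunks(
    audio: np.ndarray, sample_rate: int, chunk_ms: int
) -> Iterator[tuple[float, np.ndarray]]:
    """Yield (start_time_s, chunk) pairs as if the audio were arriving live.

    The last chunk may be shorter than chunk_ms if the audio does not divide evenly.
    """
    chunk_len = sample_rate * chunk_ms // 1000  # samples per chunk
    if chunk_len <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(audio), chunk_len):
        yield start / sample_rate, audio[start : start + chunk_len]


if __name__ == "__main__":
    sr = 16_000  # illustrative sample rate
    # 2 s of random noise as placeholder input.
    audio = np.random.default_rng(0).standard_normal(2 * sr).astype(np.float32)
    for t, chunk in stream_chunks(audio, sr, chunk_ms=80):
        # A streaming model would consume `chunk` here and update its partial
        # transcript immediately; this loop only reports what arrived and when.
        rms = float(np.sqrt(np.mean(chunk**2)))
        print(f"t={t:.2f}s  samples={len(chunk)}  rms={rms:.3f}")
```

With a fixed-window model, nothing can be transcribed until the full window (or clip) is available; in the streaming loop, work can start after the first chunk, which is where the latency advantage comes from.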

Files Modified

  • chapters/en/chapter5/asr_models.mdx - Added modern ASR section, comparison table, code examples
  • chapters/en/chapter5/choosing_dataset.mdx - Added model-specific dataset recommendations
  • chapters/en/chapter5/evaluation.mdx - Updated dataset references
  • chapters/en/chapter5/fine-tuning.mdx - Updated training examples
  • chapters/en/_toctree.yml - Minor formatting fix

Key Features Added

Architecture Comparison Table

| Feature | Whisper | Moonshine | Kyutai STT |
|---|---|---|---|
| Processing | Fixed 30s chunks | Variable-length | Streaming |
| Best Use Case | General-purpose ASR | Edge/Mobile devices | Real-time applications |
| Speed | Baseline | 5x faster (short audio) | Ultra-low latency |
| Languages | 96+ languages | English only | English (+French) |
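The "Fixed 30s chunks" entry is what makes variable-length processing attractive for short clips: Whisper's input pipeline pads (or truncates) every waveform to 30 seconds of 16 kHz audio, so a 3-second utterance still occupies a full 30-second window. The sketch below reproduces that padding with NumPy and reports how much of the window is padding; `pad_or_trim` and `padding_fraction` are illustrative helpers written for this example, not the `transformers` implementation:

```python
import numpy as np

SAMPLE_RATE = 16_000                           # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30                            # fixed input window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480,000 samples


def pad_or_trim(audio: np.ndarray, length: int = WINDOW_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a 1-D waveform to exactly `length` samples."""
    if len(audio) >= length:
        return audio[:length]
    return np.pad(audio, (0, length - len(audio)))


def padding_fraction(audio: np.ndarray) -> float:
    """Share of the fixed window that is zero padding rather than real audio."""
    used = min(len(audio), WINDOW_SAMPLES)
    return 1.0 - used / WINDOW_SAMPLES


if __name__ == "__main__":
    for seconds in (3, 10, 30):
        clip = np.zeros(seconds * SAMPLE_RATE, dtype=np.float32)
        window = pad_or_trim(clip)
        print(f"{seconds:>2} s clip -> {len(window)} samples, "
              f"{padding_fraction(clip):.0%} padding")
```

For a 3-second clip, 90% of the window is padding; that is the kind of work a model whose compute scales with input length can skip.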

Model Selection Guidelines

  • Whisper: Multilingual support, high accuracy, translation capabilities
  • Moonshine: Edge deployment, memory efficiency, fast processing
  • Kyutai STT: Real-time streaming, low latency, robust audio handling
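The guidelines above can be condensed into a small decision helper. This is a simplification for illustration only: `Requirements` and `choose_asr_family` are hypothetical, the language sets assume the coverage shown in the comparison table (Moonshine English only, Kyutai STT English plus French), and a real choice should also weigh accuracy on your own data and hardware constraints:

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of a deployment's needs."""
    language: str                  # ISO 639-1 code, e.g. "en", "fr", "dv"
    needs_translation: bool = False
    needs_streaming: bool = False
    on_edge_device: bool = False


def choose_asr_family(req: Requirements) -> str:
    """Map requirements to a model family following the selection guidelines."""
    # Translation, or any language beyond English/French: only Whisper covers it,
    # even if streaming or edge deployment would otherwise be preferred.
    if req.needs_translation or req.language not in {"en", "fr"}:
        return "Whisper"
    # Real-time streaming for English or French: Kyutai STT.
    if req.needs_streaming:
        return "Kyutai STT"
    # Edge deployment for English: Moonshine (assumed English-only here).
    if req.on_edge_device and req.language == "en":
        return "Moonshine"
    # Default: general-purpose, high-accuracy transcription.
    return "Whisper"


if __name__ == "__main__":
    print(choose_asr_family(Requirements("dv")))
    print(choose_asr_family(Requirements("en", on_edge_device=True)))
    print(choose_asr_family(Requirements("fr", needs_streaming=True)))
```

Checking translation and language coverage first reflects that those are hard constraints, while speed and latency preferences only matter among models that can handle the language at all.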

Test Plan

  • Verified Common Voice 17 dataset is available on Hugging Face Hub
  • Confirmed Dhivehi language is supported in Common Voice 17
  • Checked that all URLs and references are valid
  • Ensured code examples maintain compatibility
  • Verified Moonshine and Kyutai models are available on Hugging Face Hub
  • Tested code examples for syntax and API compatibility
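One way to make the "Checked that all URLs and references are valid" item repeatable is to scan the chapter files for identifiers that should have been replaced. A minimal sketch, assuming `common_voice_13_0` and `whisper-large-v2` are the only stale identifiers and that the content lives in `.mdx` files; `find_stale_references` is a hypothetical helper:

```python
import re
from pathlib import Path

# Identifiers that should no longer appear after this update (assumed list).
STALE = re.compile(r"common_voice_13_0|whisper-large-v2")


def find_stale_references(root: Path, pattern: str = "*.mdx") -> list[tuple[str, int, str]]:
    """Return (file, line_number, identifier) for every outdated identifier under root."""
    hits = []
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in STALE.finditer(line):
                hits.append((str(path), lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Default path mirrors the chapter directory listed above; adjust as needed.
    stale = find_stale_references(Path("chapters/en/chapter5"))
    for file, lineno, ident in stale:
        print(f"{file}:{lineno}: {ident}")
    print(f"{len(stale)} stale reference(s) found")
```

This only catches identifiers it is told about; it does not check that the new dataset or model IDs actually resolve on the Hub.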

- Update Common Voice dataset from v13 to v17 (latest available on the Hugging Face Hub)
- Update language count from 108 to 124 languages in Common Voice 17
- Update all dataset URLs and references throughout chapter5 files
- Update Whisper model reference from whisper-large-v2 to whisper-large-v3
- Update training examples and code snippets to use latest dataset version
- Maintain educational content structure while using current resources

Files updated:
- chapters/en/chapter5/choosing_dataset.mdx
- chapters/en/chapter5/evaluation.mdx
- chapters/en/chapter5/fine-tuning.mdx
- chapters/en/chapter5/asr_models.mdx
- chapters/en/_toctree.yml (minor formatting fix)

- Add detailed section on Moonshine ASR: edge-optimized, 5x faster for short audio
- Add detailed section on Kyutai STT: real-time streaming capabilities
- Include architecture comparison table with performance characteristics
- Add code examples for using Moonshine and Kyutai models
- Update model selection table with new ASR alternatives
- Add model-specific dataset recommendations in choosing_dataset.mdx
- Provide guidance on when to choose each model architecture
- Update summary to reflect expanded ASR landscape

This addresses the Whisper-centric nature of Chapter 5 by providing comprehensive
coverage of modern ASR alternatives with different optimization focuses.