@Deep-unlearning Deep-unlearning commented Jul 14, 2025

Summary

This PR adds comprehensive coverage of ASR implementations (mainly Whisper) beyond the Transformers library, addressing the need for practical guidance on production deployments across different platforms and use cases.

This new unit covers:

  • whisper.cpp: C++ port with 10x faster CPU inference, ultra-low memory usage, cross-platform support
  • faster-whisper: 4x faster with CTranslate2, GPU support, streaming capabilities, zero accuracy loss
  • MLX-Whisper: Apple Silicon native, 50% faster on M-series chips, Metal acceleration
  • Lightning-Whisper-MLX: 10x faster MLX implementation for Apple Silicon
  • WhisperKit: On-device Apple deployment with Swift APIs for iOS/macOS
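As a taste of what the new unit walks through, here is a minimal faster-whisper sketch. It assumes the `faster-whisper` package is installed (`pip install faster-whisper`) and a local audio file; the import is deferred so the helper below works without the dependency.

```python
def transcribe(path: str, model_size: str = "base"):
    """Transcribe a file with faster-whisper; returns (start, end, text) tuples."""
    from faster_whisper import WhisperModel  # deferred: optional dependency

    # int8 quantization keeps CPU memory low; pass device="cuda" for GPU inference
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)
    return [(seg.start, seg.end, seg.text) for seg in segments]


def format_timestamp(seconds: float) -> str:
    """Render a segment offset as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

The segment tuples pair naturally with `format_timestamp` for writing subtitle files.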

Comprehensive Code Examples

  • Installation and setup for each implementation
  • Basic usage patterns and advanced configurations
  • Performance benchmarking utilities
  • Integration with existing transformers workflows
  • Deployment strategies for different platforms
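The benchmarking utilities boil down to measuring the real-time factor (RTF): wall-clock transcription time divided by audio duration, where RTF < 1.0 means faster than real time. A stdlib-only sketch (the function and parameter names are illustrative, not from the unit):

```python
import time


def real_time_factor(transcribe, audio_path: str, audio_duration_s: float,
                     runs: int = 3) -> float:
    """Benchmark any transcribe(path) callable against the clip's duration.

    Takes the best of `runs` timings to reduce warm-up noise; lower is better.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    return min(timings) / audio_duration_s
```

Because it only needs a callable, the same utility can compare whisper.cpp bindings, faster-whisper, and the 🤗 pipeline on identical audio.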

Practical Deployment Guidance

  • Edge Computing: Hardware requirements, optimization techniques
  • Mobile Deployment: iOS/Android integration examples
  • Real-Time Streaming: Low-latency transcription implementations
  • Resource Optimization: Memory management, quantization, pruning

- Update Common Voice dataset from v13 to v17 (latest available)
- Update language count from 108 to 124 languages in Common Voice 17
- Update all dataset URLs and references throughout chapter5 files
- Update Whisper model reference from whisper-large-v2 to whisper-large-v3
- Update training examples and code snippets to use latest dataset version
- Maintain educational content structure while using current resources

Files updated:
- chapters/en/chapter5/choosing_dataset.mdx
- chapters/en/chapter5/evaluation.mdx
- chapters/en/chapter5/fine-tuning.mdx
- chapters/en/chapter5/asr_models.mdx
- chapters/en/_toctree.yml (minor formatting fix)
- Add detailed section on Moonshine ASR: edge-optimized, 5x faster for short audio
- Add detailed section on Kyutai STT: real-time streaming capabilities
- Include architecture comparison table with performance characteristics
- Add code examples for using Moonshine and Kyutai models
- Update model selection table with new ASR alternatives
- Add model-specific dataset recommendations in choosing_dataset.mdx
- Provide guidance on when to choose each model architecture
- Update summary to reflect expanded ASR landscape

This addresses the Whisper-centric nature of Chapter 5 by providing comprehensive
coverage of modern ASR alternatives with different optimization focuses.
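The model-selection guidance above can be sketched as a rule of thumb: Kyutai STT for live streams, Moonshine for short clips on edge devices, Whisper otherwise. The 30-second cutoff and the `UsefulSensors/moonshine-base` checkpoint id are assumptions for illustration, not values from the chapter.

```python
def pick_architecture(clip_seconds: float, live_stream: bool) -> str:
    """Rule-of-thumb model choice distilled from this section."""
    if live_stream:
        return "kyutai-stt"   # built for real-time streaming transcription
    if clip_seconds <= 30.0:  # Moonshine is optimized for short audio (assumed cutoff)
        return "moonshine"
    return "whisper"


def load_moonshine():
    """Load Moonshine via the 🤗 pipeline (checkpoint id is an assumption)."""
    from transformers import pipeline  # deferred: heavyweight optional dependency

    return pipeline(
        "automatic-speech-recognition",
        model="UsefulSensors/moonshine-base",
    )
```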
- Create new alternative_implementations.mdx with extensive coverage of:
  * whisper.cpp: C++ port, 10x faster CPU inference, edge deployment
  * faster-whisper: 4x faster with CTranslate2, GPU support, streaming
  * insanely-fast-whisper: 36x faster, ultra-low latency, real-time focus
  * MLX-Whisper: Apple Silicon optimized, 50% faster on M-series chips
  * Lightning-Whisper-MLX: 10x faster MLX implementation
  * WhisperKit: On-device Apple deployment with Swift APIs
  * Conformer models: Edge computing, 5.26x realtime on wearables
  * Squeezeformer: Memory-efficient architecture alternative

- Add comprehensive performance comparison table with metrics for:
  * Speed vs original implementation
  * Memory usage characteristics
  * Platform-specific optimizations
  * Accuracy trade-offs
  * Deployment recommendations

- Include practical code examples for each implementation
- Add deployment strategies for edge, mobile, and real-time scenarios
- Provide benchmarking utilities and best practices
- Add integration examples with existing transformers workflows

- Update _toctree.yml to include new file in navigation
- Add cross-references in asr_models.mdx and choosing_dataset.mdx
- Maintain educational structure while expanding ecosystem coverage

This addresses the need for comprehensive coverage of ASR implementations
beyond the transformers library, providing practical guidance for
production deployments across different platforms and use cases.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
