@Deep-unlearning Deep-unlearning commented Jul 14, 2025

Summary

This PR adds comprehensive coverage of ASR implementations (mainly Whisper) beyond the Transformers library, addressing the need for practical guidance on production deployments across different platforms and use cases.

This new unit covers:

  • whisper.cpp: C++ port with 10x faster CPU inference, ultra-low memory usage, cross-platform support
  • faster-whisper: 4x faster with CTranslate2, GPU support, streaming capabilities, zero accuracy loss
  • MLX-Whisper: Apple Silicon native, 50% faster on M-series chips, Metal acceleration
  • Lightning-Whisper-MLX: 10x faster MLX implementation for Apple Silicon
  • WhisperKit: On-device Apple deployment with Swift APIs for iOS/macOS
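As a taste of what the new unit walks through, here is a minimal faster-whisper sketch. It assumes the `faster-whisper` package is installed (`pip install faster-whisper`) and a local audio file; the import is deferred so the helper below works without the dependency.

```python
def transcribe(path: str, model_size: str = "base"):
    """Transcribe a file with faster-whisper; returns (start, end, text) tuples."""
    from faster_whisper import WhisperModel  # deferred: optional dependency

    # int8 quantization keeps CPU memory low; pass device="cuda" for GPU inference
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)
    return [(seg.start, seg.end, seg.text) for seg in segments]


def format_timestamp(seconds: float) -> str:
    """Render a segment offset as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

The segment tuples pair naturally with `format_timestamp` for writing subtitle files.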

Comprehensive Code Examples

  • Installation and setup for each implementation
  • Basic usage patterns and advanced configurations
  • Performance benchmarking utilities
  • Integration with existing transformers workflows
  • Deployment strategies for different platforms
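The benchmarking utilities boil down to measuring the real-time factor (RTF): wall-clock transcription time divided by audio duration, where RTF < 1.0 means faster than real time. A stdlib-only sketch (the function and parameter names are illustrative, not from the unit):

```python
import time


def real_time_factor(transcribe, audio_path: str, audio_duration_s: float,
                     runs: int = 3) -> float:
    """Benchmark any transcribe(path) callable against the clip's duration.

    Takes the best of `runs` timings to reduce warm-up noise; lower is better.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    return min(timings) / audio_duration_s
```

Because it only needs a callable, the same utility can compare whisper.cpp bindings, faster-whisper, and the 🤗 pipeline on identical audio.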

Practical Deployment Guidance

  • Edge Computing: Hardware requirements, optimization techniques
  • Mobile Deployment: iOS/Android integration examples
  • Real-Time Streaming: Low-latency transcription implementations
  • Resource Optimization: Memory management, quantization, pruning

- Update Common Voice dataset from v13 to v17 (latest available)
- Update language count from 108 to 124 languages in Common Voice 17
- Update all dataset URLs and references throughout chapter5 files
- Update Whisper model reference from whisper-large-v2 to whisper-large-v3
- Update training examples and code snippets to use latest dataset version
- Maintain educational content structure while using current resources

Files updated:
- chapters/en/chapter5/choosing_dataset.mdx
- chapters/en/chapter5/evaluation.mdx
- chapters/en/chapter5/fine-tuning.mdx
- chapters/en/chapter5/asr_models.mdx
- chapters/en/_toctree.yml (minor formatting fix)
- Add detailed section on Moonshine ASR: edge-optimized, 5x faster for short audio
- Add detailed section on Kyutai STT: real-time streaming capabilities
- Include architecture comparison table with performance characteristics
- Add code examples for using Moonshine and Kyutai models
- Update model selection table with new ASR alternatives
- Add model-specific dataset recommendations in choosing_dataset.mdx
- Provide guidance on when to choose each model architecture
- Update summary to reflect expanded ASR landscape

This addresses the Whisper-centric nature of Chapter 5 by providing comprehensive
coverage of modern ASR alternatives with different optimization focuses.
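The model-selection guidance above can be sketched as a rule of thumb: Kyutai STT for live streams, Moonshine for short clips on edge devices, Whisper otherwise. The 30-second cutoff and the `UsefulSensors/moonshine-base` checkpoint id are assumptions for illustration, not values from the chapter.

```python
def pick_architecture(clip_seconds: float, live_stream: bool) -> str:
    """Rule-of-thumb model choice distilled from this section."""
    if live_stream:
        return "kyutai-stt"   # built for real-time streaming transcription
    if clip_seconds <= 30.0:  # Moonshine is optimized for short audio (assumed cutoff)
        return "moonshine"
    return "whisper"


def load_moonshine():
    """Load Moonshine via the 🤗 pipeline (checkpoint id is an assumption)."""
    from transformers import pipeline  # deferred: heavyweight optional dependency

    return pipeline(
        "automatic-speech-recognition",
        model="UsefulSensors/moonshine-base",
    )
```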
- Create new alternative_implementations.mdx with extensive coverage of:
  * whisper.cpp: C++ port, 10x faster CPU inference, edge deployment
  * faster-whisper: 4x faster with CTranslate2, GPU support, streaming
  * insanely-fast-whisper: 36x faster, ultra-low latency, real-time focus
  * MLX-Whisper: Apple Silicon optimized, 50% faster on M-series chips
  * Lightning-Whisper-MLX: 10x faster MLX implementation
  * WhisperKit: On-device Apple deployment with Swift APIs
  * Conformer models: Edge computing, 5.26x realtime on wearables
  * Squeezeformer: Memory-efficient architecture alternative

- Add comprehensive performance comparison table with metrics for:
  * Speed vs original implementation
  * Memory usage characteristics
  * Platform-specific optimizations
  * Accuracy trade-offs
  * Deployment recommendations

- Include practical code examples for each implementation
- Add deployment strategies for edge, mobile, and real-time scenarios
- Provide benchmarking utilities and best practices
- Add integration examples with existing transformers workflows

- Update _toctree.yml to include new file in navigation
- Add cross-references in asr_models.mdx and choosing_dataset.mdx
- Maintain educational structure while expanding ecosystem coverage

This addresses the need for comprehensive coverage of ASR implementations
beyond the transformers library, providing practical guidance for
production deployments across different platforms and use cases.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
