Releases: FluidInference/FluidAudio
Releases · FluidInference/FluidAudio
v0.12.0 PocketTTS
New TTS model converted from the Pocket TTS
- simpler chunking algorithm while supporting longer token counts
- EOS detection stops generation naturally instead of hitting a fixed token wall
- real time support
- No espeak dependency
- iOS RAM friendly
Bug Fixes
- #282 — Custom vocabulary rescoring now applies to chunked long audio (was silently skipped on >15s
audio) — thanks @Beingpax
Note
- we plan to deprecate kokoro tts in the future soon
v0.11.0 Custom Vocabulary & Pipeline Updates
What’s New
- Custom Vocabulary Support (#251): Major feature for custom vocab with recognized domain terms
- Vocabulary Pipeline Refactor (#276): Restructure, dead code removal, and pure Swift dataset
- download Float16 Xcode GUI Build Failure (#270): Add architecture checks to avoid build break by @schmatz
- Documentation Refresh (#280): Reorganize docs and remove stale content README Updates: Link and description cleanup
Full Changelog: v0.10.1...v0.11.0
v0.10.1: Streaming & De-esser
What's New
- Streaming Audio Processing (#257): Memory-efficient transcription for large files — 99.5% reduction (230MB → ~1.2MB constant)
- TTS De-esser (#267): Reduces harsh sibilant sounds in Kokoro TTS output, on by default
Bug Fixes
- macOS 26 Sortformer compatibility (#266): Switch to V2 models to fix BNNS compiler error
- Chunk boundary transcription loss (#264): Fix speech truncation by prepending mel context to non-first chunks
- Legacy FileHandle.write (#262): Replace deprecated API with throwing
write(contentsOf:)
New Contributors
Full Changelog: v0.10.0...v0.10.1
v0.10.0: Sortformer
Sortformer: Real-Time Speaker Diarization
CoreML version of Nividia's Sortformer
- Real-time streaming - Speaker labels as audio comes in
- Noisy environment support - Works where traditional pipelines fail
- Overlapping speech - Scores all 4 speakers independently per frame, multiple can be active simultaneously
- Single neural model - No complex pipeline, just one model
Credit to @SGD2718 for the Sortformer implementation & model conversion.
v0.9.1
What's Changed
Bug Fixes
- fix: Swift 6 Sendable errors with macOS 26.2 SDK (#245) by @tacshi
- fix: Swift 6 concurrency errors in audio conversion (#239) by @Alex-Wengg
- fix: rename CLI executable to fluidaudiocli to avoid Xcode name collision by @Alex-Wengg
- fix(diarizer): use K-Means centroids when speaker count constraint is applied (#236) by @beshkenadze
- Preventing loops with non-blank tokens (#244) by @Steven-Weng
New Contributors
Full Changelog: v0.9.0...v0.9.1
v0.9.0 - Swift 6 Support
What's New
Swift 6 Support
- Full Swift 6 compatibility
- Updated swift-tools-version
Full Changelog: v0.8.2...v0.9.0
v0.8.2
What's Changed
- Padding blank audio to the back of short audio files Bug Fix (#234) - @Steven-Weng
- feat(tts): SSML tag phoneme, sub, and say-as support for kokoro TTS coreml (#235) - @smdesai
Full Changelog: v0.8.1...v0.8.2
v0.8.1
What's New
Features
- Transcription progress: Support emitting transcription progress percentage (#229) - @xinnjie
- Speaker count constraints: Added
minSpeakers,maxSpeakers,numSpeakersoptions for diarization (#220) - @beshkenadze - CLI JSON output: Added
--output-jsonflag to transcribe command (#222) - @Steven-Weng
Bug Fixes
- Fixed segmentation prewarm hang using async prediction API (#227) - @Tasktivity
- Fixed content skipping bug (#223) - @Steven-Weng
New Contributors
- @xinnjie made their first contribution in #229
- @beshkenadze made their first contribution in #220
- @Tasktivity made their first contribution in #227
- @Steven-Weng made their first contribution in #222
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's New
Parakeet EOU Streaming ASR (#216)
New streaming ASR with End-of-Utterance (EOU) detection using NVIDIA's Parakeet EOU 120M model.
Features:
StreamingEouAsrManager- streaming pipeline with 160ms and 320ms chunk support- Real-time End-of-Utterance detection with configurable debounce (default 1280ms)
- Native Swift
NeMoMelSpectrogramwith vDSP vectorization RnntDecoder- RNN-T greedy decoder with EOU detection- Automatic model downloads from HuggingFace
CLI:
swift run fluidaudio parakeet-eou audio.wav --chunk-ms 320Full Changelog: v0.7.12...v0.8.0
v0.7.12
v0.7.12
- Custom TTS pronunciation dictionaries — lexicon file support for Kokoro (CLI --lexicon, TtsCustomLexicon, runtime updates), multi-tier matching, and docs/tests (#213) by smdesai
- Hugging Face downloads — integrate the official Swift Hugging Face SDK for model fetching with env token/proxy support and leaner registry traversal (#215)
- Platform/model validation utilities — add SystemInfo.isIntelMac, AsrModels.isModelValid, streaming decoder reuse, and non-contiguous stride handling (#210)