Releases · FluidInference/FluidAudio · GitHub

03 Feb 05:09

Alex-Wengg

v0.12.0 PocketTTS

New TTS model converted from Pocket TTS:

Simpler chunking algorithm that also supports longer token counts
EOS detection stops generation naturally instead of hitting a fixed token limit
Real-time support
No espeak dependency
iOS RAM friendly
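
To illustrate the EOS point above, here is a minimal sketch of a generation loop that stops when the model emits an end-of-sequence token and only falls back to a step cap as a safety net. Everything in it (generate, eosToken, maxSteps, the toy step closure) is hypothetical and not taken from FluidAudio or Pocket TTS.

```swift
// Hypothetical sketch: none of these names come from FluidAudio.

/// Calls `step` until it returns the end-of-sequence token or the safety cap is reached.
func generate(eosToken: Int, maxSteps: Int, step: (Int) -> Int) -> (tokens: [Int], stoppedOnEOS: Bool) {
    var tokens: [Int] = []
    for index in 0..<maxSteps {
        let token = step(index)
        if token == eosToken {
            return (tokens, true)   // natural stop: the model signalled the end
        }
        tokens.append(token)
    }
    return (tokens, false)          // safety cap reached without seeing EOS
}

// Toy "model" that emits 10, 11, 12 and then EOS (represented here as -1).
let script = [10, 11, 12, -1]
let result = generate(eosToken: -1, maxSteps: 100) { script[min($0, script.count - 1)] }
print(result.tokens, result.stoppedOnEOS)
```

With a fixed cap alone, short inputs would keep generating until the cap and long inputs would be cut off. Checking for EOS lets the output length follow the input.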

Bug Fixes

#282: Custom vocabulary rescoring now applies to chunked long audio (it was silently skipped on audio longer than 15s). Thanks @Beingpax

Note

We plan to deprecate Kokoro TTS soon.

Contributors

Beingpax

31 Jan 02:40

Alex-Wengg

v0.11.0 Custom Vocabulary & Pipeline Updates

What’s New

Custom Vocabulary Support (#251): Major feature adding a custom vocabulary for recognizing domain-specific terms
Vocabulary Pipeline Refactor (#276): Restructure, dead code removal, and pure Swift dataset download
Float16 Xcode GUI Build Failure (#270): Add architecture checks to avoid build break by @schmatz
Documentation Refresh (#280): Reorganize docs and remove stale content
README Updates: Link and description cleanup

Full Changelog: v0.10.1...v0.11.0

Contributors

schmatz

28 Jan 19:07

Alex-Wengg

v0.10.1: Streaming & De-esser

What's New

Streaming Audio Processing (#257): Memory-efficient transcription for large files — 99.5% reduction (230MB → ~1.2MB constant)
TTS De-esser (#267): Reduces harsh sibilant sounds in Kokoro TTS output, on by default
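
The constant-memory idea behind streaming processing can be sketched as follows: samples are consumed in fixed-size windows through one reusable buffer, so peak memory depends on the window size rather than the file length. This is an illustration only. forEachWindow and its parameters are hypothetical names, and the lazily generated samples stand in for audio read incrementally from disk.

```swift
/// Feeds `source` through `handle` in windows of at most `windowSize` samples,
/// reusing one buffer so memory use does not grow with the input length.
func forEachWindow<S: Sequence>(
    of source: S,
    windowSize: Int,
    handle: ([Float]) -> Void
) where S.Element == Float {
    precondition(windowSize > 0, "windowSize must be positive")
    var buffer: [Float] = []
    buffer.reserveCapacity(windowSize)
    for sample in source {
        buffer.append(sample)
        if buffer.count == windowSize {
            handle(buffer)
            buffer.removeAll(keepingCapacity: true)
        }
    }
    if !buffer.isEmpty { handle(buffer) }   // flush the final partial window
}

// Lazily generated samples: the full signal is never materialized as one array.
let samples = (0..<10).lazy.map { Float($0) }
var windowSums: [Float] = []
forEachWindow(of: samples, windowSize: 4) { window in
    windowSums.append(window.reduce(0, +))
}
print(windowSums)   // [6.0, 22.0, 17.0]
```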

Bug Fixes

macOS 26 Sortformer compatibility (#266): Switch to V2 models to fix BNNS compiler error
Chunk boundary transcription loss (#264): Fix speech truncation by prepending mel context to non-first chunks
Legacy FileHandle.write (#262): Replace deprecated API with throwing write(contentsOf:)
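
The chunk boundary fix (#264) is described as prepending context to every chunk except the first. A minimal sketch of that idea on plain integer "frames" is shown below. Chunk, chunksWithLeftContext, chunkSize, and contextSize are hypothetical names, and the real fix operates on mel features with sizes this sketch does not try to reproduce.

```swift
/// One unit of work for the decoder in this sketch.
struct Chunk {
    let context: [Int]  // frames borrowed from the end of the previous chunk
    let body: [Int]     // frames this chunk is responsible for decoding
}

/// Splits `frames` into chunks of `chunkSize`, attaching up to `contextSize`
/// preceding frames as left context to every chunk except the first.
func chunksWithLeftContext(_ frames: [Int], chunkSize: Int, contextSize: Int) -> [Chunk] {
    precondition(chunkSize > 0 && contextSize >= 0, "sizes must be valid")
    var result: [Chunk] = []
    var start = 0
    while start < frames.count {
        let end = min(start + chunkSize, frames.count)
        let contextStart = max(0, start - contextSize)   // empty range for the first chunk
        result.append(Chunk(context: Array(frames[contextStart..<start]),
                            body: Array(frames[start..<end])))
        start = end
    }
    return result
}

let chunks = chunksWithLeftContext(Array(0..<10), chunkSize: 4, contextSize: 2)
for chunk in chunks {
    print(chunk.context, chunk.body)
}
```

The left context gives the model the audio just before the boundary, so speech that straddles two chunks is less likely to be truncated. Only the body frames would contribute to the final transcript.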

New Contributors

@starkdmi — chunk boundary transcription fix (#264)

Full Changelog: v0.10.0...v0.10.1

Contributors

starkdmi

12 Jan 01:10

Alex-Wengg

v0.10.0: Sortformer

Sortformer: Real-Time Speaker Diarization

CoreML version of NVIDIA's Sortformer

Real-time streaming - Speaker labels as audio comes in
Noisy environment support - Works where traditional pipelines fail
Overlapping speech - Scores all 4 speakers independently per frame, so multiple speakers can be active simultaneously
Single neural model - No complex pipeline, just one model

Credit to @SGD2718 for the Sortformer implementation & model conversion.

Contributors

SGD2718

03 Jan 19:09

Alex-Wengg

v0.9.1

What's Changed

Bug Fixes

fix: Swift 6 Sendable errors with macOS 26.2 SDK (#245) by @tacshi
fix: Swift 6 concurrency errors in audio conversion (#239) by @Alex-Wengg
fix: rename CLI executable to fluidaudiocli to avoid Xcode name collision by @Alex-Wengg
fix(diarizer): use K-Means centroids when speaker count constraint is applied (#236) by @beshkenadze
Preventing loops with non-blank tokens (#244) by @Steven-Weng

New Contributors

@tacshi made their first contribution in #245

Full Changelog: v0.9.0...v0.9.1

Contributors

beshkenadze, tacshi, and 2 other contributors

31 Dec 20:20

Alex-Wengg

v0.9.0 - Swift 6 Support

What's New

Swift 6 Support

Full Swift 6 compatibility
Updated swift-tools-version

Full Changelog: v0.8.2...v0.9.0

30 Dec 23:09

Alex-Wengg

v0.8.2

What's Changed

Bug fix: pad blank audio onto the end of short audio files (#234) - @Steven-Weng
feat(tts): SSML phoneme, sub, and say-as tag support for Kokoro TTS (CoreML) (#235) - @smdesai

Full Changelog: v0.8.1...v0.8.2

Contributors

smdesai and Steven-Weng

26 Dec 22:29

Alex-Wengg

v0.8.1

What's New

Features

Transcription progress: Support emitting transcription progress percentage (#229) - @xinnjie
Speaker count constraints: Added minSpeakers, maxSpeakers, numSpeakers options for diarization (#220) - @beshkenadze
CLI JSON output: Added --output-json flag to transcribe command (#222) - @Steven-Weng
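
As a rough illustration of how the three speaker count options could interact, the sketch below resolves a final speaker count from an estimate. Only the option names minSpeakers, maxSpeakers, and numSpeakers come from the release notes. The container type, the resolveSpeakerCount function, and the precedence rule (an exact numSpeakers wins, otherwise the estimate is clamped into the min/max range) are assumptions, not FluidAudio's documented behavior.

```swift
/// Hypothetical container; only the three option names come from the release notes.
struct SpeakerCountOptions {
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var numSpeakers: Int? = nil
}

/// Resolves how many speakers to keep, given the count clustering found on its own.
/// Assumption: `numSpeakers` wins outright; otherwise the estimate is clamped into [min, max].
func resolveSpeakerCount(estimated: Int, options: SpeakerCountOptions) -> Int {
    if let exact = options.numSpeakers { return exact }
    let lower = options.minSpeakers ?? 1
    let upper = max(lower, options.maxSpeakers ?? Int.max)
    return min(max(estimated, lower), upper)
}

print(resolveSpeakerCount(estimated: 5, options: SpeakerCountOptions(maxSpeakers: 3)))  // 3
print(resolveSpeakerCount(estimated: 1, options: SpeakerCountOptions(minSpeakers: 2)))  // 2
print(resolveSpeakerCount(estimated: 5, options: SpeakerCountOptions(numSpeakers: 4)))  // 4
```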

Bug Fixes

Fixed segmentation prewarm hang using async prediction API (#227) - @Tasktivity
Fixed content skipping bug (#223) - @Steven-Weng

New Contributors

@xinnjie made their first contribution in #229
@beshkenadze made their first contribution in #220
@Tasktivity made their first contribution in #227
@Steven-Weng made their first contribution in #222

Full Changelog: v0.8.0...v0.8.1

Contributors

beshkenadze, xinnjie, and 2 other contributors

17 Dec 22:28

Alex-Wengg

v0.8.0

What's New

Parakeet EOU Streaming ASR (#216)

New streaming ASR with End-of-Utterance (EOU) detection using NVIDIA's Parakeet EOU 120M model.

Features:

StreamingEouAsrManager - streaming pipeline with 160ms and 320ms chunk support
Real-time End-of-Utterance detection with configurable debounce (default 1280ms)
Native Swift NeMoMelSpectrogram with vDSP vectorization
RnntDecoder - RNN-T greedy decoder with EOU detection
Automatic model downloads from HuggingFace
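
One plausible reading of the debounce setting is that an end of utterance is only reported once the EOU signal has persisted for the debounce window. At the default 1280ms that would be 4 consecutive 320ms chunks or 8 consecutive 160ms chunks. The sketch below shows that logic. EouDebouncer and its members are hypothetical and are not the StreamingEouAsrManager API, whose actual debounce behavior may differ.

```swift
/// Hypothetical debouncer: reports end of utterance once the EOU signal has
/// persisted for at least `debounceMs` of consecutive audio.
struct EouDebouncer {
    let chunkMs: Int
    let debounceMs: Int
    private var pendingMs = 0

    init(chunkMs: Int, debounceMs: Int = 1280) {
        self.chunkMs = chunkMs
        self.debounceMs = debounceMs
    }

    /// Feed one chunk's raw EOU flag; returns true when the debounce window is satisfied.
    mutating func process(eouDetected: Bool) -> Bool {
        // Any chunk without the EOU signal resets the accumulated time.
        pendingMs = eouDetected ? pendingMs + chunkMs : 0
        return pendingMs >= debounceMs
    }
}

var debouncer = EouDebouncer(chunkMs: 320)
let flags = [true, true, false, true, true, true, true]
let results = flags.map { debouncer.process(eouDetected: $0) }
print(results)   // [false, false, false, false, false, false, true]
```

The reset on a false flag is what keeps a brief pause in the middle of a sentence from ending the utterance early.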

CLI:

swift run fluidaudio parakeet-eou audio.wav --chunk-ms 320

Full Changelog: v0.7.12...v0.8.0

15 Dec 22:11

Alex-Wengg

v0.7.12

Custom TTS pronunciation dictionaries — lexicon file support for Kokoro (CLI --lexicon, TtsCustomLexicon, runtime updates), multi-tier matching, and docs/tests (#213) by smdesai
Hugging Face downloads — integrate the official Swift Hugging Face SDK for model fetching with env token/proxy support and leaner registry traversal (#215)
Platform/model validation utilities — add SystemInfo.isIntelMac, AsrModels.isModelValid, streaming decoder reuse, and non-contiguous stride handling (#210)
