Releases: FluidInference/FluidAudio

v0.12.0 PocketTTS

03 Feb 05:09
9fcdf2f

New TTS model, converted from Pocket TTS:

  • Simpler chunking algorithm that also supports longer token counts
  • EOS detection stops generation naturally instead of hitting a fixed token limit
  • Real-time support
  • No espeak dependency
  • iOS RAM-friendly
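To illustrate the difference between EOS-driven stopping and a fixed token wall, here is a minimal Swift sketch of an autoregressive generation loop. It is not PocketTTS or FluidAudio code: `generateUntilEOS`, its parameters, and the toy model are hypothetical, and `maxTokens` is kept only as a safety cap rather than the normal stopping rule.

```swift
/// Hypothetical sketch: generate tokens until the model signals end-of-sequence.
/// `maxTokens` is only a safety cap; in normal operation the EOS check ends the loop.
func generateUntilEOS(
    maxTokens: Int,
    isEOS: (Int) -> Bool,
    step: ([Int]) -> Int
) -> [Int] {
    var tokens: [Int] = []
    while tokens.count < maxTokens {
        let next = step(tokens)      // predict the next token from the prefix so far
        if isEOS(next) { break }     // natural stop: the model says it is done
        tokens.append(next)
    }
    return tokens
}

// Toy "model" that emits 0, 1, 2 and then EOS (-1).
let toySequence = [0, 1, 2, -1]
let out = generateUntilEOS(maxTokens: 100, isEOS: { $0 == -1 }) { prefix in
    toySequence[prefix.count]
}
print(out)  // [0, 1, 2]
```

Output length is decided by the model rather than by a preset budget, so short inputs do not pay for a long fixed allocation.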

Bug Fixes

  • #282: Custom vocabulary rescoring now applies to chunked long audio (previously it was silently skipped for audio longer than 15 s). Thanks @Beingpax

Note

  • We plan to deprecate Kokoro TTS soon

v0.11.0 Custom Vocabulary & Pipeline Updates

31 Jan 02:40
5d9176e

What’s New

  • Custom Vocabulary Support (#251): Major feature adding custom vocabulary so domain-specific terms are recognized
  • Vocabulary Pipeline Refactor (#276): Restructured pipeline, dead code removal, and pure Swift dataset download
  • Float16 Xcode GUI Build Failure (#270): Add architecture checks to avoid breaking the build, by @schmatz
  • Documentation Refresh (#280): Reorganize docs and remove stale content
  • README Updates: Link and description cleanup

Full Changelog: v0.10.1...v0.11.0

v0.10.1: Streaming & De-esser

28 Jan 19:07
0afbabc

What's New

  • Streaming Audio Processing (#257): Memory-efficient transcription for large files, cutting memory use by 99.5% (from 230 MB to a constant ~1.2 MB)
  • TTS De-esser (#267): Reduces harsh sibilant sounds in Kokoro TTS output, on by default
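The constant-memory figure comes from processing input incrementally instead of loading the whole file at once. A minimal sketch of that idea using Foundation's `FileHandle.read(upToCount:)`; `forEachChunk` is a hypothetical helper, not FluidAudio's streaming API, and it reads raw bytes rather than decoded audio samples.

```swift
import Foundation

/// Hypothetical sketch: visit a file in fixed-size chunks so peak memory is bounded
/// by `chunkSize` rather than by the file length.
func forEachChunk(of url: URL, chunkSize: Int, _ body: (Data) throws -> Void) throws {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    // read(upToCount:) returns nil (or empty data on some platforms) at end of file.
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        try body(chunk)
    }
}

// Usage: a 10-byte temporary file read 4 bytes at a time.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("chunk-demo.bin")
try Data(repeating: 7, count: 10).write(to: url)
var sizes: [Int] = []
try forEachChunk(of: url, chunkSize: 4) { sizes.append($0.count) }
print(sizes)  // [4, 4, 2]
```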

Bug Fixes

  • macOS 26 Sortformer compatibility (#266): Switch to V2 models to fix BNNS compiler error
  • Chunk boundary transcription loss (#264): Fix speech truncation by prepending mel context to non-first chunks
  • Legacy FileHandle.write (#262): Replace deprecated API with throwing write(contentsOf:)
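The chunk-boundary fix (#264) prepends mel context to every chunk after the first. A minimal sketch of that chunking scheme follows; `ContextChunk`, `chunksWithLeftContext`, and the frame counts are hypothetical, and the real context length used by FluidAudio is not stated here.

```swift
/// Hypothetical sketch: one chunk of mel frames plus how many of its leading
/// frames are borrowed context from the previous chunk.
struct ContextChunk {
    let frames: [[Float]]
    let contextCount: Int
}

/// Split mel frames into chunks of `chunkFrames`, prepending up to `contextFrames`
/// frames from the previous chunk to every chunk except the first, so speech that
/// straddles a boundary is seen in full. A caller can use `contextCount` to
/// discard outputs that correspond to the borrowed frames.
func chunksWithLeftContext(frames: [[Float]], chunkFrames: Int, contextFrames: Int) -> [ContextChunk] {
    precondition(chunkFrames > 0 && contextFrames >= 0)
    var chunks: [ContextChunk] = []
    var start = 0
    while start < frames.count {
        let end = min(start + chunkFrames, frames.count)
        let contextStart = max(0, start - contextFrames)  // first chunk: start == 0, so no context
        chunks.append(ContextChunk(frames: Array(frames[contextStart..<end]),
                                   contextCount: start - contextStart))
        start = end
    }
    return chunks
}

// Ten one-dimensional "mel frames" valued 0...9.
let mel = (0..<10).map { [Float($0)] }
let chunks = chunksWithLeftContext(frames: mel, chunkFrames: 4, contextFrames: 2)
print(chunks.map { $0.frames.count }, chunks.map { $0.contextCount })  // [4, 6, 4] [0, 2, 2]
```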

Full Changelog: v0.10.0...v0.10.1

v0.10.0: Sortformer

12 Jan 01:10

Sortformer: Real-Time Speaker Diarization

Core ML version of NVIDIA's Sortformer

  • Real-time streaming - Speaker labels as audio comes in
  • Noisy environment support - Works where traditional pipelines fail
  • Overlapping speech - Scores all 4 speakers independently per frame, so multiple speakers can be active simultaneously
  • Single neural model - No complex pipeline, just one model
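Independent per-frame scoring is what allows overlap. A minimal sketch, assuming the model emits one logit per speaker slot per frame and that a speaker counts as active when its sigmoid probability exceeds 0.5; `activeSpeakers` and the threshold are hypothetical, not FluidAudio's Sortformer API.

```swift
import Foundation

/// Hypothetical sketch: turn per-frame speaker logits into active-speaker lists.
/// Each speaker gets its own sigmoid, so scores do not compete (unlike a softmax)
/// and several speakers can exceed the threshold in the same frame.
func activeSpeakers(logits: [[Float]], threshold: Float = 0.5) -> [[Int]] {
    logits.map { frame in
        frame.indices.filter { speaker in
            let p = 1 / (1 + exp(-frame[speaker]))  // independent sigmoid per speaker
            return p > threshold
        }
    }
}

let frames: [[Float]] = [
    [ 3.0, -2.0, -4.0, -4.0],  // speaker 0 only
    [ 2.5,  1.5, -3.0, -4.0],  // speakers 0 and 1 overlap
    [-5.0, -5.0, -5.0, -5.0],  // silence
]
print(activeSpeakers(logits: frames))  // [[0], [0, 1], []]
```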

Credit to @SGD2718 for the Sortformer implementation & model conversion.

v0.9.1

03 Jan 19:09
915a392

What's Changed

Bug Fixes

  • fix: Swift 6 Sendable errors with macOS 26.2 SDK (#245) by @tacshi
  • fix: Swift 6 concurrency errors in audio conversion (#239) by @Alex-Wengg
  • fix: rename CLI executable to fluidaudiocli to avoid Xcode name collision by @Alex-Wengg
  • fix(diarizer): use K-Means centroids when speaker count constraint is applied (#236) by @beshkenadze
  • Prevent loops with non-blank tokens (#244) by @Steven-Weng
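Loops with non-blank tokens are a known hazard in transducer greedy decoding: a non-blank prediction keeps the decoder on the same frame, so a model that never predicts blank never advances. A common guard is a per-frame symbol cap, sketched below for a generic RNN-T greedy loop. `greedyDecode`, `maxSymbolsPerFrame`, and the `joint` closure (standing in for the encoder, prediction, and joint networks) are hypothetical; this is not the code from #244.

```swift
/// Hypothetical sketch of RNN-T greedy decoding with a per-frame symbol cap.
/// `joint(frame, lastToken)` returns the argmax token for that frame and context.
func greedyDecode(
    frameCount: Int,
    blank: Int,
    maxSymbolsPerFrame: Int,
    joint: (_ frame: Int, _ lastToken: Int?) -> Int
) -> [Int] {
    var output: [Int] = []
    for t in 0..<frameCount {
        var emitted = 0
        while emitted < maxSymbolsPerFrame {
            let token = joint(t, output.last)
            if token == blank { break }  // blank: advance to the next frame
            output.append(token)         // non-blank: stay on this frame
            emitted += 1                 // the cap guarantees the loop ends
        }
    }
    return output
}

// A pathological joint that never predicts blank on frame 1.
let tokens = greedyDecode(frameCount: 3, blank: 0, maxSymbolsPerFrame: 2) { frame, _ in
    frame == 1 ? 7 : 0
}
print(tokens)  // [7, 7]
```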

Full Changelog: v0.9.0...v0.9.1

v0.9.0 - Swift 6 Support

31 Dec 20:20
73fb84a

What's New

Swift 6 Support

  • Full Swift 6 compatibility
  • Updated swift-tools-version

Full Changelog: v0.8.2...v0.9.0

v0.8.2

30 Dec 23:09
82a17b9

What's Changed

  • Bug fix: pad the end of short audio files with blank audio (#234) - @Steven-Weng
  • feat(tts): SSML phoneme, sub, and say-as tag support for Kokoro TTS (Core ML) (#235) - @smdesai
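Padding short input at the end with silence is simple to express. A minimal sketch, assuming audio is a `[Float]` sample buffer; `padTrailingSilence` and `minimumCount` are hypothetical and do not reflect the threshold FluidAudio actually uses.

```swift
/// Hypothetical sketch: append silence (zero-valued samples) to the end of short
/// audio so it reaches a minimum length; longer audio is returned unchanged.
func padTrailingSilence(_ samples: [Float], minimumCount: Int) -> [Float] {
    guard samples.count < minimumCount else { return samples }
    return samples + [Float](repeating: 0, count: minimumCount - samples.count)
}

print(padTrailingSilence([0.5, -0.5], minimumCount: 4))  // [0.5, -0.5, 0.0, 0.0]
```

Padding at the back keeps the original samples aligned to time zero, so timestamps for the real speech are unaffected.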

Full Changelog: v0.8.1...v0.8.2

v0.8.1

26 Dec 22:29
47552dd

What's New

Features

  • Transcription progress: Support emitting transcription progress percentage (#229) - @xinnjie
  • Speaker count constraints: Added minSpeakers, maxSpeakers, numSpeakers options for diarization (#220) - @beshkenadze
  • CLI JSON output: Added --output-json flag to transcribe command (#222) - @Steven-Weng
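One plausible way the three speaker-count options could interact is sketched below, assuming an exact `numSpeakers` overrides the bounds and that `minSpeakers`/`maxSpeakers` clamp the clusterer's estimate. `resolveSpeakerCount` is hypothetical; the diarizer's actual precedence rules may differ.

```swift
/// Hypothetical sketch: reconcile optional speaker-count constraints with an
/// estimated count. An exact count wins; otherwise clamp to [min, max].
func resolveSpeakerCount(estimated: Int, numSpeakers: Int?, minSpeakers: Int?, maxSpeakers: Int?) -> Int {
    if let exact = numSpeakers { return exact }
    var count = estimated
    if let lower = minSpeakers { count = max(count, lower) }
    if let upper = maxSpeakers { count = min(count, upper) }
    return count
}

print(resolveSpeakerCount(estimated: 5, numSpeakers: nil, minSpeakers: 2, maxSpeakers: 3))  // 3
```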

Bug Fixes

Full Changelog: v0.8.0...v0.8.1

v0.8.0

17 Dec 22:28
892da4f

What's New

Parakeet EOU Streaming ASR (#216)

New streaming ASR with End-of-Utterance (EOU) detection using NVIDIA's Parakeet EOU 120M model.

Features:

  • StreamingEouAsrManager - streaming pipeline with 160ms and 320ms chunk support
  • Real-time End-of-Utterance detection with configurable debounce (default 1280ms)
  • Native Swift NeMoMelSpectrogram with vDSP vectorization
  • RnntDecoder - RNN-T greedy decoder with EOU detection
  • Automatic model downloads from HuggingFace
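A debounce can be pictured as requiring the end-of-utterance signal to persist across enough consecutive chunks: with the default 1280 ms, that is 8 chunks at 160 ms or 4 chunks at 320 ms. A minimal sketch under that assumption; `EouDebouncer` is hypothetical and is not the internal logic of StreamingEouAsrManager.

```swift
/// Hypothetical debounce sketch: report end-of-utterance only once the model's
/// EOU signal has held for at least `debounceMs` of consecutive audio.
struct EouDebouncer {
    let chunkMs: Int
    let debounceMs: Int
    private var heldMs = 0

    init(chunkMs: Int, debounceMs: Int = 1280) {
        self.chunkMs = chunkMs
        self.debounceMs = debounceMs
    }

    /// Feed one chunk's EOU flag; returns true exactly once per sustained EOU run.
    mutating func update(eouDetected: Bool) -> Bool {
        guard eouDetected else { heldMs = 0; return false }  // any speech resets the timer
        let wasBelow = heldMs < debounceMs
        heldMs += chunkMs
        return wasBelow && heldMs >= debounceMs
    }
}

var debouncer = EouDebouncer(chunkMs: 320)
let results = [true, true, true, true, true].map { debouncer.update(eouDetected: $0) }
print(results)  // [false, false, false, true, false]
```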

CLI:

swift run fluidaudio parakeet-eou audio.wav --chunk-ms 320

Full Changelog: v0.7.12...v0.8.0

v0.7.12

15 Dec 22:11
ddee663

  • Custom TTS pronunciation dictionaries: lexicon file support for Kokoro (CLI --lexicon, TtsCustomLexicon, runtime updates), multi-tier matching, and docs/tests (#213) by @smdesai
  • Hugging Face downloads: integrate the official Swift Hugging Face SDK for model fetching, with token and proxy configuration from environment variables and leaner registry traversal (#215)
  • Platform/model validation utilities: add SystemInfo.isIntelMac, AsrModels.isModelValid, streaming decoder reuse, and non-contiguous stride handling (#210)
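"Multi-tier matching" means a lexicon lookup falls back through progressively looser matches. A minimal sketch of the idea; the three tiers below (exact, case-insensitive, punctuation-stripped) and their order are assumptions, and `ToyLexicon` is hypothetical, not TtsCustomLexicon's actual rules. The phoneme strings are placeholders.

```swift
import Foundation

/// Hypothetical multi-tier lexicon lookup: exact match, then case-insensitive,
/// then retry with surrounding punctuation stripped.
struct ToyLexicon {
    private let exact: [String: String]
    private let lowercased: [String: String]

    init(_ entries: [String: String]) {
        exact = entries
        // On a case-insensitive collision, keep whichever entry is seen first.
        lowercased = Dictionary(entries.map { ($0.key.lowercased(), $0.value) },
                                uniquingKeysWith: { first, _ in first })
    }

    func pronunciation(for word: String) -> String? {
        if let hit = exact[word] { return hit }                    // tier 1: exact
        if let hit = lowercased[word.lowercased()] { return hit }  // tier 2: case-insensitive
        let trimmed = word.trimmingCharacters(in: .punctuationCharacters)
        if trimmed != word { return pronunciation(for: trimmed) }  // tier 3: strip punctuation
        return nil
    }
}

let lexicon = ToyLexicon(["Kokoro": "<kokoro-phonemes>"])
print(lexicon.pronunciation(for: "KOKORO!") ?? "no match")  // <kokoro-phonemes>
```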