Experimental on-device Whisper (encoder + greedy decoder) implemented with MLX in Swift.
Status: Pre-0.1.0 – API & accuracy not stable. Tokenizer simplified. Timestamps & advanced decoding not yet implemented.
- Manifest-driven weight loader (Whisper encoder + decoder tensors)
- Simplified tokenizer with special tokens
- Encoder forward pass (transformer blocks)
- Greedy decoder (no timestamps, no beam search yet)
- Full BPE merges + byte fallback
- Timestamp token handling & segmentation
- Beam search / temperature / top-p sampling
- Incremental streaming with KV cache reuse
- Quantization (int8 / mixed precision)
- Weight download & checksum verification
- Benchmarks & memory profiling
import SwiftWhisperKitMLX
let transcriber = MLXWhisperTranscriber()
try await transcriber.prepare()
let result = try await transcriber.transcribe(samples: audioSamples, sampleRate: 44_100)
print("Transcribed:", result.text)Expected directory (first found among Application Support & Documents paths):
~/Library/Application Support/MLXModels/<variant>/
manifest.json
vocab.json
tokenizer.vocab.json (optional)
tokenizer.merges.txt (optional)
*.bin (tensor shards referenced in manifest)
You can symlink or copy model folders produced by your conversion pipeline. A helper script now exists to fetch weights.
Use the helper script to fetch a hosted model variant (example variant small) where the layout is <base>/<variant>/manifest.json and related tensor shard files.
./Scripts/download-weights.sh small https://your-host.example/whisperOutputs are placed in ~/Library/Application Support/<variant> on macOS.
GitHub Actions (macOS 14) builds & tests on pushes and pull requests. See badge above.
WhisperTranscriber is deprecated; use MLXWhisperTranscriber. A typealias is provided to avoid breaking existing code.
- Tokenizer does not yet perform full Whisper BPE merge logic.
- No timestamp tokens or alignment; returns a single text string.
- Greedy decoding only – may produce suboptimal text.
See CONTRIBUTING.md for guidelines. PRs improving tokenizer fidelity, timestamps, or download manager are welcome.
MIT – see LICENSE.
This is experimental and not production-quality. Validate outputs against a reference implementation before relying on accuracy.