This repository contains the official implementation for Drax: Speech Recognition with Discrete Flow Matching.
Recommended:
conda create -n drax python=3.10 -y
conda activate drax
# then run the install commands below
pip install -U pip
pip install torch==2.7.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -e .

If you don't already have Torch installed, run:
pip install -e ".[with-torch]"

Using the generate CLI:
python generate.py --model_path aiola/drax-v1 --audio_path path/to/audio --language <language-code>

Or using the Transcriber:
from drax import Transcriber
asr = Transcriber(model_path="aiola/drax-v1") # HF repo or local path
result = asr.transcribe("/path/to/audio.wav", language="en")
print(result[0].transcript)

Controlling sampling steps, temperature, etc.:
result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)

Batch inference:
audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)
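# assuming one result per input audio, as the result[0] indexing above suggests
for r in result:
    print(r.transcript)

Core dependencies are installed via pyproject.toml (including transformers==4.52.3).
Torch/torchaudio are installed via the [with-torch] extra, or separately to match your CUDA version.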
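
To confirm that an environment matches these pins, a minimal sketch using only the standard library might look like the following. The `EXPECTED` table and the helper names `installed_version` and `check_pins` are illustrative and not part of drax; the version pins are copied from the install commands in this README, so adjust them if your setup differs.

```python
from importlib import metadata

# Pins copied from the install notes in this README (assumption: you want
# exactly these versions; edit the table for a different CUDA/Torch setup).
EXPECTED = {
    "transformers": "4.52.3",
    "torch": "2.7.0",
    "torchaudio": "2.7.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for package, pin in expected.items():
        found = installed_version(package)
        # Local build tags such as "+cu128" are ignored when comparing.
        matches = found is not None and found.split("+")[0] == pin
        report[package] = (found, matches)
    return report


if __name__ == "__main__":
    for package, (found, ok) in check_pins(EXPECTED).items():
        status = "ok" if ok else "mismatch or missing"
        print(f"{package}: installed={found} -> {status}")
```

Running the script inside the activated environment prints one line per package; a "mismatch or missing" line suggests reinstalling that package with the commands above.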
Enable pre-commit hooks (Ruff):
pip install pre-commit ruff
pre-commit install
# Run manually
pre-commit run -a

- Drax: Speech Recognition with Discrete Flow Matching: https://arxiv.org/abs/2510.04162
- Discrete Flow Matching: https://arxiv.org/abs/2407.15595
- Flow Matching with General Discrete Paths: https://arxiv.org/abs/2412.03487
- Generative Flows on Discrete State-Spaces: https://arxiv.org/abs/2402.04997
- Simplified and Generalized Masked Diffusion for Discrete Data: https://arxiv.org/abs/2406.04329
This project borrows components from:
- Flow-matching: https://github.com/facebookresearch/flow_matching
- Flash attention: https://github.com/Dao-AILab/flash-attention
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution: https://github.com/louaaron/Score-Entropy-Discrete-Diffusion
- GLIDE: https://github.com/openai/glide-text2im/
The majority of the code is licensed under CC-BY-NC; portions may be available under separate terms (BSD/MIT) as noted in the referenced projects.
@article{navon2025drax,
title={Drax: Speech Recognition with Discrete Flow Matching},
author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
journal={arXiv preprint arXiv:2510.04162},
year={2025}
}
