Drax: Speech Recognition with Discrete Flow Matching

Introduction

This repository contains the official implementation for Drax: Speech Recognition with Discrete Flow Matching.

Installation

Recommended:

conda create -n drax python=3.10 -y
conda activate drax
# then install Torch and the package
pip install -U pip
pip install torch==2.7.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -e .

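Alternatively, if you don't already have Torch installed, you can pull it in together with the package through the with-torch extra (the quotes keep shells such as zsh from expanding the brackets):

pip install -e ".[with-torch]"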

Quickstart: Transcribe

Using the generate CLI:

python generate.py --model_path aiola/drax-v1 --audio_path path/to/audio --language <language-code>

Or using the Transcriber:

from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")  # HF repo or local path
result = asr.transcribe("/path/to/audio.wav", language="en")
print(result[0].transcript)

Controlling sampling steps, temperature, etc.

result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)
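
To see how the number of sampling steps affects latency and output, a small sweep can help. The sketch below is illustrative only: `sweep_sampling_steps` is a hypothetical helper, not part of Drax, and the step values 8 and 16 are arbitrary choices (32 is the value used above). It assumes only what the examples above show, namely that the transcribe callable accepts `language`, `sampling_steps` and `temperature`, and that `result[0].transcript` holds the text.

```python
import time
from typing import Callable, Iterable, List, Tuple


def sweep_sampling_steps(
    transcribe_fn: Callable,
    audio_path: str,
    language: str,
    steps_options: Iterable[int] = (8, 16, 32),  # arbitrary example values
    temperature: float = 1e-2,
) -> List[Tuple[int, float, str]]:
    """Transcribe one file once per step count.

    Returns a list of (sampling_steps, seconds_elapsed, transcript) tuples.
    `transcribe_fn` is expected to behave like `asr.transcribe` above.
    """
    rows = []
    for steps in steps_options:
        start = time.perf_counter()
        result = transcribe_fn(
            audio_path,
            language=language,
            sampling_steps=steps,
            temperature=temperature,
        )
        elapsed = time.perf_counter() - start
        rows.append((steps, elapsed, result[0].transcript))
    return rows
```

With a loaded model this could be called as `sweep_sampling_steps(asr.transcribe, "/path/to/audio.wav", "en")`. Wall-clock timings on a GPU include warm-up effects on the first call, so treat the first row with caution.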

Batch inference

audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)
# transcribe returns a list of results (see result[0] above), so print each transcript
for r in result:
    print(r.transcript)
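
For a long list of files, passing everything in a single call may not fit in memory. A minimal sketch of chunked batch inference follows. `transcribe_in_batches` and its `batch_size` parameter are hypothetical and not part of Drax, and the sketch assumes the transcribe callable returns one result per input, in input order.

```python
from typing import Callable, List, Sequence


def transcribe_in_batches(
    transcribe_fn: Callable,
    audio_paths: Sequence[str],
    languages: Sequence[str],
    batch_size: int = 8,  # hypothetical default; tune for your hardware
) -> List:
    """Call `transcribe_fn` on consecutive chunks and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if len(audio_paths) != len(languages):
        raise ValueError("audio_paths and languages must have the same length")

    results = []
    for start in range(0, len(audio_paths), batch_size):
        end = start + batch_size
        # Each chunk keeps paths and language codes aligned by position.
        chunk = transcribe_fn(
            list(audio_paths[start:end]),
            language=list(languages[start:end]),
        )
        results.extend(chunk)
    return results
```

Usage would look like `results = transcribe_in_batches(asr.transcribe, audio_paths, languages, batch_size=4)`, after which each element can be read as in the single-file example.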

Dependencies

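Core dependencies are declared in pyproject.toml (including transformers==4.52.3) and installed by pip install -e . as shown under Installation. Torch and torchaudio are installed either through the [with-torch] extra or separately, from the wheel index that matches your CUDA version.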
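
To check which versions ended up in the active environment, the standard library's `importlib.metadata` can be queried. This is a generic sketch rather than a script shipped with the repository; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "torchaudio", "transformers"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```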

Development

Enable pre-commit hooks (Ruff):

pip install pre-commit ruff
pre-commit install
# Run manually
pre-commit run -a

References

Drax: Speech Recognition with Discrete Flow Matching: https://arxiv.org/abs/2510.04162
Discrete Flow Matching: https://arxiv.org/abs/2407.15595
Flow Matching with General Discrete Paths: https://arxiv.org/abs/2412.03487
Generative Flows on Discrete State-Spaces: https://arxiv.org/abs/2402.04997
Simplified and Generalized Masked Diffusion for Discrete Data: https://arxiv.org/abs/2406.04329

Acknowledgements

This project borrows components from:

Flow-matching: https://github.com/facebookresearch/flow_matching
Flash attention: https://github.com/Dao-AILab/flash-attention
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution: https://github.com/louaaron/Score-Entropy-Discrete-Diffusion
GLIDE: https://github.com/openai/glide-text2im/

License

The majority of the code is licensed under CC-BY-NC; portions may be available under separate terms (BSD/MIT) as noted in the referenced projects.

Citation

@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
