Drax: Speech Recognition with Discrete Flow Matching

Introduction

📄 Paper | 🤗 Models

This repository contains the official implementation for Drax: Speech Recognition with Discrete Flow Matching.

[Figures: DRAX architecture; discrete flow sampling]

Installation

Recommended:

conda create -n drax python=3.10 -y
conda activate drax
# then upgrade pip and install Torch (CUDA 12.8 wheels) and the package:
pip install -U pip
pip install torch==2.7.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -e .

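Alternatively, if you don't already have Torch installed, install it together with the package through the with-torch extra (quoted so that shells such as zsh do not expand the brackets):

pip install -e ".[with-torch]"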
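After installing, it can help to confirm which versions ended up in the environment. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `check_packages` is hypothetical and not part of drax. It looks the packages up without importing them, so it also works when Torch is missing.

```python
import importlib.util
from importlib import metadata


def check_packages(names=("torch", "torchaudio", "transformers")):
    """Return {package: installed version, "unknown", or None if not found}.

    Hypothetical helper for illustration; it is not part of drax.
    """
    report = {}
    for name in names:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name}: {version or 'not installed'}")
```

The versions printed can then be compared against the pins mentioned in this README (torch 2.7.0, torchaudio 2.7.0, transformers 4.52.3).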

Quickstart: Transcribe

Using the generate CLI:

python generate.py --model_path aiola/drax-v1 --audio_path path/to/audio --language <language-code>
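To run the CLI over several files, one option is to assemble the command per file from Python. This is a sketch that uses only the three flags shown above; the helper name `build_generate_cmd` and the example paths are hypothetical, and it assumes `generate.py` is in the current working directory.

```python
import shlex
import sys


def build_generate_cmd(audio_path, language, model_path="aiola/drax-v1"):
    """Build the argv list for one generate.py invocation (hypothetical helper)."""
    return [
        sys.executable, "generate.py",
        "--model_path", model_path,
        "--audio_path", str(audio_path),
        "--language", language,
    ]


# Placeholder inputs: (audio path, language code) pairs.
jobs = [("path/to/audio1.wav", "en"), ("path/to/audio2.wav", "de")]

for audio_path, language in jobs:
    cmd = build_generate_cmd(audio_path, language)
    # Print the command; to actually execute it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Each call loads the model again, so for many files the Transcriber API with batch inference is likely the better fit.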

Or using the Transcriber:

from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")  # HF repo or local path
result = asr.transcribe("/path/to/audio.wav", language="en")
print(result[0].transcript)

Controlling sampling steps, temperature, etc.

result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)
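To get a feel for how the number of sampling steps affects latency and output, a small sweep can be useful. The sketch below takes the transcribe callable as an argument, so it relies only on the `language` and `sampling_steps` keywords and the `result[0].transcript` access shown above; the helper name `sweep_sampling_steps` and the step values are hypothetical choices, not recommendations from the authors.

```python
import time


def sweep_sampling_steps(transcribe, audio_path, language, steps=(8, 16, 32)):
    """Transcribe one file at several step counts.

    Returns a list of (steps, seconds, transcript) tuples.
    Hypothetical helper for illustration; it is not part of drax.
    """
    rows = []
    for n in steps:
        start = time.perf_counter()
        result = transcribe(audio_path, language=language, sampling_steps=n)
        elapsed = time.perf_counter() - start
        rows.append((n, elapsed, result[0].transcript))
    return rows


# Usage, assuming `asr` is the Transcriber created above:
# for n, seconds, text in sweep_sampling_steps(asr.transcribe, "/path/to/audio.wav", "en"):
#     print(f"{n:>3} steps  {seconds:.2f}s  {text}")
```

The first call may include one-time warm-up cost, so timings are best compared after an initial throwaway run.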

Batch inference

audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)
# result is indexed per input, as in result[0] above
for r in result:
    print(r.transcript)
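When transcribing a batch, it is often convenient to key each transcript by its input file. The following is a minimal sketch under two assumptions that this README does not state: results come back in the same order as the inputs, and the returned object supports `len()` and iteration. The helper name `label_transcripts` is hypothetical.

```python
def label_transcripts(audio_paths, results):
    """Map each audio path to its transcript.

    Assumes results are returned in input order (not confirmed by the README).
    Hypothetical helper for illustration; it is not part of drax.
    """
    if len(audio_paths) != len(results):
        raise ValueError(
            f"expected {len(audio_paths)} results, got {len(results)}"
        )
    return {path: r.transcript for path, r in zip(audio_paths, results)}


# Usage, assuming `audio_paths` and `result` from the batch call above:
# for path, text in label_transcripts(audio_paths, result).items():
#     print(f"{path}: {text}")
```

The length check guards against silently mismatched pairs if an input was skipped or failed.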

Dependencies

Core dependencies, including transformers==4.52.3, are installed via pyproject.toml. Torch and torchaudio are installed either through the [with-torch] extra or separately, using the build that matches your CUDA version.

Development

Enable pre-commit hooks (Ruff):

pip install pre-commit ruff
pre-commit install
# Run manually
pre-commit run -a

References

Acknowledgements

This project borrows components from:

License

The majority of the code is licensed under CC-BY-NC; portions may be available under separate terms (BSD/MIT) as noted in the referenced projects.

Citation

@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
