GitHub - Stability-AI/stable-codec-demo

⚠️ Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome.

arXiv: Stable Audio Open paper

HuggingFace: model weights

stable-audio-tools: code to reproduce Stable Audio

stable-audio-metrics: code to evaluate Stable Audio

Stable Audio Open generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.
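To show what "a manageable sequence length" means in practice, the arithmetic below compares the number of raw waveform values in a maximum-length clip (47 s, stereo, 44.1 kHz) with the length of the autoencoder's latent sequence. The downsampling factor (2048) and latent channel count (64) are assumptions chosen for illustration. They are not figures given on this page. The rounding rule, in which a partial final hop still yields a latent frame, is also an assumption.

```python
# Sketch: how much shorter the autoencoder's latent sequence is than the raw waveform.
# HYPOTHETICAL values: the downsampling factor and latent channel count are
# illustrative assumptions, not figures stated on this page.
import math

SAMPLE_RATE_HZ = 44_100            # from the description above
AUDIO_CHANNELS = 2                 # stereo, from the description above
MAX_DURATION_S = 47                # maximum generation length, from the description above
HYPOTHETICAL_DOWNSAMPLING = 2048   # assumed: waveform samples per latent frame
HYPOTHETICAL_LATENT_CHANNELS = 64  # assumed: values per latent frame


def raw_values(duration_s: float) -> int:
    """Number of waveform values (samples per channel x channels) in a clip."""
    return round(duration_s * SAMPLE_RATE_HZ) * AUDIO_CHANNELS


def latent_frames(duration_s: float) -> int:
    """Latent sequence length, assuming a partial final hop still yields a frame."""
    samples = round(duration_s * SAMPLE_RATE_HZ)
    return math.ceil(samples / HYPOTHETICAL_DOWNSAMPLING)


def compression_ratio(duration_s: float) -> float:
    """Raw waveform values divided by latent values."""
    latent_values = latent_frames(duration_s) * HYPOTHETICAL_LATENT_CHANNELS
    return raw_values(duration_s) / latent_values


if __name__ == "__main__":
    print("samples per channel:", round(MAX_DURATION_S * SAMPLE_RATE_HZ))  # 2072700
    print("latent frames:", latent_frames(MAX_DURATION_S))                # 1013
    print(f"compression ratio: {compression_ratio(MAX_DURATION_S):.1f}x")  # 63.9x
```

Under these assumed values, the diffusion model would operate on about a thousand latent frames instead of roughly two million samples per channel. The sequence shrinks by the downsampling factor. The total number of values shrinks by about 2 × 2048 / 64 = 64 because each latent frame carries several channels.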

Autoencoder reconstructions

This comparison is useful for evaluating the audio fidelity of each autoencoder. The ground-truth recording is on the left. To its right, the same recording has been passed through each of the autoencoders or neural audio codecs listed.

Audio examples (two clips per column): Ground truth | Stable Audio Open | Stable Audio 2.0 | DAC (Descript Audio Codec)
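These reconstructions are intended for listening comparison. A rough numeric complement, which is not a metric this page reports, is the signal-to-noise ratio (SNR) of a reconstruction measured against its ground truth. The sketch below assumes both signals are already time-aligned, equal-length, single-channel sample sequences. Stereo audio would need to be handled per channel. The function name `snr_db` is hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    """SNR in dB of a reconstruction relative to its ground-truth reference.

    Assumes both signals are time-aligned mono sample sequences of equal length.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in reference)
    error_energy = sum((x - y) ** 2 for x, y in zip(reference, reconstruction))
    if error_energy == 0.0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference but non-zero error
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    sr = 44_100
    # One second of a 440 Hz tone as a stand-in "ground truth".
    truth = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    # A stand-in "reconstruction" that is 10% quieter, so its error energy
    # is about 1% of the signal energy.
    recon = [0.9 * x for x in truth]
    print(f"{snr_db(truth, recon):.1f} dB")  # about 20 dB
```

SNR penalizes small timing offsets and phase changes heavily even when they are inaudible. For that reason, listening comparisons like the one above remain the main check for neural audio codecs.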
