Stability-AI/stable-codec-demo

⚠️ Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome.

arXiv: Stable Audio Open paper

HuggingFace: model weights

stable-audio-tools: code to reproduce Stable Audio

stable-audio-metrics: code to evaluate Stable Audio

Stable Audio Open generates variable-length (up to 47 s) stereo audio at 44.1 kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.
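To give a feel for why the autoencoder is needed, here is a rough sketch of the sequence lengths involved. The duration and sample rate come from the description above; the downsampling factor of 2048 is an assumption made for illustration only, and `latent_frames` is a hypothetical helper, not part of stable-audio-tools.

```python
import math


def latent_frames(duration_s: float, sample_rate: int = 44_100,
                  downsampling: int = 2048) -> tuple[int, int]:
    """Return (samples per channel, latent frames) for one clip.

    `downsampling` is an assumed compression factor, not a value
    stated on this page.
    """
    samples = round(duration_s * sample_rate)
    # Round up so a partial final block still gets a latent frame.
    frames = math.ceil(samples / downsampling)
    return samples, frames


samples, frames = latent_frames(47.0)
print(f"{samples} samples per channel -> {frames} latent frames")
```

Under that assumption, the diffusion model works on a sequence roughly three orders of magnitude shorter than the raw waveform.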

Autoencoder reconstructions

This comparison helps evaluate the audio fidelity of each autoencoder. On the left, we have the ground truth recording. On the right, we take the ground truth recording and pass it through each of the autoencoders or neural audio codecs listed.

| Ground truth | Stable Audio Open | Stable Audio 2.0 | DAC |
|---|---|---|---|
| [audio player] | [audio player] | [audio player] | [audio player] |
| [audio player] | [audio player] | [audio player] | [audio player] |
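The reconstruction comparison above is a listening test, but the same round trip can be given a rough number. The sketch below is a minimal illustration, not the evaluation used for Stable Audio Open: `toy_codec` is a hypothetical stand-in (plain uniform quantization) for encoding and decoding with a real autoencoder, and waveform SNR is only a crude proxy for perceived fidelity.

```python
import math


def snr_db(reference, reconstruction):
    """Signal-to-noise ratio in dB of a reconstruction against its reference."""
    if len(reference) != len(reconstruction):
        raise ValueError("signals must have the same length")
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, reconstruction))
    if noise == 0:
        return math.inf  # perfect reconstruction
    if signal == 0:
        return -math.inf  # silent reference, non-silent reconstruction
    return 10.0 * math.log10(signal / noise)


def toy_codec(samples, levels=256):
    """Hypothetical encode+decode stand-in: uniform quantization over [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [round(x / step) * step for x in samples]


# One second of a 440 Hz sine at 44.1 kHz plays the role of the ground truth.
sr = 44_100
truth = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

coarse = toy_codec(truth, levels=16)   # heavy degradation
fine = toy_codec(truth, levels=256)    # light degradation
print(f"coarse: {snr_db(truth, coarse):.1f} dB, fine: {snr_db(truth, fine):.1f} dB")
```

A higher SNR means the reconstruction stays closer to the ground truth sample by sample. Neural codecs can sound good while scoring poorly on such waveform measures, so the audio examples remain the primary evidence.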