Improving Audio Explanations using Audio Language Models

glam-imperial/AudioXLM

AudioXLM

Repository for Improving Audio Explanations using Audio Language Models

Contents:

audio_explanation_generation --> Inference script for generating audio explanations using AudioXLM.
fidelity_performance --> Script to measure the fidelity score of explanations on the Speech Commands and TESS datasets.
ASR_WER_performance_TESS --> Script for evaluating the automatic speech recognition (ASR) performance, measured by word error rate (WER), of AudioXLM explanations in the speech emotion recognition task on TESS.
encode_dataset_AudioGen_SC --> Script for encoding datasets into AudioGen's embedding space, so that the classifier models and AudioXLM operate on consistent representations.
AudioGen_update --> Scripts that modify the original AudioGen library. After installing AudioGen, copy these scripts into the appropriate path in the AudioGen library.
models --> Folder containing classification models that predict on encoded datasets.
sample_explanations --> Sample audio explanations generated by AudioXLM.
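AudioGen_update is meant to be copied into the installed AudioGen (AudioCraft) library, but the README does not state the exact target path. The shell sketch below is a hypothetical helper and is not part of the repository. It locates the installed `audiocraft` package through Python and recursively copies the contents of `AudioGen_update` into a destination you pass as a path relative to that package. The function names `audiocraft_pkg_dir` and `copy_updates` are illustrative. The sketch assumes that the `python` on your PATH is the interpreter AudioCraft was installed into.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the AudioXLM repository): copy the contents of
# AudioGen_update/ into a directory inside the installed audiocraft package.
# The README does not name the destination, so it is passed as an argument.

audiocraft_pkg_dir() {
  # Print the directory of the audiocraft package importable by `python`.
  python -c 'import audiocraft, os; print(os.path.dirname(audiocraft.__file__))'
}

copy_updates() {
  # $1 = source directory (e.g. AudioGen_update), $2 = existing destination directory.
  src="$1"
  dest="$2"
  if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
    echo "copy_updates: '$src' or '$dest' is not a directory" >&2
    return 1
  fi
  # Copy recursively, preserving any subdirectory layout inside $src.
  cp -R "$src"/. "$dest"/ && echo "Copied contents of $src into $dest"
}

# Run only when a destination is given: ./copy_updates.sh <path inside audiocraft>
if [ "$#" -ge 1 ]; then
  pkg="$(audiocraft_pkg_dir)" || { echo "audiocraft is not importable" >&2; exit 1; }
  copy_updates AudioGen_update "$pkg/$1"
fi
```

This assumes that all of the scripts go under one directory, or that `AudioGen_update` mirrors the library's layout below that directory. If the files belong in unrelated places, copying them by hand is simpler. The script name `copy_updates.sh` is likewise only an example.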

AudioGen Installation

Follow the AudioGen installation instructions from the AudioCraft repository.

AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, run the following:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft  # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
python -m pip install -e .  # or if you cloned the repo locally (mandatory if you want to train).
python -m pip install -e '.[wm]'  # if you want to train a watermarking model

We also recommend having ffmpeg installed, either through your system or Anaconda:

sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
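After installation, a quick check can confirm that Python, PyTorch, AudioCraft and ffmpeg are all visible from the same shell. The sketch below is illustrative and not part of the repository. It assumes that `python` is the interpreter you installed into, and the helper names `check_command` and `check_python_module` are hypothetical. The script reports what it finds and exits non-zero if a required command or module is missing. It only notes, and does not reject, a Python version other than 3.9.

```shell
#!/usr/bin/env bash
# Illustrative environment check (not part of the AudioXLM repository).
# Reports what is installed; exits non-zero if a requirement is missing.

check_command() {
  # $1 = executable name; succeeds if it is on PATH.
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
    return 1
  fi
}

check_python_module() {
  # $1 = module name; prints its __version__ if `python` can import it.
  if python -c "import $1; print('$1', getattr($1, '__version__', 'unknown'))" 2>/dev/null; then
    return 0
  fi
  echo "$1: NOT importable"
  return 1
}

main() {
  rc=0
  check_command python || rc=1
  # AudioCraft lists Python 3.9; other versions are reported, not rejected.
  python -c 'import sys
v = sys.version_info[:2]
print("python %d.%d" % v + ("" if v == (3, 9) else "  (AudioCraft lists 3.9)"))' 2>/dev/null
  check_python_module torch || rc=1
  check_python_module audiocraft || rc=1
  check_command ffmpeg || rc=1
  return "$rc"
}

main
```

Save it as, for example, `check_env.sh` and run `bash check_env.sh`. A non-zero exit status means at least one of `python`, `torch`, `audiocraft` or `ffmpeg` was not found.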

Citation

The citation will be provided upon publication.
