This codebase is for the following paper with a corresponding demo page.
Supplementary material for the theoretical results and proofs is also available.
Pretrained model weights and computed Encodec mean and std tensors are available here.
An example of running inference for diffusion bridges can be found in inference.ipynb.
(Optional) Create a virtual environment and activate it:

```shell
conda create --name diff python=3.10
conda activate diff
```

Install requirements:

```shell
pip install -r requirements.txt
```

Git clone the repository basic-pitch-torch to the audio_diffusion_pytorch directory of the project:

```shell
git clone https://github.com/gudgud96/basic-pitch-torch
```

In the exp folder, you will find several subfolders that contain configuration files used across all experiments. These files define general settings such as training, model parameters, and logging, which are independent of any specific experiment. Below is a brief overview of these folders:
- callbacks: Contains configurations for various callbacks, such as model checkpoints, model summaries, and loggers.
- datamodule: Defines data-related configurations, including the validation split and which dataset class to use. By default, it uses `audio_data_pytorch.WAVDataset` to work with `.wav` files, but you can create and specify a custom dataset here. Data transforms are also specified in this folder.
- loggers: By default, TensorBoard is used for logging, but you can add or customize the logger configuration here.
- model: Contains the model configuration files, defining the model architecture and parameters.
- trainer: Defines training configurations, such as GPUs to use, precision settings, number of epochs, etc.
To create a new experiment, you need to add a new .yaml file in the exp folder where you specify the experimental settings. These settings can include parameters like the instruments used, sigma_min, sigma_max, etc. Alternatively, you can use existing configurations, such as flute_latent.yaml, to train specific models, or modify them according to your needs.
Note: before using an existing .yaml config, change mean_path and std_path to the directory where you keep the corresponding mean and std tensors.
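As a rough sketch, the relevant part of such a config might look like the following. The key names mean_path and std_path come from the note above; the surrounding layout and the paths themselves are placeholders, so check the actual config files in exp for the exact structure:

```yaml
# Hypothetical excerpt from an experiment config (e.g. exp/flute_latent.yaml).
# Replace the placeholder paths with the directory holding your Encodec
# mean and std tensors.
mean_path: /path/to/encodec/mean.pt
std_path: /path/to/encodec/std.pt
```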
When specifying the dataset_path, ensure it points to a directory that contains .wav files. The files should have the instrument name included in the filename (e.g., flute1.wav). The system will recursively search through all subdirectories for such .wav files based on the instrument name.
```shell
python train.py \
    exp=name_of_your_yaml_file_in_exp_folder \
    trainer.gpus=1 \
    model.lr=1e-4 \
    trainer.precision=32 \
    trainer.max_epochs=500 \
    datamodule.batch_size=32 \
    datamodule.num_workers=16 \
    +dataset_path=path/to/your/data \
    exp_tag=name_of_exp
```

If you want to resume training from a previous checkpoint, uncomment the `ckpt:` line in the corresponding config.yaml file and provide the path to the .ckpt file. By default, checkpoints generated during training are saved in the logs/name_of_exp directory.
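A sketch of that change is shown below. The `ckpt` key name comes from the resume instructions; the checkpoint path is a placeholder you should replace with your own:

```yaml
# Uncomment and set to your checkpoint file to resume training.
# The path below is only a placeholder.
ckpt: logs/name_of_exp/path/to/your/checkpoint.ckpt
```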
By default, a log directory is created in the root folder, and for each experiment, a subfolder with the experiment's name is generated. During training, checkpoints are automatically saved in this subfolder.