
Reconstruct mel spectrogram from librosa #11

@clairerity

Description


Hello! First of all, thanks for this wonderful repo. I would just like to ask how to reconstruct the mel spectrogram I generated with librosa. I can do this via VQGAN using this code:

def reconstruct_with_vqgan(x, model):
    # encode to the latent space, then decode back to an image
    z, _, [_, _, indices] = model.encode(x)
    xrec = model.decode(z)
    return xrec

the xrec is the reconstructed image (from VQGAN)

I also added a preprocessing step before reconstruction, using this code (the same one from DALL-E's VQ-VAE):

import PIL.Image
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def preprocess(img):
    s = min(img.size)

    if s < target_image_size:
        raise ValueError(f'min dim for image {s} < {target_image_size}')

    r = target_image_size / s
    s = (round(r * img.size[1]), round(r * img.size[0]))
    img = TF.resize(img, s, interpolation=PIL.Image.LANCZOS)
    #img = TF.center_crop(img, output_size=2 * [target_image_size])
    img = torch.unsqueeze(T.ToTensor()(img), 0)
    return img

In the end, I just call these two functions to reconstruct the image:

img = PIL.Image.open(image).convert("RGB") # input is the mel spectrogram in image form
x_vqgan = preprocess(img)
x_vqgan = x_vqgan.to(DEVICE)
  
x2 = reconstruct_with_vqgan(x_vqgan, model32x32) # model32x32 is the VQGAN model
x2 = custom_to_pil(x2[0]) # final reconstructed image 

I was wondering how I could use your model instead to reconstruct in a similar way. I checked the demo and saw that it extracts the audio from a video. I'm wondering how I can directly reconstruct a mel spectrogram generated with librosa.

Thank you very much in advance :D
