Hello! First of all, thanks for this wonderful repo. I would just like to ask how to reconstruct the mel spectrogram I generated with librosa. I can do this via VQGAN using this code:
```python
def reconstruct_with_vqgan(x, model):
    z, _, [_, _, indices] = model.encode(x)
    xrec = model.decode(z)
    return xrec
```
Here `xrec` is the reconstructed image (from VQGAN).
I also add a preprocessing step before reconstructing, using this code (the same one from DALL-E's VQ-VAE):
```python
import PIL.Image
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def preprocess(img):
    s = min(img.size)
    if s < target_image_size:
        raise ValueError(f'min dim for image {s} < {target_image_size}')
    r = target_image_size / s
    s = (round(r * img.size[1]), round(r * img.size[0]))
    img = TF.resize(img, s, interpolation=PIL.Image.LANCZOS)
    # img = TF.center_crop(img, output_size=2 * [target_image_size])
    img = torch.unsqueeze(T.ToTensor()(img), 0)
    return img
```
In the end I just call these two functions to reconstruct the image:
```python
img = PIL.Image.open(image).convert("RGB")  # input is the mel spectrogram in image form
x_vqgan = preprocess(img)
x_vqgan = x_vqgan.to(DEVICE)
x2 = reconstruct_with_vqgan(x_vqgan, model32x32)  # model32x32 is the VQGAN model
x2 = custom_to_pil(x2[0])  # final reconstructed image
```
I was wondering how I could use your model to reconstruct in a similar way. I checked the demo and saw that it extracts the audio from the video; I'm wondering how I can directly reconstruct a mel spectrogram generated with librosa.
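For context, here is a minimal sketch of the kind of preprocessing I have in mind: taking a log-mel array (as `librosa.power_to_db(librosa.feature.melspectrogram(...))` would produce) and reshaping it into a batched NCHW tensor, instead of round-tripping through an image file. The dB range, the `[0, 1]` normalization, and the mel/frame dimensions are my assumptions, not taken from your repo:

```python
import numpy as np

# Hypothetical stand-in for a log-mel spectrogram from librosa,
# e.g. librosa.power_to_db(librosa.feature.melspectrogram(y=wav, sr=sr)):
# 80 mel bands x 848 time frames, values in decibels.
mel_db = np.random.uniform(-80.0, 0.0, size=(80, 848)).astype(np.float32)

# Scale dB values from the assumed [-80, 0] range into [0, 1].
mel_01 = (mel_db + 80.0) / 80.0

# Add batch and channel axes -> (1, 1, 80, 848), the NCHW layout a
# torch encoder typically expects; torch.from_numpy(x) would follow here.
x = mel_01[np.newaxis, np.newaxis, :, :]
print(x.shape)  # -> (1, 1, 80, 848)
```

Would feeding such a tensor straight into the encoder be the right approach, or does your model expect a different normalization or shape?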
Thank you very much in advance :D