Hello! First of all, thanks for this wonderful repo. I would just like to ask how to reconstruct the mel spectrogram I generated with librosa. I can do this via VQGAN using this code:
```python
def reconstruct_with_vqgan(x, model):
    z, _, [_, _, indices] = model.encode(x)
    xrec = model.decode(z)
    return xrec
```
Here `xrec` is the reconstructed image (from VQGAN).
I also add a preprocessing step before reconstructing, using this code (the same one from DALL-E's VQ-VAE):
```python
import PIL.Image
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def preprocess(img):
    s = min(img.size)
    if s < target_image_size:
        raise ValueError(f'min dim for image {s} < {target_image_size}')
    r = target_image_size / s
    s = (round(r * img.size[1]), round(r * img.size[0]))
    img = TF.resize(img, s, interpolation=PIL.Image.LANCZOS)
    # img = TF.center_crop(img, output_size=2 * [target_image_size])
    img = torch.unsqueeze(T.ToTensor()(img), 0)
    return img
```
In the end I just call these two functions to reconstruct the image:
```python
img = PIL.Image.open(image).convert("RGB")  # input is the mel spectrogram in image form
x_vqgan = preprocess(img)
x_vqgan = x_vqgan.to(DEVICE)
x2 = reconstruct_with_vqgan(x_vqgan, model32x32)  # model32x32 is the VQGAN model
x2 = custom_to_pil(x2[0])  # final reconstructed image
```
I was wondering how I could use your model to reconstruct in a similar way. I checked the demo and saw that it extracts the audio from the video; I'm wondering how I can directly reconstruct a mel spectrogram generated with librosa.
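For context, here is a minimal sketch of the kind of preprocessing I have in mind: taking a log-mel array (as `librosa.power_to_db(librosa.feature.melspectrogram(...))` would produce) and reshaping it into a batched NCHW tensor, instead of round-tripping through an image file. The dB range, the `[0, 1]` normalization, and the mel/frame dimensions are my assumptions, not taken from your repo:

```python
import numpy as np

# Hypothetical stand-in for a log-mel spectrogram from librosa,
# e.g. librosa.power_to_db(librosa.feature.melspectrogram(y=wav, sr=sr)):
# 80 mel bands x 848 time frames, values in decibels.
mel_db = np.random.uniform(-80.0, 0.0, size=(80, 848)).astype(np.float32)

# Scale dB values from the assumed [-80, 0] range into [0, 1].
mel_01 = (mel_db + 80.0) / 80.0

# Add batch and channel axes -> (1, 1, 80, 848), the NCHW layout a
# torch encoder typically expects; torch.from_numpy(x) would follow here.
x = mel_01[np.newaxis, np.newaxis, :, :]
print(x.shape)  # -> (1, 1, 80, 848)
```

Would feeding such a tensor straight into the encoder be the right approach, or does your model expect a different normalization or shape?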
Thank you very much in advance :D