Hi @lucadellalib,
Thanks for the best VLBR codec! 👍
I am trying to run this codec in streaming mode for real-time use by feeding it fixed-size chunks of audio samples. However, the output stream sounds garbled, almost comical, apparently because there is no strict synchronization at the token level.
How do I properly use this codec in streaming mode?
Below is some code from my tests:
#...............
#some import from modules...
#...............
coder=WavLM()
decoder=Vocos()
compressor=FocalEncoder()
decompressor=FocalDecoder()
quantizer=BinarySphericalQuantizer()
#...............
#then load models data: state dicts...
#...............
coder.eval().requires_grad_(False)
decoder.eval().requires_grad_(False)
compressor.eval().requires_grad_(False)
decompressor.eval().requires_grad_(False)
quantizer.eval().requires_grad_(False)
sig,sample_rate=torchaudio.load("input.wav")
L=1280*8 #16
N=sig.shape[1]//L #number of whole chunks in the input
dec=torch.zeros((1,L*N)) #output buffer, filled chunk by chunk
for n in range(N):
    print("\n")
    print(n)
    #encode one fixed-size chunk independently of the others
    Frame=sig[:,n*L:(n+1)*L]
    print(Frame.shape)
    feats=coder(Frame)
    print(feats.shape)
    lats=compressor(feats)
    lats=nn.functional.normalize(lats,dim=-1)
    print(lats.shape)
    codes=quantizer.lats_to_codes(lats)
    print(codes.shape)
    toks=quantizer.codes_to_toks(codes)
    print(toks.shape)
    print(toks)
    #decode the tokens of this chunk back to audio
    codes=quantizer.toks_to_codes(toks)
    print(codes.shape)
    qfeats=decompressor(codes)
    print(qfeats.shape)
    dsig=decoder(qfeats)
    print(dsig.shape)
    #dec[:,n*L:(n+1)*L]=dsig
    dec[:,n*L:(n+1)*L]=dsig.detach() #.clone()
    print(dec[:,n*L:(n+1)*L].shape)
torchaudio.save("reconstruction.wav",dec,sample_rate)
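One possible source of the artifacts is that each chunk in the loop above is processed with no knowledge of the preceding audio. The sketch below shows the general idea of feeding every chunk together with some left context and keeping only the newest samples of the output. It is not the codec's API: `stream_with_context` and `placeholder_codec` are hypothetical names, the placeholder is a simple causal 3-tap average standing in for the whole encode/decode chain, and it assumes the codec returns as many samples as it receives, which may not hold for the real models.

```python
import numpy as np

def placeholder_codec(x):
    # Hypothetical stand-in for encode + decode: a causal 3-tap moving
    # average, so each output sample depends on the two previous inputs.
    kernel = np.ones(3) / 3.0
    return np.convolve(x, kernel, mode="full")[: len(x)]

def stream_with_context(signal, chunk, context, codec):
    # Process `signal` in chunks of `chunk` samples, prepending up to
    # `context` past samples to each chunk before calling `codec`.
    out = np.zeros_like(signal)
    n_chunks = len(signal) // chunk
    for n in range(n_chunks):
        start = n * chunk
        stop = start + chunk
        ctx_start = max(0, start - context)
        decoded = codec(signal[ctx_start:stop])
        # Keep only the newest `chunk` samples; the rest was context.
        out[start:stop] = decoded[-chunk:]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
full = placeholder_codec(x)                                 # whole-signal reference
with_ctx = stream_with_context(x, 16, 4, placeholder_codec)
no_ctx = stream_with_context(x, 16, 0, placeholder_codec)
print(np.abs(with_ctx - full).max(), np.abs(no_ctx - full).max())
```

With this toy codec, the chunked output matches the whole-signal output once the context covers the codec's memory, while chunks processed in isolation differ at every chunk boundary. How much context the real models would need, and whether they need lookahead as well, is not something this sketch can answer.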