Hi @lucadellalib,
Thanks for the best VLBR codec! 👍
I am trying to run this codec in streaming mode for real-time use by feeding it fixed-size chunks of audio samples. However, the output stream sounds laughably bad, because there is no strict synchronization at the token level between chunks.
How do I properly use this codec in streaming mode?
Below is some code from my tests:
#...............
#some import from modules...
#...............
coder=WavLM()
decoder=Vocos()
compressor=FocalEncoder()
decompressor=FocalDecoder()
quantizer=BinarySphericalQuantizer()
#...............
#then load models data: state dicts...
#...............
coder.eval().requires_grad_(False)
decoder.eval().requires_grad_(False)
compressor.eval().requires_grad_(False)
decompressor.eval().requires_grad_(False)
quantizer.eval().requires_grad_(False)
sig,sample_rate=torchaudio.load("input.wav")
L=1280*8 #16
N=len(sig[0,:])//L
dec=np.array(range(L*N)).reshape((1,L*N)).astype('float32')
dec=torch.from_numpy(dec)
for n in range(N):
    print("\n")
    print(n)
    # encode one fixed-size chunk independently of its neighbours
    Frame=sig[:,n*L:(n+1)*L]
    print(Frame.shape)
    feats=coder(Frame)
    print(feats.shape)
    lats=compressor(feats)
    lats=nn.functional.normalize(lats,dim=-1)
    print(lats.shape)
    codes=quantizer.lats_to_codes(lats)
    print(codes.shape)
    toks=quantizer.codes_to_toks(codes)
    print(toks.shape)
    print(toks)
    #
    # decode the tokens of this chunk back to audio
    codes=quantizer.toks_to_codes(toks)
    print(codes.shape)
    qfeats=decompressor(codes)
    print(qfeats.shape)
    dsig=decoder(qfeats)
    print(dsig.shape)
    #dec[:,n*L:(n+1)*L]=dsig
    dec[:,n*L:(n+1)*L]=dsig.detach() #.clone()
    print(dec[:,n*L:(n+1)*L].shape)
torchaudio.save("reconstruction.wav",dec,sample_rate)
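As a sanity check on the synchronization problem, here is a minimal sketch that counts how many feature frames a chunk produces compared with the same audio processed in one pass. It assumes the WavLM front end is the usual wav2vec 2.0-style convolutional stack (kernels 10,3,3,3,3,2,2 and strides 5,2,2,2,2,2,2) with no padding. That layout is my assumption, not something I read from this repo's model config, and `num_frames` and `CONV_LAYERS` are hypothetical names used only for illustration.

```python
# Assumed (kernel, stride) per conv layer of the feature extractor, no padding.
# These values are an assumption about the WavLM config, not read from the model.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Frames produced by an unpadded conv stack for `num_samples` input samples."""
    n = num_samples
    for kernel, stride in CONV_LAYERS:
        if n < kernel:
            return 0  # input too short for this layer
        n = (n - kernel) // stride + 1
    return n


L = 1280 * 8  # chunk length used in the test script above
N = 8         # hypothetical number of chunks, for illustration only

per_chunk = num_frames(L)   # frames from one isolated chunk
whole = num_frames(L * N)   # frames from the same audio in a single pass
print(per_chunk, N * per_chunk, whole)  # prints: 31 248 255
```

If that assumption holds, each isolated chunk of 10240 samples yields 31 frames rather than 32, so the chunked token stream drifts against the single-pass one, which may explain part of what I hear at the chunk boundaries.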