As far as I understand the paper, inference needs three inputs in total: the voice data for the prompt, the text (transcript) of the prompt, and the text to be synthesized.
The prompt text and the synthesis target text need to be concatenated and used together for inference. However, vall-e/main.py has no input for the prompt text. I think this could degrade performance, but I'm wondering whether I'm misunderstanding something or whether it's implemented in a different location.
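To make concrete what I mean by concatenation, here is a minimal sketch of how I picture the inference inputs being assembled, based only on my reading of the paper. None of these names come from this repository: `InferenceInputs`, `toy_tokenize`, and `build_inference_inputs` are hypothetical, and the whitespace tokenizer is just a stand-in for a real phonemizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceInputs:
    """Hypothetical container for the three inputs described above."""
    text_tokens: List[str]       # prompt transcript tokens followed by target text tokens
    acoustic_prompt: List[int]   # codec tokens of the prompt audio (assumed precomputed)
    num_prompt_text_tokens: int  # where the prompt transcript ends inside text_tokens


def toy_tokenize(text: str) -> List[str]:
    # Stand-in for a real phonemizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_inference_inputs(
    prompt_text: str, target_text: str, prompt_codes: List[int]
) -> InferenceInputs:
    prompt_tokens = toy_tokenize(prompt_text)
    target_tokens = toy_tokenize(target_text)
    # The prompt transcript is prepended to the target text, so the model sees
    # text that matches the acoustic prompt before the text it must synthesize.
    return InferenceInputs(
        text_tokens=prompt_tokens + target_tokens,
        acoustic_prompt=list(prompt_codes),
        num_prompt_text_tokens=len(prompt_tokens),
    )


inputs = build_inference_inputs("Hello there", "Nice to meet you", [17, 42, 8])
print(inputs.text_tokens)  # ['hello', 'there', 'nice', 'to', 'meet', 'you']
```

If the prompt transcript is left out, `text_tokens` would contain only the target text while the acoustic prompt still carries audio for words that never appear in the text, which is the mismatch I'm worried about.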
I'd appreciate it if you could answer.