Hi, I really appreciate your work; the demo sounds great.
I have also read papers on PPG-based VC, which use an ASR model to extract the PPGs. I am curious about the difference between SSL-based and PPG-based methods, since both seem to extract linguistic information. Have you ever compared them?
Thank you!
There are definite similarities between PPGs and the Soft Speech Units we proposed. The main difference is that soft units don't require text transcriptions to train. This can be useful for training VC systems in languages without large corpora of annotated speech. Additionally, non-linguistic sounds such as laughter and breathing may be captured better by soft units than by PPGs. Unfortunately, I haven't compared the approaches directly yet. I think it would be a useful benchmark, but I haven't had the chance to look into it.
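To make the "no transcriptions" point concrete, here is a toy sketch of where the training targets could come from in each case. It is not the code from this repository: the random features, the small k-means routine, the cluster and phoneme counts, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kmeans_targets(features, k, iters=10, seed=0):
    """Assign each frame to one of k clusters using audio features only."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 16))  # stand-in for per-frame encoder features

# Unit-style targets: derived by clustering the features, no text needed.
unit_targets = kmeans_targets(frames, k=8)

# PPG-style targets: phoneme labels, which in practice require transcribed
# and aligned speech. Random placeholders here, since there is no corpus.
phoneme_targets = rng.integers(0, 40, size=200)

# In both cases a classifier would output one distribution per frame.
unit_posteriors = softmax(rng.normal(size=(200, 8)))
phoneme_posteriors = softmax(rng.normal(size=(200, 40)))
```

The per-frame outputs have the same form in both routes. What differs is the label source: clustering needs only audio, while phoneme labels need annotated speech.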