A one-stop shop for tracking all open-access and open-source TTS models as they come out. Feel free to open a PR for any that aren't listed here.
This list aims to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advancements in the field.
Note: This repo only tracks TTS models with an open-source or open-access codebase. More motivation for everyone to open-source!
Name | GitHub | Weights | License | Fine-tune | Languages | Paper | Demo | Issues |
---|---|---|---|---|---|---|---|---|
Amphion | Repo | 🤗 Hub | MIT | No | Multilingual | Paper | 🤗 Space | |
AI4Bharat | Repo | 🤗 Hub | MIT | Yes | Indic | Paper | Demo | |
Bark | Repo | 🤗 Hub | MIT | No | Multilingual | Paper | 🤗 Space | |
EmotiVoice | Repo | GDrive | Apache 2.0 | Yes | ZH + EN | Not Available | Not Available | Separate GUI agreement |
Glow-TTS | Repo | GDrive | MIT | Yes | English | Paper | GH Pages | |
GPT-SoVITS | Repo | 🤗 Hub | MIT | Yes | Multilingual | Not Available | Not Available | |
HierSpeech++ | Repo | GDrive | MIT | No | KR + EN | Paper | 🤗 Space | |
IMS-Toucan | Repo | GH release | Apache 2.0 | Yes | Multilingual | Paper | 🤗 Space | |
MahaTTS | Repo | 🤗 Hub | Apache 2.0 | No | English + Indic | Not Available | Recordings, Colab | |
Matcha-TTS | Repo | GDrive | MIT | Yes | English | Paper | 🤗 Space | GPL-licensed phonemizer |
MetaVoice-1B | Repo | 🤗 Hub | Apache 2.0 | Yes | Multilingual | Not Available | 🤗 Space | |
Neural-HMM TTS | Repo | GitHub | MIT | Yes | English | Paper | GH Pages | |
OpenVoice | Repo | 🤗 Hub | CC-BY-NC 4.0 | No | ZH + EN | Paper | 🤗 Space | Non Commercial |
OverFlow TTS | Repo | GitHub | MIT | Yes | English | Paper | GH Pages | |
Parler TTS | Repo | 🤗 Hub | Apache 2.0 | Yes | English | Not Available | Not Available | |
pflowTTS | Unofficial Repo | GDrive | MIT | Yes | English | Paper | Not Available | GPL-licensed phonemizer |
Piper | Repo | 🤗 Hub | MIT | Yes | Multilingual | Not Available | Not Available | GPL-licensed phonemizer |
Pheme | Repo | 🤗 Hub | CC-BY | Yes | English | Paper | 🤗 Space | |
RAD-MMM | Repo | GDrive | MIT | Yes | Multilingual | Paper | Jupyter Notebook, Webpage | |
RAD-TTS | Repo | GDrive | MIT | Yes | English | Paper | GH Pages | |
Silero | Repo | GH links | CC BY-NC-SA | No | EM + DE + ES + EA | Not Available | Not Available | Non Commercial |
StyleTTS 2 | Repo | 🤗 Hub | MIT | Yes | English | Paper | 🤗 Space | GPL-licensed phonemizer |
Tacotron 2 | Unofficial Repo | GDrive | BSD-3 | Yes | English | Paper | Webpage | |
TorToiSe TTS | Repo | 🤗 Hub | Apache 2.0 | Yes | English | Technical report | 🤗 Space | |
TTTS | Repo | 🤗 Hub | MPL 2.0 | No | ZH | Not Available | Colab, 🤗 Space | |
VALL-E | Unofficial Repo | Not Available | MIT | Yes | NA | Paper | Not Available | |
VITS/ MMS-TTS | Repo | 🤗 Hub / MMS | Apache 2.0 | Yes | English | Paper | 🤗 Space | GPL-licensed phonemizer |
WhisperSpeech | Repo | 🤗 Hub | MIT | No | English, Polish | Not Available | 🤗 Space, Recordings, Colab | |
XTTS | Repo | 🤗 Hub | CPML | Yes | Multilingual | Paper | 🤗 Space | Non Commercial |
xVASynth | Repo | 🤗 Hub | GPL-3.0 | Yes | Multilingual | Paper | 🤗 Space | Copyrighted materials used for training. |
Name | Processor | Phonetic alphabet | Insta-clone | Emotional control | Prompting | Speech control | Streaming support | S2S support | Longform synthesis |
---|---|---|---|---|---|---|---|---|---|
Amphion | CUDA | π₯ | ππ₯ | β | |||||
Bark | CUDA | β | π tags | β | |||||
EmotiVoice | |||||||||
Glow-TTS | |||||||||
GPT-SoVITS | |||||||||
HierSpeech++ | β | π₯ | ππ₯ | β | speed / stability π | π¦ | |||
IMS-Toucan | CUDA | β | β | β | β | ||||
MahaTTS | |||||||||
Matcha-TTS | IPA | β | β | β | speed / stability π | ||||
MetaVoice-1B | CUDA | π₯ | ππ₯ | β | stability / similarity π | Yes | |||
Neural-HMM TTS | |||||||||
OpenVoice | CUDA | β | π₯ | 6-type π π‘πππ―π€«π | β | ||||
OverFlow TTS | |||||||||
pflowTTS | |||||||||
Piper | |||||||||
Pheme | CUDA | β | π₯ | ππ₯ | β | stability π | |||
RAD-TTS | |||||||||
Silero | |||||||||
StyleTTS 2 | CPU / CUDA | IPA | π₯ | ππ₯ | β | π | Yes | ||
Tacotron 2 | |||||||||
TorToiSe TTS | β | β | β | π | π | ||||
TTTS | CPU/CUDA | β | π₯ | ||||||
VALL-E | |||||||||
VITS/ MMS-TTS | CUDA | β | β | β | β | speed π | |||
WhisperSpeech | CUDA | β | π₯ | ππ₯ | β | speed π | |||
XTTS | CUDA | β | π₯ | ππ₯ | β | speed / stability π | π | β | |
xVASynth | CPU / CUDA | ARPAbet+ | β | 4-type π π‘πππ― per-phoneme | β | speed / pitch / energy / π π per-phoneme | β | π¦ |
- Processor - CPU/CUDA/ROCm (single/multi used for inference; the real-time factor should be below 2.0 to qualify for CPU, though some leeway can be given if the model supports audio streaming)
- Phonetic alphabet - None/IPA/ARPAbet (phonetic transcription that lets you control the pronunciation of certain words during inference)
- Insta-clone - Yes/No (zero-shot model for quick voice cloning)
- Emotional control - Yesπ/Strict (Strict means the model cannot go in between emotional states; insta-clone switch/ππ₯)
- Prompting - Yes/No (a side effect of narrator-based datasets and a way to affect the emotional state; see the ElevenLabs docs)
- Streaming support - Yes/No (whether audio can be played back while it is still being generated)
- Speech control - speed/pitch/ (ability to change the pitch, duration, energy, and/or emotion of generated speech)
- Speech-To-Speech support - Yes/No (Streaming support implies real-time S2S; S2T=>T2S does not count)
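The Processor entry above uses a real-time factor (RTF) threshold of 2.0 for CPU. The list does not spell out how RTF is measured, so the sketch below assumes the common definition: synthesis time divided by the duration of the generated audio. The function names and the sample timings are hypothetical, and the streaming leeway mentioned above is not modeled.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    Assumed definition: RTF < 1.0 means faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def qualifies_for_cpu(rtf: float, threshold: float = 2.0) -> bool:
    """Apply the 'below 2.0' rule from the Processor entry (no streaming leeway)."""
    return rtf < threshold


if __name__ == "__main__":
    # Hypothetical measurement: 3.0 s of CPU time to synthesize 2.0 s of audio.
    rtf = real_time_factor(3.0, 2.0)
    print(rtf, qualifies_for_cpu(rtf))  # 1.5 True
```

In practice the two inputs would come from timing an inference call and dividing the number of output samples by the sample rate.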
Help make this list more complete. Create demos on the Hugging Face Hub and link them here :) Got any questions? Drop me a DM on Twitter @reach_vb.