maxmelichov/Text-To-speech

Text-To-Speech

Welcome to the Robo-Shaul repository! Here you'll find everything you need to train your own Robo-Shaul or use the pre-trained models. Robo-Shaul is a text-to-speech system that converts Hebrew text into speech, using Tacotron 2 as its TTS framework.

The model that won the competition was trained for only 5k steps.

A subsequent model was developed after the competition deadline. It was trained for 90k steps with an enhanced training methodology that covered a broader spectrum of extreme cases. These training techniques were absent from the earlier model, which gives the later model a significant advantage in capability and performance.

For a demo look here

For a quick start, look at the Notebook or Open In Colab.

For the Hayot Kis (חיות כיס) podcast documenting the project, listen here

Site for the project link

The system consists of the SASPEECH dataset, a collection of Shaul Amsterdamski's unedited recordings for the podcast 'Hayot Kis', and a text-to-speech system trained on that dataset and implemented with NVIDIA's Tacotron 2 TTS framework.

To download the dataset for training, go to link

To download the trained models, go to the model with 90K steps or the model with 5K steps

The model expects diacritized Hebrew (עברית מנוקדת). To add the diacritics, we recommend Nakdimon by Elazar Gershuni and Yuval Pinter. The link is to a free online tool; the code and model are also available on GitHub at https://github.com/elazarg/nakdimon
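As a quick sanity check before synthesis, a script can verify that the input text actually carries diacritics. The sketch below is not part of this repository: `has_nikud` is a hypothetical helper that uses only Python's standard library, and it assumes that the code points U+05B0 to U+05BC plus the shin and sin dots are the marks that matter.

```python
import unicodedata

# Assumption: these code points cover the vowel points, the dagesh,
# and the shin/sin dots. Cantillation marks are deliberately ignored.
NIKUD_MARKS = {chr(c) for c in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2"}


def has_nikud(text: str) -> bool:
    """Return True if the text contains at least one Hebrew diacritic."""
    # NFD splits precomposed presentation forms into base letter + marks.
    decomposed = unicodedata.normalize("NFD", text)
    return any(ch in NIKUD_MARKS for ch in decomposed)


if __name__ == "__main__":
    print(has_nikud("\u05D9\u05B8\u05D3"))  # yod + qamats + dalet
    print(has_nikud("\u05D9\u05D3"))        # the same letters without points
```

Text that fails this check would need to be passed through a diacritizer such as Nakdimon first.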

Data Creation

For a quick start look at Notebook

How to use the training notebook and the synthesis notebook

These videos will help you gather the data and train the model: Part 1, Part 2

We use a customized version of NVIDIA's Tacotron 2 together with custom notebooks.

Information about HebrewToEnglish.py

HebrewToEnglish.py implements several functions that process Hebrew text and convert it into English sounds: breaking numbers down into Hebrew words, converting Hebrew letters into their corresponding English sounds, and converting entire Hebrew sentences into English sounds. It also handles punctuation marks and special cases within the Hebrew text.
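To illustrate the general idea of letter-to-sound conversion, here is a minimal sketch. It is not the code in HebrewToEnglish.py: the function name `hebrew_to_sounds`, the lookup tables, and the chosen Latin spellings are all assumptions made for this example, and it skips number handling and most special cases.

```python
import unicodedata

# Hypothetical, simplified tables; the repository's own mapping may differ.
CONSONANTS = {
    "\u05D0": "", "\u05D1": "v", "\u05D2": "g", "\u05D3": "d", "\u05D4": "h",
    "\u05D5": "v", "\u05D6": "z", "\u05D7": "kh", "\u05D8": "t", "\u05D9": "y",
    "\u05DA": "kh", "\u05DB": "kh", "\u05DC": "l", "\u05DD": "m", "\u05DE": "m",
    "\u05DF": "n", "\u05E0": "n", "\u05E1": "s", "\u05E2": "", "\u05E3": "f",
    "\u05E4": "f", "\u05E5": "ts", "\u05E6": "ts", "\u05E7": "k", "\u05E8": "r",
    "\u05E9": "sh", "\u05EA": "t",
}
# Letters whose sound hardens when they carry a dagesh.
WITH_DAGESH = {"\u05D1": "b", "\u05DB": "k", "\u05E4": "p"}
VOWELS = {
    "\u05B0": "", "\u05B1": "e", "\u05B2": "a", "\u05B3": "o", "\u05B4": "i",
    "\u05B5": "e", "\u05B6": "e", "\u05B7": "a", "\u05B8": "a", "\u05B9": "o",
    "\u05BB": "u",
}
DAGESH, SIN_DOT, HOLAM = "\u05BC", "\u05C2", "\u05B9"
VAV, SHIN = "\u05D5", "\u05E9"


def hebrew_to_sounds(text: str) -> str:
    """Convert diacritized Hebrew into a rough Latin-letter rendering."""
    text = unicodedata.normalize("NFD", text)
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch not in CONSONANTS:
            out.append(ch)  # spaces, digits, punctuation pass through
            continue
        # Collect the combining marks attached to this letter.
        marks = []
        while i < len(text) and unicodedata.combining(text[i]):
            marks.append(text[i])
            i += 1
        vowel = "".join(VOWELS.get(m, "") for m in marks)
        if ch == VAV and HOLAM in marks:
            sound = ""                      # vav carrying holam is the vowel "o"
        elif ch == VAV and DAGESH in marks and not vowel:
            sound, vowel = "", "u"          # shuruk
        elif DAGESH in marks and ch in WITH_DAGESH:
            sound = WITH_DAGESH[ch]
        elif ch == SHIN and SIN_DOT in marks:
            sound = "s"
        else:
            sound = CONSONANTS[ch]
        out.append(sound + vowel)
    return "".join(out)


if __name__ == "__main__":
    # shin + qamats + shin dot, lamed, vav + holam, final mem
    print(hebrew_to_sounds("\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"))
```

Grouping each letter with its combining marks before the lookup keeps the rules for dagesh, the sin dot, and vav in one place, which is where most special cases in diacritized Hebrew arise.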

What can be done to make this model even more robust:

  1. Use the Hebrew package to create a set of all the possible Hebrew letters with Nikud, encoded in UTF-8.
  2. Change Tacotron 2's input letters to the set that you created in Step 1.
  3. Create a new transcription algorithm that can convert Hebrew with Nikud to UTF-8.
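The three steps above could be prototyped roughly as follows. This is a sketch under stated assumptions, not an implementation from this repository: it builds the symbol set from Unicode code-point ranges with the standard library instead of the Hebrew package, and the names `SYMBOLS`, `SYMBOL_TO_ID`, `text_to_ids`, and `ids_to_text` are hypothetical. How the resulting list is wired into Tacotron 2's symbol table depends on the fork you use.

```python
import unicodedata

PAD = "_"
PUNCTUATION = " !,.?-"  # assumption: a minimal punctuation inventory

# Step 1: Hebrew letters (alef to tav, including final forms) and Nikud marks.
HEBREW_LETTERS = [chr(c) for c in range(0x05D0, 0x05EB)]
NIKUD_MARKS = [chr(c) for c in range(0x05B0, 0x05BD)] + ["\u05C1", "\u05C2"]

# Step 2: the symbol inventory the model would consume instead of Latin letters.
SYMBOLS = [PAD] + list(PUNCTUATION) + HEBREW_LETTERS + NIKUD_MARKS
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def text_to_ids(text: str) -> list[int]:
    """Step 3: map diacritized Hebrew directly to symbol IDs."""
    # NFD gives one code point per letter or mark, in canonical order.
    decomposed = unicodedata.normalize("NFD", text)
    # Characters outside the inventory are silently dropped in this sketch.
    return [SYMBOL_TO_ID[ch] for ch in decomposed if ch in SYMBOL_TO_ID]


def ids_to_text(ids: list[int]) -> str:
    """Inverse mapping, useful for checking the encoding."""
    return "".join(SYMBOLS[i] for i in ids)


if __name__ == "__main__":
    word = "\u05D9\u05B8\u05D3"  # yod + qamats + dalet
    ids = text_to_ids(word)
    print(len(SYMBOLS), ids, ids_to_text(ids) == word)
```

Feeding the letters and marks to the model directly would remove the intermediate conversion to English sounds, at the cost of retraining with the new symbol inventory.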

Contact Us

We are Maxim Melichov and Tony Hasson. If you have any questions or comments, please feel free to contact us using the information below.

Maxim Melichov: Connect on LinkedIn
Tony Hasson: Connect on LinkedIn