The VITS and XTTS demo can be found at this link: https://huggingface.co/spaces/saillab/ZabanZad_PoC
- Python >= 3.9
- Espeak-NG :
sudo apt install -y espeak-ng
- TTS (from the repo):
pip install -U pip setuptools wheel
git clone https://github.com/coqui-ai/TTS
pip install -e TTS/
- init.sh --> add:
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3.10 python3.10-dev python3.10-venv
/usr/bin/python3.10 -m venv /opt/python/envs/py310
/opt/python/envs/py310/bin/python -m pip install -U pip setuptools wheel
/opt/python/envs/py310/bin/python -m pip install -U ipykernel ipython ipython_genutils jedi lets-plot aiohttp pandas
sudo apt install -y espeak-ng
- in the attached data --> in the file environment.yml --> change datalore-base-env: "minimal" to "py310"
- background computation --> Never turn off
- git clone https://github.com/coqui-ai/TTS.git
- navigate to TTS (cd TTS)
- pip install -e .
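- A quick sanity check that the editable install is importable (a minimal check; the printed version depends on the commit you cloned):
import TTS
print(TTS.__version__)  # prints the installed Coqui TTS version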
- CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
- To avoid any interruption of your training, create the Trainer with Accelerate enabled:
trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
- Faster training with num_loader_workers=4 (more than 1); beforehand you should enlarge shared memory with: sudo mount -o remount,size=8G /dev/shm (config snippet below)
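- In the config that looks like this (a sketch; these are the standard trainer config fields, the values are just examples):
config.num_loader_workers = 4       # parallel workers feeding training batches
config.num_eval_loader_workers = 2  # workers for the eval loader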
- !nvidia-smi (shows the status of the GPUs)
- os.environ["CUDA_VISIBLE_DEVICES"] = "7" selects which GPU you intend to run your code on
- Error :"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 9.77 GiB of which 52.31 MiB is free. Including non-PyTorch memory, this process has 8.68 GiB memory in use. Of the allocated memory 8.25 GiB is allocated by PyTorch, and 155.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF" Solution : reduce batch_size
- Error: some wavs can't be found although they exist. Solution: your wavs might be nested in subfolders, and the formatter cannot find nested files.
- If you use common_voice as your formatter, your wavs must be stored in a clips folder.
- Error: dimension mismatch. Solution: set mixed_precision=False.
- For TensorBoard, download the latest output and unzip it; on a Windows shell, cd to the folder where you stored the files and run: tensorboard --logdir=. --bind_all --port=6007, then open the printed URL in your browser.
- Use wandb --> import wandb and start a wandb run with sync_tensorboard=True:
import wandb

if wandb.run is None:
    wandb.init(
        project="persian-tts-vits-grapheme-cv15-fa-male-native-multispeaker-RERUN",
        group="GPUx8 accel mixed bf16 128x32",
        sync_tensorboard=True,
    )
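- With sync_tensorboard=True, wandb mirrors the TensorBoard event files that the trainer writes, so the same run shows up in both dashboards.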
- use_speaker_embedding=True
- speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers
- model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
- to run with multi-GPU : CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
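- For reference, a minimal sketch of what multi-speaker.py can look like, tying the pieces above together. This is a sketch, not the exact script: the dataset path, run name, and audio values are placeholders, and the field names follow the current Coqui TTS recipes.
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs, VitsAudioConfig
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))

# Common Voice style dataset: the wavs sit in a clips/ folder next to the metadata file.
dataset_config = BaseDatasetConfig(
    formatter="common_voice",
    meta_file_train="validated.tsv",
    path="/path/to/cv-corpus/fa",  # placeholder
    language="fa",
)

config = VitsConfig(
    model_args=VitsArgs(use_speaker_embedding=True),
    audio=VitsAudioConfig(sample_rate=22050, win_length=1024, hop_length=256, num_mels=80),
    run_name="persian-tts-vits-grapheme-cv15-multispeaker",  # placeholder
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=4,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)

ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# Multi-speaker pieces from the notes above.
speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers

model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)

trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()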
- Navigate to stored log
- use the Python code titled "push_to_hub"
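- That script isn't reproduced here; a hedged sketch of what such an upload can look like with the huggingface_hub client (the repo id and folder path are placeholders):
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in, e.g. via `huggingface-cli login`
api.create_repo(repo_id="your-username/persian-tts-vits", exist_ok=True)  # placeholder repo id
api.upload_folder(
    folder_path="runs/your-run-folder/",  # the run folder with config.json and the .pth checkpoints
    repo_id="your-username/persian-tts-vits",
    repo_type="model",
)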
If you run multi-speaker training and then continue on a single-speaker dataset, the model loses its ability to stay multi-speaker. To load a previous checkpoint, navigate to the place from which you want to resume training and, in that .py file, add model.load_checkpoint(config, 'best_model_495.pth', eval=False) to load the model from the checkpoint you wish to restart from (a sketch follows below). I started an experiment dated 9 Nov (cv --> azure_male --> azure_female) and it seems that, although it becomes single-speaker, you don't need to change the model configuration and it keeps running.
I followed this link, coqui-ai/TTS#3229, but still got an error.
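- The resume-from-checkpoint pattern above, written out as a sketch (the checkpoint filename and surrounding objects are the ones from the training script; adjust them to your run):
# Build the model exactly as in the training script, then load the old weights
# before handing it to the Trainer. eval=False keeps the training-only layers.
model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
model.load_checkpoint(config, "best_model_495.pth", eval=False)

trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()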
- Create a new virtual environment plus the system pip packages, as per: https://lambdalabs.com/lambda-stack-deep-learning-software
python -m venv name-of-venv
source name-of-venv/bin/activate
- It's always good to update Pip and Setuptools and Wheel:
pip install pip setuptools wheel -U
- Clone Coqui TTS and install, as per: https://tts.readthedocs.io/en/latest/tutorial_for_nervous_beginners.html
$ git clone https://github.com/coqui-ai/TTS
$ cd TTS
$ pip install -e .
- Verify the installation of TTS with
pip list
- again, verify that TTS actually works with
tts -h
to bring up its Help file.
- Test synthesizing per: https://github.com/coqui-ai/TTS#single-speaker-models
Run TTS with default models:
$ tts --text "Text for T,T,S, and Sailing on AI." --out_path speech.wav
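- The same check through the Python API (a sketch; the model name here is the CLI's usual default English model, pick any model shown by `tts --list_models`):
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Text for T,T,S, and Sailing on AI.", file_path="speech.wav")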
- Same CUDA OutOfMemoryError as above --> reduce batch_size.
- To run with multiple GPUs:
CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
- To continue a previous run:
CUDA_VISIBLE_DEVICES=0 python train.py --continue_path path/to/previous/run/folder/
- To launch distributed training with the Coqui trainer:
CUDA_VISIBLE_DEVICES=0,1,2 python -m trainer.distribute --script train.py
- TensorBoard:
tensorboard --logdir=. --bind_all --port=8080
- To resume the multi-GPU multi-speaker run from a previous run folder:
CUDA_VISIBLE_DEVICES="0,1," accelerate launch --multi_gpu --num_processes 2 --script multi-speaker.py --continue_path /home/bargh1/TTS/runs/persian-tts-vits-grapheme-cv15-multispeaker-RERUN-October-24-2023_05+57PM-1e152692/runs/persian-tts-vits-grapheme-cv15-Arman-6nov-November-07-2023_10+09AM-1e152692