The VITS and XTTS demo can be found at this link: https://huggingface.co/spaces/saillab/ZabanZad_PoC
- Python >= 3.9
- Espeak-NG :
sudo apt install -y espeak-ng
- TTS (from the repo):
pip install -U pip setuptools wheel
git clone https://github.com/coqui-ai/TTS
pip install -e TTS/
- init.sh --> add:
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3.10 python3.10-dev python3.10-venv
/usr/bin/python3.10 -m venv /opt/python/envs/py310
/opt/python/envs/py310/bin/python -m pip install -U pip setuptools wheel
/opt/python/envs/py310/bin/python -m pip install -U ipykernel ipython ipython_genutils jedi lets-plot aiohttp pandas
sudo apt install -y espeak-ng
- in the attached data --> in the file environment.yml --> change datalore-base-env: "minimal" to "py310"
- background computation --> Never turn off
- git clone https://github.com/coqui-ai/TTS.git
- navigate to TTS (cd TTS)
- pip install -e .
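- A quick sanity check that the editable install is importable (a minimal check; the printed version depends on the commit you cloned):
import TTS
print(TTS.__version__)  # prints the installed Coqui TTS version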
- CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
- To avoid any interruption of your training, create the Trainer with Accelerate enabled:
trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
- Faster training with num_loader_workers=4 (more than 1); beforehand you should enlarge shared memory with: sudo mount -o remount,size=8G /dev/shm (config snippet below)
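- In the config that looks like this (a sketch; these are the standard trainer config fields, the values are just examples):
config.num_loader_workers = 4       # parallel workers feeding training batches
config.num_eval_loader_workers = 2  # workers for the eval loader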
- !nvidia-smi (shows the status of the GPUs)
- os.environ["CUDA_VISIBLE_DEVICES"] = "7" selects which GPU you intend to run your code on
- Error :"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 9.77 GiB of which 52.31 MiB is free. Including non-PyTorch memory, this process has 8.68 GiB memory in use. Of the allocated memory 8.25 GiB is allocated by PyTorch, and 155.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF" Solution : reduce batch_size
- Error: some wavs can't be found although they exist. Solution: your wavs might be nested in subfolders, and the formatter cannot find nested files.
- If you use common_voice as your formatter, your wavs must be stored in a clips folder.
- Error: dimension mismatch. Solution: set mixed_precision=False.
- For TensorBoard, download the latest output and unzip it; on a Windows shell, cd to the folder where you stored the files and run: tensorboard --logdir=. --bind_all --port=6007, then open the printed URL in your browser.
- Use wandb --> import wandb and start a wandb run with sync_tensorboard=True:
import wandb

if wandb.run is None:
    wandb.init(
        project="persian-tts-vits-grapheme-cv15-fa-male-native-multispeaker-RERUN",
        group="GPUx8 accel mixed bf16 128x32",
        sync_tensorboard=True,
    )
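- With sync_tensorboard=True, wandb mirrors the TensorBoard event files that the trainer writes, so the same run shows up in both dashboards.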
- use_speaker_embedding=True
- speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers
- model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
- to run with multi-GPU : CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
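- For reference, a minimal sketch of what multi-speaker.py can look like, tying the pieces above together. This is a sketch, not the exact script: the dataset path, run name, and audio values are placeholders, and the field names follow the current Coqui TTS recipes.
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs, VitsAudioConfig
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))

# Common Voice style dataset: the wavs sit in a clips/ folder next to the metadata file.
dataset_config = BaseDatasetConfig(
    formatter="common_voice",
    meta_file_train="validated.tsv",
    path="/path/to/cv-corpus/fa",  # placeholder
    language="fa",
)

config = VitsConfig(
    model_args=VitsArgs(use_speaker_embedding=True),
    audio=VitsAudioConfig(sample_rate=22050, win_length=1024, hop_length=256, num_mels=80),
    run_name="persian-tts-vits-grapheme-cv15-multispeaker",  # placeholder
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=4,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)

ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# Multi-speaker pieces from the notes above.
speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers

model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)

trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()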
- Navigate to stored log
- use the Python code titled "push_to_hub"
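- That script isn't reproduced here; a hedged sketch of what such an upload can look like with the huggingface_hub client (the repo id and folder path are placeholders):
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in, e.g. via `huggingface-cli login`
api.create_repo(repo_id="your-username/persian-tts-vits", exist_ok=True)  # placeholder repo id
api.upload_folder(
    folder_path="runs/your-run-folder/",  # the run folder with config.json and the .pth checkpoints
    repo_id="your-username/persian-tts-vits",
    repo_type="model",
)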
If you run multi-speaker training and then continue on a single-speaker dataset, the model loses its ability to stay multi-speaker. To load a previous checkpoint, navigate to the place from which you want to resume training and, in that .py file, add model.load_checkpoint(config, 'best_model_495.pth', eval=False) to load the model from the checkpoint you wish to restart from (a sketch follows below). I started an experiment dated 9 Nov (cv --> azure_male --> azure_female) and it seems that, although it becomes single-speaker, you don't need to change the model configuration and it keeps running.
I followed this link, coqui-ai/TTS#3229, but still got an error.
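- The resume-from-checkpoint pattern above, written out as a sketch (the checkpoint filename and surrounding objects are the ones from the training script; adjust them to your run):
# Build the model exactly as in the training script, then load the old weights
# before handing it to the Trainer. eval=False keeps the training-only layers.
model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
model.load_checkpoint(config, "best_model_495.pth", eval=False)

trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()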
- Create a new virtual environment plus the system pip packages, as per: https://lambdalabs.com/lambda-stack-deep-learning-software
python -m venv name-of-venv
source name-of-venv/bin/activate
- It's always good to update Pip and Setuptools and Wheel:
pip install pip setuptools wheel -U
- Clone Coqui TTS and install, as per: https://tts.readthedocs.io/en/latest/tutorial_for_nervous_beginners.html
$ git clone https://github.com/coqui-ai/TTS
$ cd TTS
$ pip install -e .
- Verify the installation of TTS with
pip list
- again, verify that TTS actually works with
tts -h
to bring up its Help file.
- Test synthesizing per: https://github.com/coqui-ai/TTS#single-speaker-models
Run TTS with default models:
$ tts --text "Text for T,T,S, and Sailing on AI." --out_path speech.wav
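- The same check through the Python API (a sketch; the model name here is the CLI's usual default English model, pick any model shown by `tts --list_models`):
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Text for T,T,S, and Sailing on AI.", file_path="speech.wav")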
- Same CUDA OutOfMemoryError as above --> reduce batch_size.
- To run with multiple GPUs:
CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
- To continue a previous run:
CUDA_VISIBLE_DEVICES=0 python train.py --continue_path path/to/previous/run/folder/
- To launch distributed training with the Coqui trainer:
CUDA_VISIBLE_DEVICES=0,1,2 python -m trainer.distribute --script train.py
- TensorBoard:
tensorboard --logdir=. --bind_all --port=8080
- To resume the multi-GPU multi-speaker run from a previous run folder:
CUDA_VISIBLE_DEVICES="0,1," accelerate launch --multi_gpu --num_processes 2 --script multi-speaker.py --continue_path /home/bargh1/TTS/runs/persian-tts-vits-grapheme-cv15-multispeaker-RERUN-October-24-2023_05+57PM-1e152692/runs/persian-tts-vits-grapheme-cv15-Arman-6nov-November-07-2023_10+09AM-1e152692