0xA0 is causing gtts-cli to send EOF. #353

@medanisjbara

Description

Prerequisites

  • [*] Did you make sure a similar issue didn't exist?
  • [*] Did you update gTTS to the latest? (pip install --upgrade gTTS)

Current Behaviour (steps to reproduce)

gtts-cli mostly ignores 0xA0 in the input text. But in certain situations (such as the provided example) it produces "Error: 200 (OK) from TTS API. Probable cause: No audio stream in response. Unsupported language 'en'" along with EOF. The message seems to be written to stderr, with no actual Python error behind it.

$ gtts-cli -f test -o test.mp3

working_test.txt
non_working_test.txt
The files contain 0xA0, which I assumed would make them binary files, but the file command says otherwise.

$ file non_working_test.txt
non_working_test.txt: Unicode text, UTF-8 text

gtts-cli didn't complain about non-UTF-8 characters, and using iconv to remove non-UTF-8 characters doesn't change anything:
$ iconv -f utf-8 -t utf-8 -c test
does nothing to the file.
Some web pages use that character in the middle of the text, and most text editors show it as a space. This is frustrating for the user, who has almost no clue what causes the error or what to do about it.
I can't blame the creator of the page either: after searching online, it seems that 0xA0 is part of the Windows-1252 encoding, so if they wrote their blog in Microsoft Word, there's a good chance it was introduced there.
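
A possible explanation for the file and iconv results, as a minimal sketch: it assumes the character in the attached files is U+00A0 (no-break space) stored as UTF-8, which the files themselves would have to confirm. Windows-1252 stores that character as the single byte 0xA0, while UTF-8 stores it as the two bytes 0xC2 0xA0, so the text is still valid UTF-8 and iconv -c has nothing to drop. The helper name is_valid_utf8 is made up for this illustration.

```python
# Assumption: the "0xA0" in the test files is U+00A0 (no-break space).
NBSP = "\u00a0"

cp1252_bytes = NBSP.encode("cp1252")  # Windows-1252: the single byte 0xA0
utf8_bytes = NBSP.encode("utf-8")     # UTF-8: the two bytes 0xC2 0xA0


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "hello\u00a0world".encode("utf-8")

print(cp1252_bytes.hex(), utf8_bytes.hex())
# Text containing U+00A0 is valid UTF-8, so there is nothing invalid to strip.
print(is_valid_utf8(sample))
# A lone 0xA0 byte, by contrast, is not valid UTF-8.
print(is_valid_utf8(b"hello\xa0world"))
```

If that assumption holds, the character has to be replaced explicitly rather than filtered out as an invalid byte.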

Expected Behaviour

gtts-cli should ignore that character and continue reading regardless of how and where it is present.

Context

I am writing a simple bash script that reads aloud the user's clipboard, or the webpage at the URL in the user's clipboard.
I have personally been using the command w3m "$(xclip -o)" | gtts-cli -f - | mpv - for over a year to boost productivity when reading, with variations such as less $pdf_file_or_epub_file | gtts-cli -f - | mpv - and so on.
The script basically does the same (it is still very basic and under development).
I came across some webpages that caused this error to occur. After some investigation, I found out that the character 0xA0 is what causes the problem.
So I created this issue and made a small workaround that uses bbe to replace the bad character with nothing (and then iconv for cleanup, since it messes up a couple of things).
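
The same kind of workaround can be sketched in Python instead of bbe and iconv. This is only an illustration, not how gtts-cli handles its input: the function name replace_nbsp and its replacement parameter are hypothetical, and it assumes the offending character is U+00A0 once the text has been decoded.

```python
# Assumption: the problematic character is U+00A0 (no-break space).
NBSP = "\u00a0"


def replace_nbsp(text: str, replacement: str = " ") -> str:
    """Hypothetical helper: swap every no-break space for `replacement`."""
    return text.replace(NBSP, replacement)


sample = "read\u00a0this aloud"

# Default: the no-break space becomes an ordinary space, keeping words apart.
print(replace_nbsp(sample))
# Passing "" drops the character entirely, like the bbe workaround.
print(replace_nbsp(sample, ""))
```

Replacing with a regular space rather than nothing keeps adjacent words separated, which is probably preferable for speech. The cleaned string could then be handed to gTTS(text, lang='en').save(...) or written to stdout and piped into gtts-cli -f -.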

Environment

$ gtts-cli --version
gtts-cli, version 2.2.4

$ python --version
Python 3.9.12

$ uname -a
Linux Laptop 5.17.3-tkg-pds #1 TKG SMP PREEMPT Sat Apr 16 06:53:55 CET 2022 x86_64 Intel(R) Celeron(R) N4000 CPU @ 1.10GHz GenuineIntel GNU/Linux
  • OS: Gentoo/Linux x86_64
