md.language.yofication component provides a cyrillic text yofication (ёфикация) API and CLI application.
This is remastered version of Yoficator originally developed by unabashed.
What does it do ?
It conservatively replaces every е
to ё
when it's unambiguously a case of the latter. No context is used;
it relies entirely on a lack of dictionary entries for a correspondent "truly е
" homograph.
Yoficating Russian texts removes some unnecessary ambiguities.
To learn more, check Wikipedia in English or Russian.
Limitations:
- The code being conservative and not looking for context, it won't correct when a "truly
е
" homograph exists. Thus, a "все
" will never be corrected, because bothвсе
andвсё
exist as different words. - Prone to wrongly yoficate other Cyrillic-based languages, such as Bulgarian, Ukrainian, Belarussian.
pip install md.language.yofication --index https://source.md.land/python/
CLI Application provides next options:
--no-replace
(DEFAULT) — disables original files modification. Modified content is being printed to standard output (STDOUT). Conflicts with--replace
option. Makes no sense when few files arguments are specified.--replace
— enables original files modifications. Conflicts with--no-replace
option. Makes no sense when no files arguments were specified.
For more details see program help:
python3 -m md.language.yofication -h
Operations with files:
- Prints the modified text content to standard output (STDOUT) without changing the file
(the
--no-replace
option is the default).python3 -m md.language.yofication ./file.txt # prints to STDOUT
- Replaces specified file with yoficated content
python3 -m md.language.yofication --replace ./file.txt # replace in-place python3 -m md.language.yofication --replace ./file.txt ./file2.txt ./file3.txt # replaces files find . -type f -iname '*.txt' -exec python3 -m md.language.yofication --replace {} \+ # replaces files
Operation standard input (STDIN):
- Reads text from standard input (STDIN) and writes modified text to standard output (STDOUT).
cat ./file.txt | python3 -m md.language.yofication # reads from STDIN, prints to STDOUT echo "Где ее книга?" | python3 -m md.language.yofication python3 -m md.language.yofication <<< "Где ее книга?"
- Interactive mode:
$ python3 -m md.language.yofication Где ее книга? Где её книга?