This project aims to generate a Python module that provides translations for
the Unicode descriptions found in the unicodedata module. The translations are
taken from unicode-table.com, whose source code is available on GitHub. From
this source, the project generates PO and MO files.
Note: these files are also useful for other programming languages. An overview of supported languages can be found here.
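For orientation, the English descriptions in question are the character names that Python's unicodedata module returns. The sketch below prints a few of them. The helper name `english_description` is hypothetical, and mapping control characters (which have no name in unicodedata) to a `<Control>` placeholder is an assumption that mirrors the `<Control>` source text mentioned later in this README.

```python
import unicodedata

def english_description(char: str) -> str:
    """Return the English name unicodedata knows for a single character."""
    # Control characters (category "Cc") have no name in unicodedata;
    # assumption: represent them with a "<Control>" placeholder.
    if unicodedata.category(char) == "Cc":
        return "<Control>"
    return unicodedata.name(char, "<unnamed>")

if __name__ == "__main__":
    for char in ["A", "\u00e9", "\u20ac", "\x07"]:
        print(f"U+{ord(char):04X}", english_description(char))
```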
This localization has been discussed in:
- https://bugs.python.org/issue34053
- https://mail.python.org/pipermail/python-ideas/2018-July/051889.html
- https://groups.google.com/forum/#!topic/python-ideas/g2jj4WRVDFA
Install the following packages:
sudo apt-get install wget unzip python3 gettext
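Before running the scripts, it can help to confirm that the tools are on the PATH. This is a minimal sketch; the helper `missing_tools` is hypothetical. `msgfmt` is the PO-to-MO compiler shipped with the gettext package, and that the conversion step relies on it is an assumption.

```python
import shutil

# Tools assumed to be needed by the scripts; msgfmt comes with gettext.
REQUIRED_TOOLS = ("wget", "unzip", "python3", "msgfmt")

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are available.")
```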
In order to generate the files needed for a Python module with translations of Unicode descriptions, run
./1-clean.sh
which removes the results of previous runs. Then run
./2-download.sh
to download the translations as master.zip. The archive is unzipped with
./3-extract.sh
into the directory unicode-table-data-master. The Python script
./4-generate.py
generates PO files in a directory tree under locale, such as
- cn/LC_MESSAGES
- de/LC_MESSAGES
- fr/LC_MESSAGES
- ...
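The layout above is the standard gettext one, `<lang>/LC_MESSAGES/<domain>.po`. The following is a sketch of how one language's PO file could be written with only the standard library. It is not the code of 4-generate.py: the function names `po_escape`, `render_po` and `write_po` are hypothetical, and the domain name `symbols` is taken from the file names produced later.

```python
from pathlib import Path

def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))

def render_po(entries: dict[str, str]) -> str:
    """Render English source -> translation pairs as PO file text."""
    lines = [
        # Header entry declaring UTF-8, so msgfmt accepts non-ASCII msgstr values.
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for source, translation in entries.items():
        lines += [f'msgid "{po_escape(source)}"',
                  f'msgstr "{po_escape(translation)}"',
                  ""]
    return "\n".join(lines)

def write_po(locale_dir: Path, lang: str, entries: dict[str, str],
             domain: str = "symbols") -> Path:
    """Write <locale_dir>/<lang>/LC_MESSAGES/<domain>.po and return its path."""
    po_dir = locale_dir / lang / "LC_MESSAGES"
    po_dir.mkdir(parents=True, exist_ok=True)
    path = po_dir / f"{domain}.po"
    path.write_text(render_po(entries), encoding="utf-8")
    return path
```

Because a dict is used for the entries, each source text can appear only once, which matches the PO requirement discussed below.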
This script also writes informational messages, warnings and errors to the command line. Note that a language is skipped if less than 1% of it has been translated or if 10% of its translations are identical to the original text.
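The skip rule can be expressed as a small predicate. This is a sketch with a hypothetical function name; two details are assumptions because the text leaves them open: the 10% is measured against the translated entries, and a language exactly at 10% is skipped.

```python
def should_skip(total: int, translated: int, identical: int,
                min_translated: float = 0.01,
                max_identical: float = 0.10) -> bool:
    """Decide whether a language is skipped.

    total      -- number of English source descriptions
    translated -- how many of them have a translation
    identical  -- how many translations equal the English source text
    """
    if total == 0 or translated / total < min_translated:
        return True
    # Assumption: the identical share is relative to the translated entries,
    # and reaching the threshold is enough to skip the language.
    return translated > 0 and identical / translated >= max_identical
```

For example, a language with 5 of 1000 descriptions translated is skipped (0.5% < 1%), while one with 500 translations of which 10 are identical is kept (2% < 10%).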
Warnings are also shown when source texts are identical. This happens for
<Control> and many ideographs and needs further investigation, since source
texts must be unique in PO files.
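Finding the affected source texts amounts to counting them. The sketch below, with the hypothetical helper `duplicate_sources`, assumes the entries for one language are available as (source text, translation) pairs.

```python
from collections import Counter

def duplicate_sources(entries):
    """Return the source texts that occur more than once, with their counts.

    `entries` is an iterable of (source_text, translation) pairs. PO files
    need each msgid to be unique, so these would have to be resolved.
    """
    counts = Counter(source for source, _ in entries)
    return {source: n for source, n in counts.items() if n > 1}
```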
The PO files can be converted to MO files by running
./5-convert.sh
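A Python equivalent of this conversion could compile each PO file with gettext's `msgfmt -o <output> <input>`. Whether 5-convert.sh does exactly this is an assumption, and `msgfmt_command` and `convert_all` are hypothetical names.

```python
import subprocess
from pathlib import Path

def msgfmt_command(po_file: Path) -> list[str]:
    """Build the msgfmt call that compiles one PO file into the MO file next to it."""
    mo_file = po_file.with_suffix(".mo")
    return ["msgfmt", "-o", str(mo_file), str(po_file)]

def convert_all(locale_dir: Path = Path("locale")) -> list[Path]:
    """Compile every <lang>/LC_MESSAGES/*.po under locale_dir; return the PO files."""
    po_files = sorted(locale_dir.glob("*/LC_MESSAGES/*.po"))
    for po_file in po_files:
        # check=True raises CalledProcessError if msgfmt reports an error.
        subprocess.run(msgfmt_command(po_file), check=True)
    return po_files
```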
This results in the following files in the directory locale:
- cn/LC_MESSAGES/symbols.po
- cn/LC_MESSAGES/symbols.mo
- de/LC_MESSAGES/symbols.po
- de/LC_MESSAGES/symbols.mo
- fr/LC_MESSAGES/symbols.po
- fr/LC_MESSAGES/symbols.mo
- ...
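The five steps can also be chained from Python so that the run stops at the first failure. This is a minimal sketch; `run_pipeline` is a hypothetical helper, and it assumes the scripts are executable and are run from the repository directory.

```python
import subprocess
from pathlib import Path

# The five steps of this README, in order.
STEPS = ["./1-clean.sh", "./2-download.sh", "./3-extract.sh",
         "./4-generate.py", "./5-convert.sh"]

def run_pipeline(repo_dir: Path = Path("."), steps=STEPS) -> None:
    """Run each step in repo_dir, in order."""
    for step in steps:
        print(f"Running {step}")
        # check=True stops the pipeline at the first step that exits non-zero.
        subprocess.run([step], cwd=repo_dir, check=True)
```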
The files in locale can be packaged and distributed via, e.g., PyPI, or could
eventually become part of the Python distribution. Note that this
localization can also be used for other programming languages.
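Once the MO files exist, Python's standard gettext module can look up translations. The sketch below uses the domain `symbols` from the file names above. The helper `localized_name` is hypothetical, and it assumes each msgid equals the unicodedata name of the character. With `fallback=True`, a missing language or locale directory yields the English name instead of an error.

```python
import gettext
import unicodedata

def localized_name(char: str, lang: str, localedir: str = "locale") -> str:
    """Translate the unicodedata name of `char` into `lang`, falling back to English."""
    english = unicodedata.name(char, "")
    if not english:
        # gettext("") would return the PO header metadata, so skip the lookup.
        return english
    translations = gettext.translation(
        "symbols", localedir=localedir, languages=[lang], fallback=True)
    return translations.gettext(english)

if __name__ == "__main__":
    print(localized_name("\u20ac", "de"))
```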
The copyright of the translated strings can be found at unicode-table.com. The copyright of the scripts here is public domain.