This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Use hunspell library. #13

JulienPalard
Collaborator

@JulienPalard JulienPalard commented Dec 11, 2019

This one uses the hunspell library instead of forking hunspell processes.

It runs a bit faster (1m38s vs 2m9s for a full python-docs-fr check).

It does not require hunspell to be installed, but I think we should find a way to also automatically download dictionaries.

closes #12

Collaborator

@Seluj78 Seluj78 left a comment

LGTM 🚀

@JulienPalard
Collaborator Author

@Seluj78 can you please try it on macOS? I'm not sure about the dictionary path.

@Seluj78
Collaborator

Seluj78 commented Dec 12, 2019

After checking: no.

(pospell-3.7) ➜  pospell git:(hunspell) pospell /Users/seluj78/Projects/python-docs-fr/
Traceback (most recent call last):
  File "/Users/seluj78/Projects/pospell/venv/bin/pospell", line 11, in <module>
    load_entry_point('pospell', 'console_scripts', 'pospell')()
  File "/Users/seluj78/Projects/pospell/pospell.py", line 326, in main
    args = parse_args()
  File "/Users/seluj78/Projects/pospell/pospell.py", line 222, in parse_args
    version="%(prog)s " + __version__ + " using hunspell: " + HUNSPELL_VERSION,
NameError: name 'HUNSPELL_VERSION' is not defined
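
The traceback shows the `--version` string referencing a module-level name that was never assigned. A minimal sketch of one way to keep that name always defined is given here, assuming Python 3.8+ (for `importlib.metadata`) and assuming the binding is installed under the distribution name `cyhunspell`. The helpers `binding_version` and `version_banner` are hypothetical, not part of pospell, and the value reported is the version of the Python binding, not necessarily that of the underlying hunspell library.

```python
from importlib.metadata import PackageNotFoundError, version


def binding_version(dist_name="cyhunspell"):
    """Return the installed version of a distribution, or "unknown"."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Never leave the name undefined: fall back to a placeholder.
        return "unknown"


# Assigned unconditionally at import time, so later code can rely on it.
HUNSPELL_VERSION = binding_version()


def version_banner(prog, pospell_version):
    """Build a --version string with the same shape as in the traceback."""
    return f"{prog} {pospell_version} using hunspell: {HUNSPELL_VERSION}"
```

With this shape, a missing or uninstalled binding degrades to `using hunspell: unknown` instead of raising `NameError` during argument parsing.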

@JulienPalard
Collaborator Author

It doesn't work on Windows either, I'm thrilled [irony]: python/python-docs-fr#1064 (comment)

@JulienPalard
Collaborator Author

@Seluj78 can you please retry?

@vpoulailleau
Member

See python/python-docs-fr#1064: some tools have to be installed in the Windows Store Ubuntu before this branch can be installed, and now it installs 🎉

However, the hardcoded path problem mentioned earlier shows up:

(venv_pospell) vincent@DESKTOP-URSSREA:/mnt/c/Users/vincent/Documents/traduction_python_docs/python-docs-fr$ pospell tutorial/venv.po
Traceback (most recent call last):
  File "/mnt/c/Users/vincent/Documents/traduction_python_docs/python-docs-fr/venv_pospell/bin/pospell", line 11, in <module>
    load_entry_point('pospell==1.0.3', 'console_scripts', 'pospell')()
  File "/mnt/c/Users/vincent/Documents/traduction_python_docs/python-docs-fr/venv_pospell/lib/python3.8/site-packages/pospell.py", line 348, in main
    errors = spell_check(
  File "/mnt/c/Users/vincent/Documents/traduction_python_docs/python-docs-fr/venv_pospell/lib/python3.8/site-packages/pospell.py", line 258, in spell_check
    hunspell = Hunspell(language, hunspell_data_dir="/usr/share/hunspell")
  File "hunspell/hunspell.pyx", line 172, in hunspell.hunspell.HunspellWrap.__init__
  File "hunspell/hunspell.pyx", line 130, in hunspell.hunspell.HunspellWrap._create_hspell_inst
hunspell.hunspell.HunspellFilePathError: File '/usr/share/hunspell/fr.aff' not found or accessible

@JulienPalard
Collaborator Author

The path is easy to fix: pospell would just need to download its dictionaries itself.
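
As a rough sketch of what "pospell downloads its own dictionaries" could look like: the function below fetches the `.aff` and `.dic` files for a language into a local cache directory if they are not already there. Everything here is an assumption for illustration: the name `ensure_dictionary`, the `base_url` parameter (no real dictionary host is implied), and the layout `<base_url>/<language>.aff`.

```python
import shutil
import urllib.request
from pathlib import Path


def ensure_dictionary(language, base_url, cache_dir):
    """Make sure <language>.aff and <language>.dic exist in cache_dir.

    base_url is a hypothetical location serving both files; missing
    files are downloaded, existing ones are left untouched.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for ext in (".aff", ".dic"):
        target = cache_dir / (language + ext)
        if target.exists():
            continue  # already cached
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated dictionary under the final name.
        partial = target.with_name(target.name + ".part")
        url = f"{base_url.rstrip('/')}/{language}{ext}"
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    return cache_dir
```

The returned directory could then be given to the binding in place of the hardcoded path, along the lines of the `Hunspell(language, hunspell_data_dir=...)` call visible in the traceback above.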

@vpoulailleau do you think this branch is easier or harder to install than master?

@vpoulailleau
Member

And directly under PowerShell (after struggling to work out how to activate a virtual environment; sure, you just need to know the admin who knows the obscure commands for managing execution rights…), here is the result:

(venv_powershell) PS C:\Users\vincent\Documents\traduction_python_docs\python-docs-fr> pip install git+https://github.com/JulienPalard/pospell.git@hunspell
Collecting git+https://github.com/JulienPalard/pospell.git@hunspell
  Cloning https://github.com/JulienPalard/pospell.git (to revision hunspell) to c:\users\vincent\appdata\local\temp\pip-req-build-4jtkwtyj
  Running command git clone -q https://github.com/JulienPalard/pospell.git 'C:\Users\vincent\AppData\Local\Temp\pip-req-build-4jtkwtyj'
  Running command git checkout -b hunspell --track origin/hunspell
  Branch 'hunspell' set up to track remote branch 'hunspell' from 'origin'.
  Switched to a new branch 'hunspell'
Collecting polib (from pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/30/a2/e407c3b00cace3d7fc8df14d364deeecfeb96044e1a317de583bc26eae58/polib-1.1.0-py2.py3-none-any.whl
Collecting docutils>=0.11 (from pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/22/cd/a6aa959dca619918ccb55023b4cb151949c64d4d5d55b3f4ffd7eee0c6e8/docutils-0.15.2-py3-none-any.whl (547kB)
     |████████████████████████████████| 552kB 3.3MB/s
Collecting regex (from pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/2b/6a/404decef1e7318f4415ba34907d3148f88319ae7e36c70f467c53049592a/regex-2019.12.9-cp38-none-win_amd64.whl (314kB)
     |████████████████████████████████| 317kB 2.2MB/s
Collecting cyhunspell (from pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/7b/c1/f2d43b9a111de312a9f2e33118926e7e5b96d3f634adf643a4bb543bbe8a/CyHunspell-1.3.4.tar.gz (2.7MB)
     |████████████████████████████████| 2.7MB 2.2MB/s
Collecting nltk (from pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/f6/1d/d925cfb4f324ede997f6d47bea4d9babba51b49e87a767c170b77005889d/nltk-3.4.5.zip (1.5MB)
     |████████████████████████████████| 1.5MB 1.7MB/s
Collecting cacheman>=2.0.6 (from cyhunspell->pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/46/7a/d091d6337693ddb1fe1c15b18fdc1c750e0355aa79aa33268cdeaeea4458/CacheMan-2.1.0-py2.py3-none-any.whl
Collecting six (from nltk->pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whl
Collecting future>=0.16.0 (from cacheman>=2.0.6->cyhunspell->pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
     |████████████████████████████████| 829kB 1.6MB/s
Collecting psutil>=2.1.0 (from cacheman>=2.0.6->cyhunspell->pospell==1.0.3)
  Downloading https://files.pythonhosted.org/packages/8a/fa/b573850e912d6ffdad4aef3f5f705f94a64d098a83eec15d1cd3e1223f5e/psutil-5.6.7-cp38-cp38-win_amd64.whl (236kB)
     |████████████████████████████████| 245kB 2.2MB/s
Installing collected packages: polib, docutils, regex, future, psutil, six, cacheman, cyhunspell, nltk, pospell
  Running setup.py install for future ... done
  Running setup.py install for cyhunspell ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\vincent\documents\traduction_python_docs\python-docs-fr\venv_powershell\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\vincent\\AppData\\Local\\Temp\\pip-install-vbym8byp\\cyhunspell\\setup.py'"'"'; __file__='"'"'C:\\Users\\vincent\\AppData\\Local\\Temp\\pip-install-vbym8byp\\cyhunspell\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\vincent\AppData\Local\Temp\pip-record-c215sfmb\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\vincent\documents\traduction_python_docs\python-docs-fr\venv_powershell\include\site\python3.8\cyhunspell'
         cwd: C:\Users\vincent\AppData\Local\Temp\pip-install-vbym8byp\cyhunspell\
    Complete output (40 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.8
    creating build\lib.win-amd64-3.8\hunspell
    copying hunspell\platform.py -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\__init__.py -> build\lib.win-amd64-3.8\hunspell
    package init file 'dictionaries\__init__.py' not found (or not a regular file)
    package init file 'libs\msvc\__init__.py' not found (or not a regular file)
    copying hunspell\hunspell.pxd -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\thread.pxd -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\hunspell.pyx -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\hunspell.cpython-36m-x86_64-linux-gnu.so -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\thread.hpp -> build\lib.win-amd64-3.8\hunspell
    copying hunspell\hunspell.cpp -> build\lib.win-amd64-3.8\hunspell
    creating build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_AU.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_CA.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_GB.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_NZ.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_US.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_ZA.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\test.aff -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_AU.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_CA.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_GB.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_NZ.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_US.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\en_ZA.dic -> build\lib.win-amd64-3.8\dictionaries
    copying dictionaries\test.dic -> build\lib.win-amd64-3.8\dictionaries
    creating build\lib.win-amd64-3.8\libs
    creating build\lib.win-amd64-3.8\libs\msvc
    copying libs\msvc\libhunspell-msvc11-x64.lib -> build\lib.win-amd64-3.8\libs\msvc
    copying libs\msvc\libhunspell-msvc11-x86.lib -> build\lib.win-amd64-3.8\libs\msvc
    copying libs\msvc\libhunspell-msvc14-x64.lib -> build\lib.win-amd64-3.8\libs\msvc
    copying libs\msvc\libhunspell-msvc14-x86.lib -> build\lib.win-amd64-3.8\libs\msvc
    running build_ext
    building 'hunspell.hunspell' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\vincent\documents\traduction_python_docs\python-docs-fr\venv_powershell\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\vincent\\AppData\\Local\\Temp\\pip-install-vbym8byp\\cyhunspell\\setup.py'"'"'; __file__='"'"'C:\\Users\\vincent\\AppData\\Local\\Temp\\pip-install-vbym8byp\\cyhunspell\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\vincent\AppData\Local\Temp\pip-record-c215sfmb\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\vincent\documents\traduction_python_docs\python-docs-fr\venv_powershell\include\site\python3.8\cyhunspell' Check the logs for full command output.
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
(venv_powershell) PS C:\Users\vincent\Documents\traduction_python_docs\python-docs-fr>

The short version:

    building 'hunspell.hunspell' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/

Anyway, it doesn't look simple…

I'll test the master branch now and report back!

@vpoulailleau
Member

Well, life on Windows is not simple 😢

The master version is easier to install:

(venv_powershell_master) PS C:\Users\vincent\Documents\traduction_python_docs\python-docs-fr> pip install pospell
Collecting pospell
  Downloading https://files.pythonhosted.org/packages/38/3a/e576c10d39c23d14a4f6b3919d57e5f2254967eedf27920abbb03ea131bd/pospell-1.0.3-py2.py3-none-any.whl
Collecting docutils>=0.11 (from pospell)
  Using cached https://files.pythonhosted.org/packages/22/cd/a6aa959dca619918ccb55023b4cb151949c64d4d5d55b3f4ffd7eee0c6e8/docutils-0.15.2-py3-none-any.whl
Collecting regex (from pospell)
  Using cached https://files.pythonhosted.org/packages/2b/6a/404decef1e7318f4415ba34907d3148f88319ae7e36c70f467c53049592a/regex-2019.12.9-cp38-none-win_amd64.whl
Collecting polib (from pospell)
  Using cached https://files.pythonhosted.org/packages/30/a2/e407c3b00cace3d7fc8df14d364deeecfeb96044e1a317de583bc26eae58/polib-1.1.0-py2.py3-none-any.whl
Installing collected packages: docutils, regex, polib, pospell
Successfully installed docutils-0.15.2 polib-1.1.0 pospell-1.0.3 regex-2019.12.9
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
(venv_powershell_master) PS C:\Users\vincent\Documents\traduction_python_docs\python-docs-fr> pospell .\tutorial\venv.po
hunspell not found, please install hunspell.

On the other hand, it is just as hard to use… I looked for hunspell binaries but found no miracle solution. Hunspell is distributed as source code, so it has to be compiled…

Do you think https://github.com/gromnitsky/hunspell-windows is a lead worth following?

Or https://github.com/zdenop/hunspell-mingw, with an old hunspell at https://github.com/zdenop/hunspell-mingw/downloads (I tried it quickly, but it doesn't seem to work…)

@vpoulailleau
Member

vpoulailleau commented Dec 12, 2019

To sum up, installing this branch looks feasible in an Ubuntu installed through the Windows Store (provided you run the command that installs the build dependencies), but otherwise it takes real persistence! (Or someone more experienced with Windows than I am!)

@Seluj78
Collaborator

Seluj78 commented Dec 12, 2019

cc @JulienPalard :

(pospell-3.7) ➜  pospell git:(hunspell) pospell /Users/seluj78/Projects/python-docs-fr/
Traceback (most recent call last):
  File "/Users/seluj78/Projects/pospell/venv/bin/pospell", line 11, in <module>
    load_entry_point('pospell', 'console_scripts', 'pospell')()
  File "/Users/seluj78/Projects/pospell/pospell.py", line 349, in main
    args.po_file, args.personal_dict, args.language, drop_capitalized, args.debug
  File "/Users/seluj78/Projects/pospell/pospell.py", line 258, in spell_check
    hunspell = Hunspell(language, hunspell_data_dir="/usr/share/hunspell")
  File "hunspell/hunspell.pyx", line 172, in hunspell.hunspell.HunspellWrap.__init__
  File "hunspell/hunspell.pyx", line 130, in hunspell.hunspell.HunspellWrap._create_hspell_inst
hunspell.hunspell.HunspellFilePathError: File '/usr/share/hunspell/fr.aff' not found or accessible

@Seluj78
Collaborator

Seluj78 commented Dec 12, 2019

On macOS, hunspell dictionaries (and others) are placed in /Library/Spelling:

(pospell-3.7) ➜  pospell git:(hunspell) ls /Library/Spelling/
fr-classique.aff      fr-moderne.aff        fr-reforme1990.aff    fr_FR.aff             frhyph.tex            hyph_fr.dic           thes_fr.dat
fr-classique.dic      fr-moderne.dic        fr-reforme1990.dic    fr_FR.dic             hyph-fr.tex           hyph_fr.iso8859-1.dic thes_fr.idx
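
Given the two locations seen in this thread, a lookup that tries several directories instead of one hardcoded path might look like the sketch below. It is only an illustration: `find_dictionary` and `CANDIDATE_DIRS` are hypothetical names, the directory list covers just the paths reported here, and the fallback from `fr` to a regional file such as `fr_FR` is an assumption based on the listing above (which contains `fr_FR.aff` but no `fr.aff`).

```python
from pathlib import Path

# Only the locations mentioned in this conversation; other systems may differ.
CANDIDATE_DIRS = (
    "/usr/share/hunspell",  # path hardcoded in the tracebacks above
    "/Library/Spelling",    # macOS location reported above
)


def find_dictionary(language, search_dirs=CANDIDATE_DIRS):
    """Return (directory, dictionary_name) for the first match, or None.

    A match needs both the .aff and the .dic file. The exact name is
    tried first, then regional variants such as fr_FR for "fr".
    """
    for directory in map(Path, search_dirs):
        if not directory.is_dir():
            continue
        candidates = [directory / f"{language}.aff"]
        candidates += sorted(directory.glob(f"{language}_*.aff"))
        for aff in candidates:
            if aff.is_file() and aff.with_suffix(".dic").is_file():
                return str(directory), aff.stem
    return None
```

When a match is found, the pair could be passed to the binding in the same form as the call in the traceback, for example `Hunspell(name, hunspell_data_dir=directory)`; when `None` is returned, the caller can print a clear "dictionary not found" message rather than a raw `HunspellFilePathError`.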

Successfully merging this pull request may close these issues.

Dont spot "Partypolicularité"