Interpreter discovery bug wrt. Microsoft Store shortcut #2812

Open
axel-kah opened this issue Dec 4, 2024 · 4 comments

axel-kah commented Dec 4, 2024

Issue
hatch uses virtualenv's interpreter discovery when creating its virtual environments. The discovery also finds the Microsoft Store Python shortcut. Even though the interpreter was not installed through the MS Store, this executable is used during discovery to run virtualenv's py_info.py script. In this setting, hatch creates its venv successfully (read: exit code 0), but the discovery raises a series of UnicodeDecodeErrors and spills them onto the terminal ☹️

hatch env create
Exception in thread Thread-6 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte
Exception in thread Thread-8 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte

The root cause seems to be that virtualenv spawns a subprocess for each interpreter it finds and has it execute the py_info.py script. On Windows this also happens for the "mysterious" C:\Users\axel-kah\AppData\Local\Microsoft\WindowsApps\python.exe. If Python was not installed from the MS Store, this executable returns an error message encoded in the infamous cp1252. When the OS is set to a language like German, the error message contains umlauts such as ü, whose cp1252 byte (0xfc in the traceback above) is not valid UTF-8 and triggers the UnicodeDecodeErrors.
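
The mismatch can be reproduced without the Store stub. The following is a minimal sketch; the sample word "für" is only an illustration and is not the stub's actual message:

```python
# "ü" is the single byte 0xfc in cp1252; that byte is not a valid UTF-8 start byte.
raw = "für".encode("cp1252")  # b'f\xfcr'

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Same failure mode as in the traceback: "invalid start byte"
    utf8_error = exc

# Decoding with the encoding that produced the bytes works.
cp1252_text = raw.decode("cp1252")

print(utf8_error)
print(cp1252_text)
```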

Proposed Fix
Change the encoding to cp1252 on Windows when launching the subprocesses during discovery, instead of using utf-8 on all platforms.

I have verified the fix by locally patching a dev install of hatch and could submit a PR.

Environment

Provide at least:

  • OS: Windows 11 (German language(!))
  • hatch 1.13.0
  • virtualenv 20.28.0
  • pip list of the host python where virtualenv is installed:
"C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\python.exe" -m pip list
Package           Version
----------------- ----------
anyio             4.6.2
certifi           2024.8.30
click             8.1.7
colorama          0.4.6
distlib           0.3.9
filelock          3.16.1
h11               0.14.0
hatch             1.13.0
hatchling         1.25.0
httpcore          1.0.6
httpx             0.27.2
hyperlink         21.0.0
idna              3.10
jaraco.classes    3.4.0
jaraco.context    6.0.1
jaraco.functools  4.1.0
keyring           25.4.1
markdown-it-py    3.0.0
mdurl             0.1.2
more-itertools    10.5.0
packaging         24.1
pathspec          0.12.1
pexpect           4.9.0
pip               24.0
platformdirs      4.3.6
pluggy            1.5.0
ptyprocess        0.7.0
Pygments          2.18.0
pywin32-ctypes    0.2.3
rich              13.9.2
setuptools        69.1.0
shellingham       1.5.4
sniffio           1.3.1
tomli_w           1.1.0
tomlkit           0.13.2
trove-classifiers 2024.10.13
userpath          1.9.2
uv                0.4.20
virtualenv        20.26.6
zstandard         0.23.0

Output of the virtual environment creation

Not applicable because venv is created implicitly by hatch.

Darsstar commented Dec 5, 2024

I came here from pyenv-win. Mind trying a different patch?
Keep the encoding as utf-8, but also pass errors="backslashreplace". (see https://docs.python.org/3/library/codecs.html#error-handlers and https://docs.python.org/3/library/subprocess.html#popen-constructor)
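
A rough sketch of what that patch would do, assuming a child process that emits a cp1252 byte. The child here is a hypothetical stand-in for the Store stub, not virtualenv's actual discovery code:

```python
import subprocess
import sys

# Hypothetical stand-in for the Store stub: the child writes the cp1252
# bytes for "für" to stdout (0xfc is not valid UTF-8).
CHILD_CODE = r"import sys; sys.stdout.buffer.write(b'f\xfcr')"

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    encoding="utf-8",
    errors="backslashreplace",  # undecodable bytes become escapes such as \xfc
    check=False,
)

# The reader no longer raises; the bad byte shows up as a literal escape.
print(repr(result.stdout))
```

With this handler the output stays readable as ASCII and no reader thread crashes, at the cost of not recovering the original umlaut.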

axel-kah (Author) commented Dec 5, 2024

Keep the encoding as utf-8, but also pass errors="backslashreplace"

Seems to work just as well. Maybe Bernát can make a call on how he would like to have this handled, once he's back.

@Mizokuiam

Hi @axel-kah,

Here's a suggested solution for this issue:

# in virtualenv/discovery/cached_py_info.py

import sys
import subprocess
from subprocess import PIPE

# ... other imports

def from_exe(exe, env=None, raise_on_error=True):
    """
    Given a python executable, get the python information as a dictionary
    """
    cmd = [exe, "-c", PY_INFO_CODE]
    env = env or {}
    # NEW: Set encoding to cp1252 on Windows
    encoding = "cp1252" if sys.platform == "win32" else "utf-8"
    try:
        process = subprocess.Popen(cmd, stdout=PIPE, stderr=PIPE, env=env) # MODIFIED
    except OSError as os_error:
        if raise_on_error:
            raise
        return {"error": os_error}

    out, err = process.communicate()
    try: # MODIFIED
        out = out.decode(encoding)
        err = err.decode(encoding)
    except UnicodeDecodeError: # catch the exception and return meaningful error information
        return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}

    # ... rest of the function (unchanged)

Explanation:

The original code in virtualenv's cached_py_info.py uses UTF-8 encoding to decode the output of the subprocess that runs py_info.py. However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows. This mismatch causes the UnicodeDecodeError.

The solution changes the decoding to use CP1252 on Windows. The modified line specifically sets the encoding variable based on the platform:

encoding = "cp1252" if sys.platform == "win32" else "utf-8"

Then, the output of the subprocess is decoded using this dynamically determined encoding:

try:
    out = out.decode(encoding)
    err = err.decode(encoding)
except UnicodeDecodeError: # Handles potential issues even with cp1252
    return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}

This allows the code to handle the output from the Microsoft Store Python stub even when it contains bytes that are not valid UTF-8. The try...except block catches a UnicodeDecodeError that could still occur with cp1252 and returns a more informative error message in that case.

This fix targets the root cause of the issue within virtualenv itself, ensuring that the interpreter discovery process can correctly handle the output from the Microsoft Store Python stub and prevents the UnicodeDecodeError from occurring. The change is localized and doesn't affect the behavior of virtualenv on other platforms.

pfmoore (Member) commented Mar 3, 2025

However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows

Is that true in all locales, or does it actually use the appropriate locale encoding (which would be more in line with general Windows practice)? Would it be better here to use the locale encoding?

On the other hand, if we don’t care about non-ASCII bytes in the output, and all that matters is to avoid decode errors, any encoding for which all bytes are valid would work. In that case, CP1252 should be fine, although Latin-1 is conventionally used because it maps all bytes to the same code point.
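
The two options can be compared in a short sketch. It assumes CPython's built-in codecs. Note that CPython's cp1252 codec leaves a few bytes unmapped, so strict decoding can still fail for them, whereas Latin-1 accepts every byte:

```python
import locale

ALL_BYTES = bytes(range(256))

# Latin-1 maps each byte 0x00-0xFF to the code point with the same value.
decoded = ALL_BYTES.decode("latin-1")
latin1_is_total = [ord(ch) for ch in decoded] == list(range(256))

# Collect the bytes that CPython's cp1252 codec refuses to decode strictly.
cp1252_undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        cp1252_undefined.append(hex(value))

# The encoding the current locale prefers (varies by machine and settings).
locale_encoding = locale.getpreferredencoding(False)

print(latin1_is_total, cp1252_undefined, locale_encoding)
```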
