Interpreter discovery bug wrt. Microsoft Store shortcut #2812

axel-kah · 2024-12-04T23:36:44Z

Issue
hatch uses virtualenv's interpreter discovery when creating its virtual envs. The discovery also finds the Microsoft Store Python shortcut. Even though the interpreter was not installed through the MS Store, this executable is used during discovery to run virtualenv's py_info.py script. In this setting, hatch still creates its venv successfully (read: exit code 0), but the discovery raises a bunch of UnicodeDecodeErrors and spills them onto the terminal ☹️

hatch env create
Exception in thread Thread-6 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte
Exception in thread Thread-8 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte

The root cause seems to be that virtualenv spawns a subprocess for each interpreter it finds and has it execute the py_info.py script. On Windows it also tries this with the "mysterious" C:\Users\axel-kah\AppData\Local\Microsoft\WindowsApps\python.exe. If Python was not installed from the MS Store, this executable prints an error message encoded in the infamous cp1252. When the OS language is set to something like German, the message contains umlauts such as ü, and decoding those bytes as utf-8 results in the UnicodeDecodeErrors.
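The decode failure can be reproduced without the Store stub at all. Below is a minimal sketch, assuming only that the stub's message contains a ü encoded as cp1252. The message text is made up for illustration and is not the stub's actual output.

```python
# Hypothetical stand-in for the stub's localized message; only the "ü" matters.
message = "Python wurde nicht gefunden; ohne Argumente ausführen"
raw = message.encode("cp1252")  # "ü" becomes the single byte 0xFC

decode_error = None
try:
    raw.decode("utf-8")  # 0xFC is never a valid UTF-8 start byte
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)

# Decoding with the encoding that produced the bytes round-trips cleanly.
roundtrip = raw.decode("cp1252")
```

The exception carries the offending offset in `decode_error.start`, which is how the "byte 0xfc in position 38" part of the traceback above comes about.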

Proposed Fix
On Windows, change the encoding to cp1252 when launching the subprocesses during discovery, instead of using utf-8 on all platforms.

I have verified the fix by locally patching a dev install of hatch and could submit a PR.

Environment

Provide at least:

OS: Windows 11 (German language(!))
hatch 1.13.0
virtualenv 20.28.0
pip list of the host python where virtualenv is installed:

"C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\python.exe" -m pip list
Package           Version
----------------- ----------
anyio             4.6.2
certifi           2024.8.30
click             8.1.7
colorama          0.4.6
distlib           0.3.9
filelock          3.16.1
h11               0.14.0
hatch             1.13.0
hatchling         1.25.0
httpcore          1.0.6
httpx             0.27.2
hyperlink         21.0.0
idna              3.10
jaraco.classes    3.4.0
jaraco.context    6.0.1
jaraco.functools  4.1.0
keyring           25.4.1
markdown-it-py    3.0.0
mdurl             0.1.2
more-itertools    10.5.0
packaging         24.1
pathspec          0.12.1
pexpect           4.9.0
pip               24.0
platformdirs      4.3.6
pluggy            1.5.0
ptyprocess        0.7.0
Pygments          2.18.0
pywin32-ctypes    0.2.3
rich              13.9.2
setuptools        69.1.0
shellingham       1.5.4
sniffio           1.3.1
tomli_w           1.1.0
tomlkit           0.13.2
trove-classifiers 2024.10.13
userpath          1.9.2
uv                0.4.20
virtualenv        20.26.6
zstandard         0.23.0

Output of the virtual environment creation

Not applicable because venv is created implicitly by hatch.


Darsstar · 2024-12-05T14:34:17Z

I came here from pyenv-win. Mind trying a different patch?
Keep the encoding as utf-8, but also pass errors="backslashreplace". (see https://docs.python.org/3/library/codecs.html#error-handlers and https://docs.python.org/3/library/subprocess.html#popen-constructor)
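For reference, a rough sketch of what that suggestion looks like with the standard subprocess API. The child process here is a stand-in that writes a lone 0xFC byte to stderr; it is not virtualenv's actual discovery code, and where exactly the arguments would go in virtualenv is not shown.

```python
import subprocess
import sys

# Stand-in child: writes a cp1252-encoded "ü" (byte 0xFC) to stderr,
# roughly like the Store stub's localized error message would.
child_code = "import sys; sys.stderr.buffer.write(b'Ausf\\xfchren')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    encoding="utf-8",
    errors="backslashreplace",  # undecodable bytes become \xNN escapes
    check=False,
)

print(result.stderr)
```

With this handler the reader thread no longer raises; the byte that is not valid UTF-8 shows up as a literal `\xfc` escape in the decoded text, so nothing is lost for debugging.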

axel-kah · 2024-12-05T20:39:16Z

Keep the encoding as utf-8, but also pass errors="backslashreplace"

Seems to work just as well. Maybe Bernát can make a call on how he would like to have this handled, once he's back.

Mizokuiam · 2025-03-03T02:48:12Z

Hi @axel-kah,

Here's a suggested solution for this issue:

# in virtualenv/discovery/cached_py_info.py

import sys
import subprocess
from subprocess import PIPE

# ... other imports

def from_exe(exe, env=None, raise_on_error=True):
    """
    Given a python executable, get the python information as a dictionary
    """
    cmd = [exe, "-c", PY_INFO_CODE]
    env = env or {}
    # NEW: Set encoding to cp1252 on Windows
    encoding = "cp1252" if sys.platform == "win32" else "utf-8"
    try:
        process = subprocess.Popen(cmd, stdout=PIPE, stderr=PIPE, env=env) # MODIFIED
    except OSError as os_error:
        if raise_on_error:
            raise
        return {"error": os_error}

    out, err = process.communicate()
    try: # MODIFIED
        out = out.decode(encoding)
        err = err.decode(encoding)
    except UnicodeDecodeError: # catch the exception and return meaningful error information
        return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}

    # ... rest of the function (unchanged)

Explanation:

The original code in virtualenv's cached_py_info.py uses UTF-8 encoding to decode the output of the subprocess that runs py_info.py. However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows. This mismatch causes the UnicodeDecodeError.

The solution changes the decoding to use CP1252 on Windows. The modified line specifically sets the encoding variable based on the platform:

encoding = "cp1252" if sys.platform == "win32" else "utf-8"

Then, the output of the subprocess is decoded using this dynamically determined encoding:

try:
    out = out.decode(encoding)
    err = err.decode(encoding)
except UnicodeDecodeError: # Handles potential issues even with cp1252
    return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}

This allows the code to correctly handle the output from the Microsoft Store Python stub, even if it contains bytes that are not valid UTF-8. The try...except block is also added to catch a potential UnicodeDecodeError even with cp1252 and return a more informative error message in such cases. This adds a layer of robustness to the decoding process.

This fix targets the root cause of the issue within virtualenv itself, ensuring that the interpreter discovery process can correctly handle the output from the Microsoft Store Python stub and prevents the UnicodeDecodeError from occurring. The change is localized and doesn't affect the behavior of virtualenv on other platforms.

pfmoore · 2025-03-03T09:46:18Z

However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows

Is that true in all locales, or does it actually use the appropriate locale encoding (which would be more in line with general Windows practice)? Would it be better here to use the locale encoding?
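A sketch of what "use the locale encoding" could mean in Python terms. Whether the stub actually writes in this encoding (rather than, say, the console's OEM code page) is exactly the open question, so treat this as an assumption to verify.

```python
import codecs
import locale

# Encoding the current locale prefers for text data. On a German Windows
# install this is typically "cp1252", but that is not guaranteed.
preferred = locale.getpreferredencoding(False)

# Resolve it to a codec to confirm Python can decode with it.
codec = codecs.lookup(preferred)
print(preferred, "->", codec.name)
```

One caveat: in Python's UTF-8 mode, `locale.getpreferredencoding()` reports UTF-8 regardless of the system locale; on Python 3.11+, `locale.getencoding()` ignores that mode.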

On the other hand, if we don’t care about non-ASCII bytes in the output, and all that matters is to avoid decode errors, any encoding for which all bytes are valid would work. In that case, CP1252 should be fine, although Latin-1 is conventionally used because it maps all bytes to the same code point.
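Whether a codec accepts every byte value is easy to check directly. The small helper below (`undecodable_bytes` is a name made up for this sketch) lists the single bytes a codec rejects.

```python
def undecodable_bytes(encoding):
    """Return the byte values (0-255) that `encoding` cannot decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


cp1252_gaps = undecodable_bytes("cp1252")
latin1_gaps = undecodable_bytes("latin-1")

print([hex(b) for b in cp1252_gaps])
print(latin1_gaps)
```

In CPython's cp1252 codec a handful of byte values are unmapped, so a strict cp1252 decode can still raise, whereas latin-1 accepts all 256 values. If the only goal is to avoid decode errors, latin-1 (or an explicit `errors=` handler) is the safer choice.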

axel-kah added the bug label Dec 4, 2024

axel-kah mentioned this issue Dec 4, 2024

bug: UnicodeDecodeError / RuntimeError: failed to find interpreter for Builtin discover of python_spec pyenv-win/pyenv-win#570

Open

gaborbernat added the help-wanted label Jan 3, 2025
