UTF-8 Encoding Errors Prevent Package Installation #7733

Closed
Paillat-dev opened this issue Sep 27, 2024 · 6 comments · Fixed by #7757
Labels
bug (Something isn't working), windows (Specific to the Windows platform)

Comments


Paillat-dev commented Sep 27, 2024

Summary

uv is failing to install packages, reporting UTF-8 encoding errors. The root cause is likely non-ASCII characters in the Windows username, which uv may not be handling correctly.

Steps to Reproduce

  1. Install from git:
    uv add git+https://github.com/feldberlin/timething.git
    
    Error:
    error: Failed to run `C:\Users\[USER]\AppData\Local\uv\cache\builds-v0\.tmp1pEQZM\Scripts\python.exe`
    Caused by: stream did not contain valid UTF-8
    
  2. Install a specific package:
    uv add timething
    
    Error:
    error: Failed to download and build `docopt==0.6.2`
    Caused by: Failed to run `C:\Users\[USER]\AppData\Local\uv\cache\builds-v0\.tmpwGOGdp\Scripts\python.exe`
    Caused by: stream did not contain valid UTF-8
    

Environment

  • OS: Windows
  • uv version: 0.4.11
  • Python versions: uv: 3.12, project: 3.10

Impact

This issue prevents package installation, a core functionality of uv, significantly hindering development workflows for users with non-ASCII characters in their Windows usernames.

Additional Context

The root cause appears to be non-ASCII characters in the Windows username. For example, a user with the username "María García" might experience this issue, while a user with the username "John Smith" would not.

Example:

  • Affected path: C:\Users\María García\AppData\Local\uv\cache\builds-v0\.tmp1pEQZM\Scripts\python.exe
  • Non-affected path: C:\Users\John Smith\AppData\Local\uv\cache\builds-v0\.tmp1pEQZM\Scripts\python.exe

uv may not be correctly handling these non-ASCII characters in file paths, leading to the reported UTF-8 encoding errors.

Related Issues

Please let me know if you need any additional information or if you'd like me to perform any specific tests to confirm this hypothesis.

@charliermarsh
Member

I'm sort of wondering if this is just a case of Windows long-path support not being enabled.

@charliermarsh charliermarsh added the windows Specific to the Windows platform label Sep 27, 2024
@Paillat-dev
Author

I think it was already enabled. I double-checked to make sure it was, restarted, and still get the same issue:

uv add timething
⠸ docopt==0.6.2
error: Failed to download and build `docopt==0.6.2`
  Caused by: Failed to run `C:\Users\[USER]\AppData\Local\uv\cache\builds-v0\.tmpZ2ViUG\Scripts\python.exe`
  Caused by: stream did not contain valid UTF-8

@charliermarsh
Member

Unfortunately I haven't been able to replicate it on my Windows machine. uv pip install docopt==0.6.2 completes without error.

@kahojyun
Contributor

I found a way to reproduce it on my computer.

  1. Set UV_CACHE_DIR to a directory containing non-ASCII characters, such as E:/缓存/uv (缓存 means "cache")
  2. Create a new uv project and run uv add docopt
uv add docopt
⠼ docopt==0.6.2
error: Failed to download and build `docopt==0.6.2`
  Caused by: Failed to run `E:\缓存\uv\builds-v0\.tmp2qY1wK\Scripts\python.exe`
  Caused by: stream did not contain valid UTF-8

When the Windows username contains non-ASCII characters, the default uv cache directory (under the user profile, e.g. C:\Users\[USER]\AppData\Local\uv\cache) triggers this error.

@Paillat-dev
Author

Right, that is most likely the reason: my username contains the accented character é.

@charliermarsh
Member

Great find, thank you for the PR!

@charliermarsh charliermarsh added the bug Something isn't working label Sep 28, 2024
charliermarsh pushed a commit that referenced this issue Sep 28, 2024
## Summary

This PR fixes #7733. According to the [CPython documentation on
`sys.stdout`](https://docs.python.org/3.12/library/sys.html#sys.stdout),
when `stdout`/`stderr` is not a character device (e.g., a pipe), its
encoding is set to the system locale encoding on Windows. However, on
the Rust side, `stdout_reader` and `stderr_reader` expect the output to
be encoded in UTF-8 and fail when the child process writes non-ASCII
characters to stdout/stderr, e.g., a build directory name containing
non-ASCII characters.
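To illustrate the mismatch described above, here is a minimal Python sketch (illustrative only; uv's actual readers are Rust code not shown here). It encodes two example paths from this thread with a Windows code page, assuming cp1252 for a Western European locale (the é/í case) and cp936 (GBK) for the 缓存 case, and checks whether the resulting bytes survive a strict UTF-8 decode. The helper `is_valid_utf8` and the `SAMPLES` table are hypothetical. "stream did not contain valid UTF-8" is the error text Rust's standard line readers return for invalid UTF-8, which matches the logs above.

```python
# Example paths from this thread, as a child process might write them when its
# stdout encoding follows the Windows ANSI code page rather than UTF-8.
# The code-page assignments are assumptions for illustration.
SAMPLES = {
    "cp1252": "C:\\Users\\María García\\AppData",  # Western European code page
    "cp936": "E:\\缓存\\uv\\builds-v0",  # Simplified Chinese code page (GBK)
}


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


for codepage, text in SAMPLES.items():
    locale_bytes = text.encode(codepage)  # what the child writes to the pipe
    utf8_bytes = text.encode("utf-8")  # what a UTF-8-only reader expects
    print(
        f"{codepage}: locale bytes valid UTF-8? {is_valid_utf8(locale_bytes)}, "
        f"UTF-8 bytes valid? {is_valid_utf8(utf8_bytes)}"
    )
```

Both code-page encodings are rejected by the strict decode while the UTF-8 encodings of the same text are accepted, which is the situation the PR describes. Pure-ASCII paths such as "John Smith" encode identically in all three encodings, which is why they are unaffected.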

Both
[CPython3](https://docs.python.org/3.12/using/cmdline.html#envvar-PYTHONIOENCODING)
and [PyPy](https://doc.pypy.org/en/default/man/pypy3.1.html#environment)
support the environment variable `PYTHONIOENCODING`. When it is set to
`utf-8`, Python uses UTF-8 encoding for `stdin`/`stdout`/`stderr`.
Since `stdin` is not used by the spawned Python process and we expect
`stdout`/`stderr` to use UTF-8, this fix should work as expected.
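A minimal Python sketch of the idea behind the fix (uv does this from Rust; this is not its code): copy the current environment, set `PYTHONIOENCODING=utf-8`, spawn a child interpreter, and decode its captured stdout as strict UTF-8. The function name `run_python_utf8` is hypothetical, and `sys.executable` stands in for the build environment's `python.exe`.

```python
import os
import subprocess
import sys


def run_python_utf8(code: str) -> str:
    """Run `code` in a child Python and decode its stdout as strict UTF-8.

    Hypothetical helper. PYTHONIOENCODING=utf-8 makes the child use UTF-8 for
    stdin/stdout/stderr even when stdout is a pipe, where Windows would
    otherwise use the locale code page.
    """
    env = dict(os.environ)  # inherit the parent's environment
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
        check=True,
    )
    # Strict decode: raises UnicodeDecodeError on invalid UTF-8,
    # like a reader that only accepts UTF-8.
    return result.stdout.decode("utf-8")


# The child prints a non-ASCII name; the \u escapes keep the -c argument ASCII-only.
CHILD_CODE = "print('Mar\\u00eda Garc\\u00eda')"
print(run_python_utf8(CHILD_CODE).strip())  # strip() drops "\n" or "\r\n"
```

Without the variable, on a Windows machine whose locale code page is not UTF-8, the same child would typically write locale-encoded bytes to the pipe and the strict decode could fail. Note that `python -E` (and `-I`, which implies it) ignores `PYTHON*` environment variables, so this approach assumes the child is not started with those flags.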

## Test Plan

I only tested it on my computer with CPython 3.12 and 3.7. With the fix
applied, I confirmed that [the case I
described](#7733 (comment))
is fixed.

I'm using Windows 11 with the system locale set to code page 936.