(Windows) FS can not handle certain characters in file name #48673
Comments
Libuv recently switched to WTF-8 to deal with surrogate pairs (0xD83D 0xDD79 is a surrogate pair) in Windows file paths. That change landed in v20.4.0. cc @vtjnash
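For readers unfamiliar with surrogate pairs, here is a minimal sketch of how the two UTF-16 code units mentioned above combine into one code point. The helper name `combineSurrogates` is hypothetical and not part of Node.js or libuv.

```javascript
// Combine a UTF-16 surrogate pair into a single Unicode code point.
// `combineSurrogates` is an illustrative helper, not a Node.js or libuv API.
function combineSurrogates(high, low) {
  if (high < 0xd800 || high > 0xdbff) throw new RangeError("not a high surrogate");
  if (low < 0xdc00 || low > 0xdfff) throw new RangeError("not a low surrogate");
  // Each surrogate carries 10 bits of the value above 0x10000.
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const codePoint = combineSurrogates(0xd83d, 0xdd79);
console.log(codePoint.toString(16)); // 1f579
console.log(String.fromCodePoint(codePoint) === "\uD83D\uDD79"); // true
```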
We could potentially disable the UTF-8 transcoder error checking again, though, and permit this to access the file anyway. The name would not round-trip correctly through the file system, but that is not much different from how case sensitivity can be lost in the conversion when accessing the file system.
Ah, interesting. That means Node is explicitly converting a string that is valid in ECMAScript's UCS-2 into an invalid UTF-8 string, possibly intentionally, since ECMAScript does not actually support this emoji and instead defines it as composed of 2 unknown characters from the surrogate block. We could certainly change libuv to use a non-validating UTF-8 decoder so that it can still accept invalid strings as input.
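To make the validating versus non-validating distinction concrete, here is a rough sketch of a generalized UTF-8 encoder that, like WTF-8, accepts unpaired surrogate code points. The function name `encodeCodePointWtf8` is hypothetical. The last line shows that the standard `TextEncoder` instead replaces a lone surrogate with U+FFFD.

```javascript
// Encode one code point (0..0x10FFFF) as generalized UTF-8 bytes.
// Unlike strict UTF-8, surrogate code points (0xD800..0xDFFF) are accepted,
// which is what WTF-8 permits for unpaired surrogates.
// `encodeCodePointWtf8` is an illustrative helper, not a Node.js or libuv API.
function encodeCodePointWtf8(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex(encodeCodePointWtf8(0xd83d))); // ed a0 bd (lone high surrogate)
console.log(hex(encodeCodePointWtf8(0x1f579))); // f0 9f 95 b9 (the combined pair)
console.log(hex([...new TextEncoder().encode("\uD83D")])); // ef bf bd (U+FFFD)
```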
We forgot to mask off the high bits from the first byte, so we ended up always failing the subsequent range check. Refs: libuv#297 Fixes: nodejs/node#48673
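The class of bug described in that commit message can be sketched as follows. This is a hypothetical JavaScript illustration, not libuv's actual C code. It assumes a 4-byte sequence and uses a made-up `maskFirstByte` flag to contrast the buggy and fixed behaviour.

```javascript
// Hypothetical 4-byte UTF-8 decoder, NOT libuv's implementation.
// The lead byte of a 4-byte sequence has the form 11110xxx, so only its
// low 3 bits are payload. Without masking, the marker bits leak into the
// result and the range check below always fails.
function decodeFourByte(bytes, maskFirstByte) {
  const [b1, b2, b3, b4] = bytes;
  const lead = maskFirstByte ? b1 & 0x07 : b1;
  const codePoint =
    (lead << 18) | ((b2 & 0x3f) << 12) | ((b3 & 0x3f) << 6) | (b4 & 0x3f);
  // 4-byte sequences must decode to a supplementary-plane code point.
  if (codePoint < 0x10000 || codePoint > 0x10ffff) return -1;
  return codePoint;
}

const joystick = [0xf0, 0x9f, 0x95, 0xb9]; // UTF-8 bytes of U+1F579
console.log(decodeFourByte(joystick, false)); // -1 (range check fails)
console.log(decodeFourByte(joystick, true).toString(16)); // 1f579
```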
This maybe shouldn't be closed until we get a libuv release and then a Node release that includes it. We're going to collect a lot of dupes for this, and they should be closed and pointed here until the release is done: #48813, #49042. (Speaking from experience: I've been dealing with some failing Windows CI tests since 20.4.0, https://github.com/ipld/codec-fixtures/actions/runs/5756528910/job/15606085523, for files that have ... complicated ... names.)
LTS is also affected. Works in v18.0.0, but not in v18.18.0.
It's fixed upstream in libuv/libuv@d09441c but not released yet.
**What's the problem this PR addresses?**
Node.js 20.4 broke using emojis in file names: nodejs/node#48673
**How did you fix it?**
Disabling the code that relies on it (it's just a sanity check anyway).
**Checklist**
- [x] I have read the [Contributing Guide](https://yarnpkg.com/advanced/contributing).
- [x] I have set the packages that need to be released for my changes to be effective.
- [x] I will check that all automated PR checks pass before the PR gets reviewed.
We're using version 20.3.1; could we have the fix backported to this version as well?
WTF?
import fs from "node:fs/promises";
await fs.writeFile("🚀🔥🛸.txt", "");
Also, I've seen these errors:
It's still not fixed, even with libuv 1.48. Even Bun (https://github.com/oven-sh/bun/) works fine with it. UPD: However, it's fixed in Current (v21.6.2), but not in LTS (v20.11.1), which was released 3 days ago.
> LTS (Long-Term Support). Only verified changes.
Are you kidding? Does Node.js not even have tests for reading/writing files with surrogate pairs in a filename?
Original commit message:
fs: fix WTF-8 decoding issue (nodejs#4092)
We forgot to mask off the high bits from the first byte, so we ended up always failing the subsequent range check.
Refs: nodejs#2970
Fixes: nodejs#48673
Original commit message:
fs: fix WTF-8 decoding issue (nodejs#4092)
We forgot to mask off the high bits from the first byte, so we ended up always failing the subsequent range check.
Refs: libuv/libuv#2970
Fixes: nodejs#48673
Do we have a release date for this bug fix?
I assume, in the next LTS release after merging that PR: It takes more than half a year to add 3 lines of code.
Original commit message:
fs: fix WTF-8 decoding issue (#4092)
We forgot to mask off the high bits from the first byte, so we ended up always failing the subsequent range check.
Refs: libuv/libuv#2970
Fixes: #48673
PR-URL: #51976
Refs: #48673
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Finally, the fix was backported to LTS (v20.12.0).
Notable changes:
This release addresses some regressions that appeared in Node.js 18.18.0:
- (Windows) FS can not handle certain characters in file name nodejs#48673
- 18 and 20 node images give error - Text file busy (after re-build images) nodejs/docker-node#1968
- libuv update in 18.18.0 breaks webpack's thread-loader nodejs#49911
The libuv 1.45.0 and 1.46.0 updates that were released in Node.js 18.18.0 have been temporarily reverted.
PR-URL: nodejs#50066
Version
v20.4.0
Platform
Microsoft Windows NT 10.0.19045.0 x64
Subsystem
No response
What steps will reproduce the bug?
Run the following script
Where \uD83D\uDD79\uFE0F is 🕹️
How often does it reproduce? Is there a required condition?
Always fails on v20.4.0; v20.3.1 is okay.
What is the expected behavior? Why is that the expected behavior?
Run without errors
What do you see instead?
Additional information
Tried on Linux: no errors.