fs: support different path name encodings #3519
Two possible solutions:
@bnoordhuis what are some downsides of the proposed solutions?
With either solution we could add an environment variable or command line flag to override the default encoding.
Regarding the two options:
I don't think there should be a global environment variable or command line flag to override the encoding (it's tempting), because encodings are all relative to the filesystem in play at the mounted subtree the user happens to be working with. For example, a user may use a single Node process to work with multiple different filesystems mounted in
I'm ambivalent. The NFC/NFD dichotomy is confusing IMO in that the file you create with Nice write-up!
I don't exactly disagree but there is an (IMO reasonable) case to be made for common-case convenience if there is going to be a default encoding anyway. It's something we can tackle later though.
Thanks Ben! The NFC/NFD forms are surprising (hopefully the guide will help with that!) but I think it's not technically possible for Node to try and fix HFS+ now, and if Node tried it would be repeating the same mistake HFS+ made (implementing form-insensitivity by sacrificing form-preservation). It would also be equally confusing if Node normalized HFS+ NFD to NFC and users called
This is somewhat misleading. UTF-8 is assumed everywhere in
You might have already figured it out, but just in case anyone else is confused: it's not possible to make you sure read back the same filename you created. HFS+ normalizes file names when they are created, and it's a lossy conversion. For example, both "도시락" and "도시락" become "도시락", and one can't know afterwards which one was used originally. (The strings all look the same because normalization preserves the "look" of the string, duh. Try comparing their lengths in a JS console.)
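The length comparison suggested above can be sketched as follows. This is a minimal illustration, assuming a runtime with `String.prototype.normalize` (ES2015 or later). The strings are written as escapes so that the forms are unambiguous, and `sameName` is a hypothetical helper, not part of Node or of any proposal in this thread.

```js
// The same visible string in both Unicode normalization forms.
const nfc = '\uB3C4\uC2DC\uB77D';   // three precomposed Hangul syllables (NFC)
const nfd = nfc.normalize('NFD');   // the same text decomposed into conjoining jamo (NFD)

console.log(nfc.length, nfd.length); // 3 7
console.log(nfc === nfd);            // false: they look identical but differ code unit by code unit

// Hypothetical helper: a form-insensitive comparison that leaves both inputs untouched.
function sameName(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(sameName(nfc, nfd));     // true
```

Comparing normalized copies keeps the stored name as the filesystem returned it, which is the form-preservation property discussed above.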
Has an actual user requested this, or is this pure theory?
Note that both Windows (and JavaScript too, for that matter) technically use UCS-2 and not UTF-16. That means that not all valid Windows filenames are expressible as UTF-8. If we are going to fix this "properly" libuv should probably use WTF-8 on Windows.
@rvagg Looks good.
@piscisaureus Even if libuv starts using WTF-8 on Windows (instead of UTF-8), most people won't notice. Unpaired surrogates are rarities.
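To make the unpaired-surrogate point concrete, here is a small sketch using Node's `Buffer`. It only shows what happens when a JavaScript string holding a lone surrogate is converted to UTF-8. It does not exercise libuv's path handling, and the variable names are illustrative.

```js
// A string consisting of a single unpaired high surrogate.
const lone = '\uD800';

// Strict UTF-8 has no encoding for a lone surrogate, so the conversion
// substitutes U+FFFD (the replacement character).
const bytes = Buffer.from(lone, 'utf8');
console.log(bytes.toString('hex'));        // efbfbd

// The original code unit is lost on the way back.
const roundTripped = bytes.toString('utf8');
console.log(roundTripped === lone);        // false
console.log(roundTripped === '\uFFFD');    // true
```

WTF-8 would instead encode the surrogate itself (as the bytes `ED A0 80`), which is why it can round-trip such names where UTF-8 cannot.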
This makes several changes:

1. Allow path/filename to be passed in as a Buffer on fs methods
2. Add `options.encoding` to fs.readdir, fs.readdirSync, fs.readlink, fs.readlinkSync and fs.watch.
3. Documentation updates

For 1... it's now possible to do:

```js
fs.open(Buffer('/fs/foo/bar'), 'w+', (err, fd) => { });
```

For 2...

```js
fs.readdir('/fs/foo/bar', {encoding:'hex'}, (err,list) => { });
fs.readdir('/fs/foo/bar', {encoding:'buffer'}, (err, list) => { });
```

encoding can also be passed as a string

```js
fs.readdir('/fs/foo/bar', 'hex', (err,list) => { });
```

The default encoding is set to UTF8 so this addresses the discrepancy that existed previously between fs.readdir and fs.watch handling filenames differently.

Fixes: nodejs#2088
Refs: nodejs#3519
Alternate: nodejs#3401
This makes several changes:

1. Allow path/filename to be passed in as a Buffer on fs methods
2. Add `options.encoding` to fs.readdir, fs.readdirSync, fs.readlink, fs.readlinkSync and fs.watch.
3. Documentation updates

For 1... it's now possible to do:

```js
fs.open(Buffer('/fs/foo/bar'), 'w+', (err, fd) => { });
```

For 2...

```js
fs.readdir('/fs/foo/bar', {encoding:'hex'}, (err,list) => { });
fs.readdir('/fs/foo/bar', {encoding:'buffer'}, (err, list) => { });
```

encoding can also be passed as a string

```js
fs.readdir('/fs/foo/bar', 'hex', (err,list) => { });
```

The default encoding is set to UTF8 so this addresses the discrepancy that existed previously between fs.readdir and fs.watch handling filenames differently.

Fixes: #2088
Refs: #3519
PR-URL: #5616
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
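The two changes in the commit message can be combined so that a file name never passes through a string decode. The sketch below is one way to do that, assuming a Node version recent enough to have `fs.rmSync` and `fs.mkdtempSync`. The scratch directory and the file name `hello.txt` are hypothetical and exist only for the illustration.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch directory containing one file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'enc-demo-'));
fs.writeFileSync(path.join(dir, 'hello.txt'), 'hi');

// Ask readdir for raw bytes instead of decoded strings.
const names = fs.readdirSync(dir, { encoding: 'buffer' });

// Build each full path as a Buffer and hand it straight back to fs,
// so the file name bytes are used exactly as the filesystem reported them.
const dirBytes = Buffer.from(dir);
const sepBytes = Buffer.from(path.sep);
const contents = names.map((name) =>
  fs.readFileSync(Buffer.concat([dirBytes, sepBytes, name]), 'utf8'));

console.log(Buffer.isBuffer(names[0])); // true
console.log(contents);                  // [ 'hi' ]

// Clean up the scratch directory.
fs.rmSync(dir, { recursive: true });
```

Only the file name needs to stay as bytes here. The directory prefix is converted from a string for brevity, which is safe in this sketch because the prefix was created from a string in the first place.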
Resolved now.
Continuing from #3401, it's clear that the way node.js handles path name encodings is sub-optimal. What is not clear is how to fix it. This issue is for discussing possible solutions.
A quick recap of the current situation:
Considerations: