fs.readdir and fs.readdirSync file order is always sorted on Linux (but not Windows) #3232
Comments
I'm very reluctant to change the current behavior because there's almost certainly someone somewhere relying on the result being sorted, on UNIX systems at least. I remember bug reports against node v0.4 about All is perhaps not lost though: work is under way to implement streaming readdir (libuv/libuv#416) that returns the results in directory order. |
@bnoordhuis what about an option to disable sorting ? |
Technically possible but requires semver-minor changes to libuv and node.js, in that order. |
@bnoordhuis, I understand the reluctance to change it, but here's my best case to change it:
Clearly, the last issue is the most important practical one for me, so an option to disable sorting would be sufficient. The first issues are more about keeping Node clean, i.e. if you looked at the library 5 years from now, not knowing the history, would you think it was done right and has the most expressive power? Thanks for reading this far. :) |
I have been thinking about the legacy behavior issue. How about this as a win/win solution:
// two NEW functions
fs.scandir(path[,options],callback)
fs.scandirSync(path[,options])
// where
var options = {
  filter: filterFunc, // optional
  compare: compareFunc, // optional
}
and then deprecate and/or leave
Pros:
Cons:
Thoughts? |
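To make the proposed option shape concrete, here is a minimal userland sketch. The name `scandirSync` and the `filter`/`compare` options come from the proposal above; they are hypothetical and are not part of Node's `fs` module. Because the sketch sits on top of `fs.readdirSync`, it cannot recover raw directory order; it only illustrates how the options might behave.

```javascript
const fs = require('fs');

// Hypothetical stand-in for the proposed fs.scandirSync(); not a real Node API.
function scandirSync(dirPath, options = {}) {
  const { filter, compare } = options;

  // Start from whatever order fs.readdirSync returns on this platform.
  let names = fs.readdirSync(dirPath);

  // Optional filter: keep only the names the caller accepts.
  if (typeof filter === 'function') {
    names = names.filter((name) => filter(name));
  }

  // Optional compare: sort only when the caller asks for an order.
  if (typeof compare === 'function') {
    names = names.slice().sort(compare);
  }

  return names;
}
```

A caller who wants no ordering guarantee would omit `compare`; a caller who wants a specific order would pass a comparator explicitly rather than depend on platform behavior.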
Without expressing an opinion on the merits of the change request, I wanted to add a note for the record about how Windows works... Some people think Windows sorts directory enumerations, but it doesn't. Enumerations on NTFS are usually sorted, and that's what most people use, but even there it isn't guaranteed. As per MSDN:
In addition, enumeration over a network filesystem may affect the results. Even if the server is using NTFS, and you'd get sorted names locally, you're not guaranteed to get sorted names over the network. The reverse can also be true (network filesystem may sort results even if server doesn't). Also... There is no universal definition of what "sorted" means on Windows. "DIR" in cmd.exe does a simple alphabetic sort, but the Windows Explorer GUI (which also provides the file open/save dialogs for most applications) has additional rules (see KB319827 and this). |
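The point that "sorted" has no single meaning can be seen in plain JavaScript as well. The sketch below uses made-up file names and contrasts the default `Array.prototype.sort()` with a case-insensitive, digit-aware `Intl.Collator`. The second ordering is only an illustration of an alternative rule set, not a reproduction of Windows Explorer's rules.

```javascript
// Made-up names, chosen to show case and numeric differences.
const names = ['file10.txt', 'file2.txt', 'File1.txt'];

// Default sort compares UTF-16 code units:
// uppercase sorts before lowercase, and "10" sorts before "2".
const byCodeUnit = [...names].sort();

// Case-insensitive, digit-aware comparison via the standard Intl API.
const collator = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });
const natural = [...names].sort(collator.compare);

console.log(byCodeUnit); // [ 'File1.txt', 'file10.txt', 'file2.txt' ]
console.log(natural);    // [ 'File1.txt', 'file2.txt', 'file10.txt' ]
```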
@bhxr You should open an issue at libuv/libuv for hashing out the C API. Be prepared to do the leg work; I'll review patches but it's not a topic I'm otherwise invested in. |
@bnoordhuis which solution are you referring to? Options:
Option 1 requires a change to libuv that I would need to bring up with them. After considering all options and also reading what @mcnameej posted. It is clear that
My base proposal:
libuv/libuv/src/unix/fs.c
If base proposal isn't acceptable because of 'sometimes it sorts right now' behavior change:
nodejs/node/lib/fs.js
If you are okay with my base proposal, I'll go bug the libuv team. Otherwise, how do we resolve nodejs side? |
If by "turn off" you mean "add a flag", yes, that requires an (acceptable IMO) change in libuv.
There is more than one way to skin this cat. It could be either a libuv change, a node.js change, or both. |
I don't think a libuv issue was ever filed, or if there was, I can't find it. I'll go ahead and close this issue; it's been inactive for over half a year. |
Summary: This is kind of an edge case, but for large directories the `sort()` call actually becomes a big bottleneck (10K files => 600ms to sort). The solution is to offload the cost to the server in the form of a sorted `readdirSorted`. See the attached task for details. Note: nodejs/node#3232 indicates that on Linux/Mac `readdir` is actually sorted (but in a case-sensitive way). So we're almost there? If we're willing to relax the case restriction we could remove the sort on Linux/Mac. Reviewed By: semmypurewal Differential Revision: D8213348 fbshipit-source-id: 6798370565fb099944adbed550ca7dd230d8a6e2
In C, readdir iterates over a directory. Modern file systems must be able to handle millions of files in a directory (which makes Node a no-go on some large-scale systems). Iterating is impractical with an interpreted language. |
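For readers arriving later: Node.js releases from v12.12 onward ship `fs.opendir` / `fs.opendirSync`, which read a directory one entry at a time instead of building the whole array first, and whose documentation states that entries come back in no particular order, as provided by the operating system. A minimal sketch follows; the helper name `forEachEntrySync` is made up for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: visit directory entries one at a time,
// in whatever order the operating system yields them.
function forEachEntrySync(dirPath, visit) {
  const dir = fs.opendirSync(dirPath);
  try {
    let entry;
    // readSync() returns an fs.Dirent, or null once the directory is exhausted.
    while ((entry = dir.readSync()) !== null) {
      visit(entry.name, entry);
    }
  } finally {
    // Always release the underlying directory handle.
    dir.closeSync();
  }
}
```

Usage would look like `forEachEntrySync('/some/dir', (name) => console.log(name));`. No full listing is held in memory and no sort is applied.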
I tracked down the origins of doing any manner of sorting in the first place to this commit, for those who may come here from a web search in the future and are curious. |
fs.readdirSync() currently will return an ordered array, but it's not expected behaviour (see nodejs/node#3232) In order to bring parity to all the SDKs, I've added an explicit sort to the returned array
@MuYunyun what do you mean by "after sorting manually"? Both of the following will result in the "LeetCode first output". var files = fs.readdirSync(dirpath); |
Added explicit sorting since order is undefined on Windows (nodejs/node#3232)
Apparently readdir is sorted on UNIX but not on Windows; nodejs/node#3232 Add sorting on name explicitly. Test plan: * sider snapshot list still sorts on snapshot name.
I am replacing a Bash script with a Nodejs script. Part of my old script did this:
find dirpath -print >files
As a replacement, in Nodejs I wrote a directory traversal routine that contains this line:
My regression test caught that the file name ordering was different between the old Bash script and the new Nodejs script. For reasons that aren't important here, it's best that the Nodejs replacement functions identically to the old Bash process; however, I couldn't find a solution.
After digging into the source, I found the line in node/deps/uv/src/unix/fs.c(342)
Notice the alphasort? So, it looks like as implemented there is no solution to my problem.
Because fs.readdir() is not really a wrapper on readdir(3) as the documentation says (but is a wrapper for scandir), you cannot get the files in the order that readdir(3) returns. This means, for instance, that programs like find and ls -U are not implementable in Nodejs with identical behavior.
It appears the Windows implementation does not sort, so this sorting behavior is also inconsistent across platforms. Therefore, if you wanted sorting and you wanted to be cross-platform, you'd have to do it manually anyway:
Could this be changed so fs.readdir returns entries in system order? I believe this could be done by replacing the above line with (not tested just a guess):