Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.readdir and fs.readdirSync file order is always sorted on Linux (but not Windows) #3232

Closed
ghost opened this issue Oct 7, 2015 · 15 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. windows Issues and PRs related to the Windows platform.

Comments

@ghost
Copy link

ghost commented Oct 7, 2015

I am replacing a Bash script with a Nodejs script. Part of my old script did this:

find dirpath -print >files

As a replacement, in Nodejs I wrote a directory traversal routine that contains this line:

var files = fs.readdirSync(dirpath)

My regression test caught that the file name ordering was different between the old Bash script and the new Nodejs script. For details I that aren't important, it's best that the Nodejs replacement functions identically as the the old Bash process, however, I couldn't find a solution.

After digging into the source, I found the line in node/deps/uv/src/unix/fs.c(342)

n = scandir(req->path, &dents, uv__fs_scandir_filter, alphasort);

Notice the alphasort? So, it looks like as implemented there is no solution to my problem.

Because fs.readdir() is not really a wrapper on readdir(3) as the documentation says (but is a wrapper for scandir) you cannot get the files in the order that readdir(3) returns. This means for instance, that a programs like find and ls -U are not implementable in Nodejs with identical behavior.

It appears the window implementation does not sort. So this sorting behavior is also inconsistent across platforms. Therefore if you wanted sorting and you wanted to be cross platform you'd have to do it manually away:

var files = fs.readdirSync(dirpath).sort();

Could this be changed so fs.readdir returns entries in system order? I believe this could be done by replacing the above line with (not tested just a guess):

n = scandir(req->path, &dents, uv__fs_scandir_filter, NULL);
@Fishrock123 Fishrock123 added fs Issues and PRs related to the fs subsystem / file system. windows Issues and PRs related to the Windows platform. libuv Issues and PRs related to the libuv dependency or the uv binding. labels Oct 7, 2015
@bnoordhuis
Copy link
Member

I'm very reluctant to change the current behavior because there's almost certainly someone somewhere relying on the result being sorted, on UNIX systems at least. I remember bug reports against node v0.4 about fs.readdir() sorting one way and fs.readdirSync() another way, and that was when node had a fraction of the user base it has today.

All is perhaps not lost though: work is under way to implement streaming readdir (libuv/libuv#416) that returns the results in directory order.

@targos
Copy link
Member

targos commented Oct 7, 2015

@bnoordhuis what about an option to disable sorting ?

@bnoordhuis
Copy link
Member

Technically possible but requires semver-minor changes to libuv and node.js, in that order.

@ghost
Copy link
Author

ghost commented Oct 7, 2015

@bnoordhuis, I understand the reluctance to change it, but here's my best case to change it:

  • as is, sorting is undocumented behavior
  • as is, there is a performance penalty (e.g. if I have large cache directory, I pay the cost of a sort when I don't need or want it)
  • as is, it is inconsistent across platforms ... so even if you wrote something that depended on this, it will behave differently if you run on Windows
  • as is, it doesn't follow the principle of least astonishment (e.g. it doesn't behave like it's namesake readdir(3))
  • as is, it is inconsistent across implementations of alphasort (some implementations use strcoll(3) others use strcmp(3))
  • may have issues with UTF file names ... I'm not sure if strcoll(3) sorts UTF strings properly... a quick Google search shows that in PHP strcoll() has been a source of these types of issues ... I personally find doing sorting at the C level rather that the Javascript level odd because it will mean that fs.readdirSync() will likely not match fs.readdirSync().sort() depending on LC_COLLATE setting and platform
  • once sorted, you cannot undo it whereas if it wasn't sorted it's always easy to do if you want that with a simple Array.sort() call.

Clearly, the last issue is the most important practical one for me so a option to disable sorting would be sufficient.

The first issues are more about keeping Node clean... i.e. if you looked at the library 5 years from now, not knowing the history, would you think it was done right and has the most expressive power.

Thanks for reading this far. :)

@ghost
Copy link
Author

ghost commented Oct 7, 2015

I have been thinking about the legacy behavior issue. How about this as a win/win solution:

// two NEW functions

fs.scandir(path[,options],callback)
fs.scandirSync(path[,options])

// where
var options = {
    filter: filterFunc, // optional
    compare: compareFunc,   // optional
}

and then deprecate and/or leave readdir(path,callback) and readdirSync(path) as is.

Pros:

  • solves expressiveness issue
  • doesn't break anything in existence
  • solves cross platform consistency issue
  • solves principle of least astonishment issue (fs.scandir*() is just like scandir(3))
  • sorting would be done at Javascript level, so solves any potential UTF issues
  • unlike the proposed streaming opendir() ... this would be a drop in replacement for anyone wanting a more flexible fs.readdir*() that works the same way ... e.g. you could do a search and replace of readdir with scandir and your program should continue to work; then you can add options param as and where needed
  • clean API going forward

Cons:

  • feature request and not a bug fix anymore; somebody has to implement it (it's not a one liner change)

Thoughts?

@mcnameej
Copy link

mcnameej commented Oct 8, 2015

Without expressing an opinion on the merits of the change request, I wanted to add a note for the record about how Windows works...

Some people think Windows sorts directory enumerations, but it doesn't. Enumerations on NTFS are usually sorted, and that's what most people use, but even there it isn't guaranteed. As per MSDN:

The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.

In addition, enumeration over a network filesystem may affect the results. Even if the server is using NTFS, and you'd get sorted names locally, you're not guaranteed to get sorted names over the network. The reverse can also be true (network filesystem may sort results even if server doesn't).

Also... There is no universal definition of what "sorted" means on Windows. "DIR" in cmd.exe does a simple alphabetic sort, but the Windows Explorer GUI (which also provides the file open/save dialogs for most applications) has additional rules (see KB319827 and this).

@bnoordhuis
Copy link
Member

@bhxr You should open an issue at libuv/libuv for hashing out the C API. Be prepared to do the leg work; I'll review patches but it's not a topic I'm otherwise invested in.

@ghost
Copy link
Author

ghost commented Oct 9, 2015

@bnoordhuis which solution are you referring to?

Options:

  1. Turn of alphasort in uv__fs_scandir()
  2. Implement scandir*() in Javascript in nodejs/node/lib/fs.js in terms of new streaming opendir API

Option 1 requires a change to libuv that I would need to bring up with them.
Option 2 is a nodejs issue and requires no change from their current path assuming opendir API gets done.

After considering all options and also reading what @mcnameej posted. It is clear that fs.readdir*() does not behave in a consistent, cross platform and i18n friendly way. Thus, any code that is written assuming fs.readdir() sorts is also flawed because fs.readdir() does not consistently sort. Therefore, I think the best path forward is make it consistent across platforms and documented.

My base proposal:

libuv/libuv/src/unix/fs.c

  • turn of alphasort in unix/uv__fs_scandir()
  • fs.readdir*() become consistently non-sorting across all platforms just like it's *nix readdir(3) namesake and the documentation should clearly state that it does not sort

If base proposal isn't acceptable because of 'sometimes it sorts right now' behavior change:

nodejs/node/lib/fs.js

  • also change fs.readdir*() to sort at Javascript level using Array.sort()
  • implement fs.scandir() as I outlined above to expose ability to get unsorted list

If you are okay with my base proposal, I'll go bug the libuv team. Otherwise, how do we resolve nodejs side?

@bnoordhuis
Copy link
Member

Turn of alphasort in uv__fs_scandir()

If by "turn off" you mean "add a flag", yes, that requires an (acceptable IMO) change in libuv.

Implement scandir*() in Javascript in nodejs/node/lib/fs.js in terms of new streaming opendir API

There is more than one way to skin this cat. It could be either a libuv change, a node.js change, or both.

@bnoordhuis
Copy link
Member

I don't think a libuv issue was ever filed, or if there was, I can't find it. I'll go ahead and close this issue; it's been inactive for over half a year.

facebook-github-bot pushed a commit to facebookarchive/nuclide that referenced this issue May 31, 2018
Summary:
This is kind of an edge case, but for large directories the `sort()` call actually becomes a big bottleneck (10K files => 600ms to sort).

The solution is to offload the cost to the server in the form of a sorted `readdirSorted`. See the attached task for details.

Note: nodejs/node#3232 indicates that on Linux/Mac `readdir` is actually sorted (but in a case-sensitive way). So we're almost there? If we're willing to relax the case restriction we could remove the sort on Linux/Mac.

Reviewed By: semmypurewal

Differential Revision: D8213348

fbshipit-source-id: 6798370565fb099944adbed550ca7dd230d8a6e2
@paragi
Copy link

paragi commented Oct 29, 2018

Thoughts?

In C readdir iterates over a directory. Modern file systems must be able to handle millions of file in a directory. ( which makes node a no go, on some large scale systems)

Irritating is unpractical with an interpreted language.
Another often used solution is to do some filtering at a low level, with wildcards. (Glob function)
Sorting should of cause always be optional :)

@Xaekai
Copy link

Xaekai commented Feb 16, 2020

I tracked down the origins of doing any manner of sorting in the first place to this commit, for those who may come here from a web search in the future and are curious.
libuv/libuv@74999f8#diff-6a16903c26af4b4035eda9922a73ecc9R182

jaxxstorm added a commit to pulumi/pulumi-kubernetes that referenced this issue Apr 7, 2020
fs.readdirSync() currently will return an ordered array, but it's not
expected behaviour (see nodejs/node#3232)

In order to bring parity to all the SDKs, I've added an explicit sort to
the returned array
@MuYunyun
Copy link

So It seems it have to use such way manually to get an ordered file.

var files = fs.readdirSync(dirpath).sort();

image

after sort manually:

image

@nashid
Copy link

nashid commented Jul 6, 2020

So It seems it have to use such way manually to get an ordered file.

var files = fs.readdirSync(dirpath).sort();

image

after sort manually:

image

@MuYunyun what do you mean by after sorting manually. Both the following will result into the "LeetCode first output".

var files = fs.readdirSync(dirpath);
var files = fs.readdirSync(dirpath).sort();

@MuYunyun
Copy link

MuYunyun commented Jul 7, 2020

image

I mean the directory in my mac environment is sorted as '1.xx 2.xx 11.xx', and the result of fs.readdirSync(dirpath) is 1.xx 11.xx 2.xx, so I've to use fs.readdirSync(dirpath).sort(() => { // do some logic }) to get correct order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

No branches or pull requests

8 participants