-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: extends recursive readdir to allow a function to limit traversal #49255
base: main
Are you sure you want to change the base?
Conversation
IMO, it's gonna be semver major unless depth is set to |
@bricss That should not be an issue. There is some complexity around the default value because I am trying to not interfere with the existing functionality of option The values for
If there is no indication of recursion, such as when both options |
/cc @nodejs/fs |
Instead of adding another special-purpose option, why not a |
@bnoordhuis I do not understand your guidance and request an example. |
I believe what Ben means is that it is not a good pattern to keep adding options to support different use cases. For example, someone else might want to filter recursion by depth, path, directory names, whether something is a symbolic link, etc. Instead, a mechanism that allows users to define their own criteria would be desirable. |
To accomplish that I can eliminate the new I propose the
Any numeric values supplied as BigInt would use BigIntStats opposed to Stats object and would use *timeNS fields as appropriate. The ones in bold I have working already in my personal code and they executes very quickly. In my personal code I also deliver the Stats object on each item in the recursive list. On windows in the when I render the results in a file explorer type GUI in the browser its just a bit slower than the native Windows File Explorer application, but I am also presenting metadata that Windows is not. I have to dive a bit further into how the recursive lists are actually generated in the Node library though, because that will require some work. The latest version of my userland approach is at: https://github.com/prettydiff/share-file-systems/blob/master/lib/terminal/commands/library/directory.ts |
IIUC Ben's suggestion is to accept a function that would return a boolean, I guess something that would look like readdir('./someDir', {
recursive(dir, depth) {
// If the function returns `true`, the recursion keeps going deeper.
return depth < 3 && dir.name.startsWith('someString');
},
}) Then users can implement filters as complicated or as simple as they want. |
Options fs.readdir("path", {
recursive: true,
recursiveFilter: function (path) {
return path.indexOf("vi") > 0;
}
}) This imposes different behavior.
|
It would be great to change |
How about this: if To solve the depth limiting, IMHO there definitely should be an integer { recursiveFilter: (path, depth) => depth < 3 }
// much cleaner than something like:
{ recursiveFilter: (path) => path.split(pathModule.sep).length < 4 } Aside from that, wouldn't it make more sense to make the callback determine "should we descend into this directory" or "should we continue altogether" rather than "should we process&return this file or directory"? |
Updated to pass in an object containing all the desired data and the code is cleaner. fs.readdir("path", {
recursive: true,
recursiveFilter: function (filterOutput) {
return filterOutput.parentName.indexOf("vi") > 0;
}
}) Where {
depth: {number}, // steps from the supplied base path
itemName: {string}, // name of the given file system artifact
itemPath: {string}, // absolute path of the given file system artifact
parentName: {string}, // name of the parent directory
parentPath: {string} // absolute path of the parent directory
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it would be better to keep readdir()
filter callback as simple as possible: call it only for directories, determine only if we should open them (but not if we want to keep them in returned array), and pass only path
/Dirent
and depth
.
If we want to make a convenience method that will utilize proposed fs.FilterOutput
or even more values such as size
, mtime
, extname
etc. etc., I think it's better to introduce separate fs.walk()
API that would also filter regular files and return Iterable
or ReadableStream
instead of plain array.
lib/internal/fs/promises.js
Outdated
const directory = absolutePath.slice(0, absolutePath.lastIndexOf(sep)); | ||
return absolutePath === originalPath || | ||
typeof options.recursiveFilter !== 'function' || | ||
(typeof options.recursiveFilter === 'function' && options.recursiveFilter({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fsPromises.readdir()
version, can recursiveFilter
be async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the need. There are no events or external actions here that would hang the thread. Forcing it to become async would push it into another callstack and there is a minor performance penalty for doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. I wondered if, for example, user would want to run await fsPromises.stat(dir)
inside filter function.
@LiviaMedeiros That would make things a tiny bit more simple because directory traversal is isolate from child item population in the existing logic. I can solve for that now by just removing code from the current approach, but I am going to spend some to think through this in case I am missing something or if somebody else has a different opinion. |
doc/api/fs.md
Outdated
### Class: `fs.FilterOutput` | ||
|
||
A {fs.FilterOutput} object provides identifiers for file system artifacts. This | ||
object is passed into the function provided to option `recursiveFilter` of | ||
[`fsPromises.readdir()`][], [`fs.readdir()`][], and [`fs.readdirSync()`][] when | ||
option `recursive` is assigned `true`. | ||
|
||
```console | ||
FilterOutput { | ||
depth: 2, | ||
itemName: 'fileName.ts', | ||
itemPath: '/users/userAccount/notes/fileName.ts', | ||
parentName: 'notes', | ||
parentPath: '/users/userAccount/notes' | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this seems pretty redundant. For consistency with fs.cp()
, why not just pass the (potentially relative) path to the function? Determining the depth is simple enough if necessary, and so is determining the parentName
etc.
I am taking the advice from @LiviaMedeiros and simplifying this effort as much as possible. There will be a depth option and it determines the number of descendant directories to traverse such that 0 eliminates recursion, 1 traverses the immediate child directories, and so forth. I will look at proposing a Sorry I took days to respond. I am living in a hotel and my ethernet port died. |
Effort complete, test units added, all build tasks green! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with Ben that a more general mechanism would be preferable over adding one or even multiple options and then yet another API.
doc/api/fs.md
Outdated
@@ -1318,6 +1319,11 @@ will be passed as {Buffer} objects. | |||
If `options.withFileTypes` is set to `true`, the resolved array will contain | |||
{fs.Dirent} objects. | |||
|
|||
`options.depth` only applies if option `options.recursive` is set to `true` and | |||
determines how many descendant depths of file system artifacts to include. A | |||
value of 0 eliminates recursion while a value of 1 traverses immediately child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value of 0 eliminates recursion while a value of 1 traverses immediately child | |
value of 0 eliminates recursion while a value of 1 traverses immediate child |
doc/api/fs.md
Outdated
@@ -3633,6 +3640,11 @@ the filenames returned will be passed as {Buffer} objects. | |||
If `options.withFileTypes` is set to `true`, the `files` array will contain | |||
{fs.Dirent} objects. | |||
|
|||
`options.depth` only applies if option `options.recursive` is set to `true` and | |||
determines how many descendant depths of file system artifacts to include. A | |||
value of 0 eliminates recursion while a value of 1 traverses immediately child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value of 0 eliminates recursion while a value of 1 traverses immediately child | |
value of 0 eliminates recursion while a value of 1 traverses immediate child |
lib/fs.js
Outdated
@@ -1452,7 +1469,7 @@ function readdirSyncRecursive(basePath, options) { | |||
for (let i = 0; i < length; i++) { | |||
const dirent = getDirent(path, readdirResult[0][i], readdirResult[1][i]); | |||
ArrayPrototypePush(readdirResults, dirent); | |||
if (dirent.isDirectory()) { | |||
if (dirent.isDirectory() && depthCheck(readdirResult[0][i])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't readdirResult[0][1]
potentially a Buffer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment immediately preceding this logic readdirResult is an array of two arrays where the first is the name of file system artifacts and the second is the type. Can a file system item have a buffer as a name? If so I will add a .toString()
to ensure coercion to a string unless there is a better way to handle this. The value that is passed in will be converted to an absolute path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a file system item have a buffer as a name?
Yes, if you set the encoding
option to 'buffer'
.
If so I will add a
.toString()
to ensure coercion to a string
toString()
will decode the Buffer
as a sequence of UTF-8 code points. However, as far as I can tell, there is no guarantee that the name will actually be a UTF-8 string.
unless there is a better way to handle this.
I am not aware of any way to handle this. The node:fs
module used to be a nice low-level interface for well-defined syscalls that would let the operating system handle paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case it sounds like there is not a good way to account for that scenario, so I can address this in the documentation and ensure it does not result in execution errors.
I don't think this is the right direction. I hope we can agree the callback-based filter approach is superior when it comes to flexibility. Adding a |
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. |
This looks like its going to build... the larger linux build is right at 3 hours of effort. I request guidance about how to proceed with the unrelated removed code from promises.js and then I believe this will be ready. Here is the commit where I first noticed it: 4ee9915 |
For the future pull requests I'd highly recommend to always start by creating new branch. For this one, you can squash commits into one and change first commit message.
$ git reset --soft 41a3a1daa28ba1431fe3e7d2c0d15f6e9a816b95 41a3a1d is the base commit; another way to address it is
For more flexible way to do 2 and 3 steps, I'd recommend Don't worry about commits that were added to upstream after 41a3a1d; there is no need to merge them as long as there are no really conflicting changes. |
2bae8c4
to
919acf8
Compare
This should be reviewed carefully by @nodejs/fs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM.
The documentation needs additions to YAML changelogs, like this one:
Lines 1237 to 1244 in c42f47c
<!-- YAML | |
added: v12.12.0 | |
changes: | |
- version: | |
- v20.1.0 | |
- v18.17.0 | |
pr-url: https://github.com/nodejs/node/pull/41439 | |
description: Added `recursive` option. |
The version
field for new entries should have REPLACEME
as value.
In the examples, I think it would be useful to show results of calling these functions, mainly to help readers to understand how exactly directories are counted (and what exactly happens when they are filtered out).
Roughly, as user I'd like to see what await readdir('a/b', (name, depth) => name !== 'c' && depth < 3);
would do if we have a/b/c/d/e
and a/b/0/1/2/3/4/5
directories.
It would demonstrate otherwise unobvious:
- that
name
of directory is its own name, not relative path to original directory nor absolute path - from where
depth
starts and where it ends - is filtered out directory itself included in results or not
The code starts counting depth from 1, is there a reason for that or we can start from 0?
In that case I believe there should be a list of ['a/b/c', 'a/b/0', 'a/b/0/1', 'a/b/0/1/2']. The logic only limits what to traverse and not what to populate, and I add some explicitness about this to the documentation. I also modified the depth counter to start at 0 instead of 1. So anything that is an immediate child of the base path is depth 0. I also added code comments to the documentation explaining exactly what is limited in the given example code samples. |
Things that should be addressed before approval:
I still think, having list like this in code example in documentation will be helpful. :) |
@prettydiff There is a merge conflict that prevents CI from starting, would you mind to squash and force-push it again, please? The resulting commit should lie on top of 41a3a1d (right now the commit history has the 7fa9316 from initial version that's supposed to be squashed with subsequent commits). |
@@ -79,99 +79,349 @@ function createFiles(path, fileStructure) { | |||
// Make sure tmp directory is clean | |||
tmpdir.refresh(); | |||
|
|||
const results = [ | |||
[ | |||
'a', 'a/bar', 'a/foo', 'aa', 'aa/bar', 'aa/foo', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'a', 'a/bar', 'a/foo', 'aa', 'aa/bar', 'aa/foo', | |
'a', 'a/a', 'a/bar', 'a/foo', 'aa', 'aa/bar', 'aa/foo', |
There are test failures, we should expect 'a/a'
directory in outputs where it's not filtered out.
This API will be very difficult to make performant. It means every directory returned must be converted into a string. It would work better to have a deny list, an allow list, a glob to match against, or something which can return a boolean without a function. |
This commit contains the necessary fixes to code and documentation per manual testing, but I still need to complete test units for test autoamtion. This pull request does not claim to be complete until test automation is updated, but is enough to open discussion for a new feature.
fixes: #49243