Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make recursive rmdir more strict #35250

Closed
wants to merge 6 commits into from

Conversation

iansu
Copy link
Contributor

@iansu iansu commented Sep 18, 2020

This PR changes the behaviour of recursive rmdir (rimraf) in the following ways:

  1. Throws an exception when the path to remove does not exist
  2. Throws an exception when the path to remove points to a file instead of a directory

These changes are based on the discussion in #35171

cc @nodejs/tooling

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label Sep 18, 2020
@mscdex mscdex added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-major PRs that contain breaking changes and should be released in the next major version. labels Sep 18, 2020
Copy link
Contributor

@boneskull boneskull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly just questions, but I approve the intent here

try {
stats = statSync(path);
} catch (err) {
if (err.code === 'ENOENT')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is swallowed here if not ENOENT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rimraf swallows all kinds of errors and throwing all errors from stat caused lots of failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like you'd get into trouble here if L198 threw, because stats would then be undefined, and the call to isDirectory() on L204 would fail... it doesn't seem recoverable?

assert.strictEqual(err.name, 'TypeError');
assert.match(err.message, /^The argument 'path' is not a directory\./);
}));
fs.unlinkSync(filePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo this should be guaranteed to be called--not sure if it is. also: don't know offhand if other tests try to guarantee this or just leave crap all over the fs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use a try/finally to clean this up.

}

if (!stats.isDirectory()) {
throw new ERR_INVALID_ARG_VALUE('path', path, 'is not a directory');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the correct error to throw? unsure of the intent of ERR_INVALID_ARG_VALUE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it is. I wanted to throw an ENOTDIR error but I'm not sure how to do that. I don't see it happening anywhere else in the code. I tried making a regular Error and setting the code to ENOTDIR but the linter didn't like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a convenience function for this error; maybe we should create one.

Copy link
Contributor

@bcoe bcoe Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iansu in internal/errors.js, I think we want to add:

E('ENOTDIR',
  'no such file or directory',
  SystemError);

You'd then instantiate it something like this:

new SystemError('ENOTDIR', {syscall: 'rmdir', code: 'ENOTDIR', path: './', message: 'whatever error message rimraf throws'', errno: whatever error code rmdir has})

Basically the ideal is that someone would be able to look at err.code both in the recursive and non recursive rmdir, and use the same logic.

Edit: I mean ENOTDIR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, sorry I mean ENOTDIR, but I think it should be similar logic.

@boneskull boneskull added the notable-change PRs with changes that should be highlighted in changelogs. label Sep 18, 2020
@RyanZim
Copy link
Contributor

RyanZim commented Sep 18, 2020

Is there a reason this is done at the top level, instead of making changes to the logic in __rimraf? Currently, it's doing a stat call for this check, followed by an lstat call in __rimraf; do we need this duplicate fs hit?

@mcollina
Copy link
Member

This is not semver-major because recursive is still experimental. We might want to decide to treat it as semver-major anyway.

cc @nodejs/tsc

@iansu
Copy link
Contributor Author

iansu commented Sep 18, 2020

@RyanZim That was my initial thought but it turns out that both of the existing rimraf functions (now named _rimraf and __rimraf) get called recursively. We only want this check to run once on the initial path so I needed to create a new entrypoint.

@bcoe
Copy link
Contributor

bcoe commented Sep 18, 2020

@RyanZim That was my initial thought but it turns out that both of the existing rimraf functions (now named _rimraf and __rimraf) get called recursively. We only want this check to run once on the initial path so I needed to create a new entrypoint.

@RyanZim @iansu I think I'd be content with a TODO(@iansu) refactor this to have one less redirection. It seems like we shouldn't need two methods that recurse.

@iansu iansu requested a review from boneskull September 18, 2020 19:13
@nodejs-github-bot
Copy link
Collaborator

try {
stats = statSync(path);
} catch (err) {
if (err.code === 'ENOENT')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like you'd get into trouble here if L198 threw, because stats would then be undefined, and the call to isDirectory() on L204 would fail... it doesn't seem recoverable?

assert.strictEqual(err.code, 'ERR_INVALID_ARG_VALUE');
assert.strictEqual(err.name, 'TypeError');
assert.match(err.message, /^The argument 'path' is not a directory\./);
fs.unlinkSync(filePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a try/finally as well because these assert calls may throw

@nodejs-github-bot
Copy link
Collaborator

}

if (!stats.isDirectory()) {
throw new ERR_INVALID_ARG_VALUE('path', path, 'is not a directory');
Copy link
Contributor

@bcoe bcoe Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iansu in internal/errors.js, I think we want to add:

E('ENOTDIR',
  'no such file or directory',
  SystemError);

You'd then instantiate it something like this:

new SystemError('ENOTDIR', {syscall: 'rmdir', code: 'ENOTDIR', path: './', message: 'whatever error message rimraf throws'', errno: whatever error code rmdir has})

Basically the ideal is that someone would be able to look at err.code both in the recursive and non recursive rmdir, and use the same logic.

Edit: I mean ENOTDIR.

Copy link
Contributor

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to address Chris' comments too 👍 but, to me, once tests pass, this is looking 99% of the way there.

@iansu
Copy link
Contributor Author

iansu commented Sep 23, 2020

I believe I did address Chris' comments in my last two commits.

Copy link
Contributor

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @iansu 👍

@bcoe bcoe added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Sep 24, 2020
Copy link
Member

@gengjiawen gengjiawen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am -1 on Throws an exception when the path to remove does not exist

@gengjiawen gengjiawen removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Sep 24, 2020
@bcoe
Copy link
Contributor

bcoe commented Sep 25, 2020

@Trott to clarify, it are you advocating we'd keep the existing rm -rf behavior, and introduce a strict option. Or that we made the default behavior stricter, and introduce a loose option.

@gengjiawen
Copy link
Member

we'd keep the existing rm -rf behavior, and introduce a strict option

+1 for this.

@Trott
Copy link
Member

Trott commented Sep 25, 2020

@Trott to clarify, it are you advocating we'd keep the existing rm -rf behavior, and introduce a strict option. Or that we made the default behavior stricter, and introduce a loose option.

I'm OK either way, but if it was left up to me, I'd prefer that we made the default behavior stricter and introduce a loose option.

@bcoe
Copy link
Contributor

bcoe commented Sep 25, 2020

I'd prefer that we made the default behavior stricter and introduce a loose option.

I too would prefer the stricter behavior (throw on missing paths, and on attempt to remove file).

@gengjiawen what if we compromised by adding a force option. This would fit the rm -rf analogy well:

  • rmdir(path): removes an empty directory, fails if it contains folders or files.
  • rmdir(path, {recursive: true}: removes files and folders from folder recursively, fails if path is file, or if path does not exist.
  • rmdir(path, {recursive: true, force: true}: removes files and folders, deletes file if provided as path, does not fail if path does not exist.

I would love for us to figure out a compromise that works for folks, my fear is that we'll continue keeping rmdir/recursive in an experimental state, which is ultimately worse for the user base.

@CxRes
Copy link

CxRes commented Sep 25, 2020

It seems like we are going over the discussion in #35171 all over again. While it is important to find the best solution, I too wish this is not dragged out ad infinitum.

From @bcoe's table above, it is clear that there is no universal solution here. The design of various languages should only be treated as inspiration and not canon. At best, POSIX syscalls is the closest thing we can consider as being canonical to node fs.

I, for one, would have presumed that there is no contradiction between having a remove function for file, for directories and for everything. Even POSIX has unlink, rmdir and rm. The only difference between this and agreement in #35171 is that rmdir is also resursive (unlike POSIX rmdir), which we agreed was the small price of redundancy to pay to keep the change minimally breaking. Further rm has more semantics than just -rf which we might want to expose as well. To put it differently, this is a chance to design the best possible recursive remove from scratch. To me, that is a very attractive proposition!!!!!

@gengjiawen what if we compromised by adding a force option. This would fit the rm -rf analogy well:

rmdir(path): removes an empty directory, fails if it contains folders or files.
rmdir(path, {recursive: true}: removes files and folders from folder recursively, fails if path is file, or if path does not exist.
rmdir(path, {recursive: true, force: true}: removes files and folders, deletes file if provided as path, does not fail if path does not exist.

This would have been ideal. But didn't we conclude this would be too disruptive for existing rmdir users? I am good either way, but imho we still would need rm unless one plans to massively expand the scope of rmdir...

@bcoe
Copy link
Contributor

bcoe commented Sep 25, 2020

@CxRes @Trott @gengjiawen

While it is important to find the best solution, I too wish this is not dragged out ad infinitum.

It feels like we're having trouble reaching a compromise here, I know it's not the ideal we were picturing, but I'm thinking perhaps we do go with @Trott's suggestion of a strict option, which if set to true, has the behavior in this PR.

Then we update #35171 accordingly to reflect this.

Plan of action I propose:

  1. we update this PR so @iansu's new logic happens when you've set strict=true.
  2. we land this before fs: remove experimental language from rmdir recursive #35171, and have fs: remove experimental language from rmdir recursive #35171 speak to the strict flag, along with talking about the baseline behavior.

@bcoe
Copy link
Contributor

bcoe commented Sep 25, 2020

@nodejs/tsc I believe the TSC could help unblock the rmdir/recursive discussion. We've narrowed things down to two options (both of which involve introducing a strict flag).

  1. default strict to false, which would have the behavior of rm -rf (this is the current implementation of rmdir/recursive). Setting strict to true would result in behavior that throws when given a file path, and when given a path that does not exist.
  2. default strict to true, therefore changing the current behavior to throw on file paths, and on missing paths. Setting strict to false would behave like the implementation today.

CC: @Trott

@iansu
Copy link
Contributor Author

iansu commented Sep 25, 2020

I am personally in favour of making strict: true the default but I'm okay with either option.

@bcoe bcoe added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Sep 25, 2020
@Trott
Copy link
Member

Trott commented Sep 25, 2020

A little bit of a heads up that I'm wondering if we can come up with a better name than strict but I don't have any proposal yet. The flag affects one narrow-ish aspect of behavior so I'm wondering if it can be more descriptive. requireDirectoryPath or errorOnNonDirectoryPath are kind of along the lines of what I'm thinking, but those both have bigger problems. directoryPathOnly... yeah, these are all not great.

@ruyadorno
Copy link
Member

@bencoe I'm also +1 on having strict be the default behavior and follow the previously proposed analogy:

rm -rf -> rmdir(path, {recursive: true, force: true})

@CxRes
Copy link

CxRes commented Sep 25, 2020

@bcoe My larger question remains unanswered: How would you (and do you plan to?) expose all the features of POSIX rm in node? My sense is that it has to be done. Do you plan to this within rmdir? or is there a plan for a rm equivalent function as well? Because it has implications here!!!

Please reach a consensus on this before finalizing the PRs.

@CxRes
Copy link

CxRes commented Sep 25, 2020

A little bit of a heads up that I'm wondering if we can come up with a better name than strict but I don't have any proposal yet. The flag affects one narrow-ish aspect of behavior so I'm wondering if it can be more descriptive. requireDirectoryPath or errorOnNonDirectoryPath are kind of along the lines of what I'm thinking, but those both have bigger problems. directoryPathOnly... yeah, these are all not great.

If we are changing our minds and changing rmdir itself then the flag should be called force (not strict) for the sake of consistency!!!! And to keep things backwards compatible it has to be true by default (Though I sympathize with @Trott that I do not like this default).

@bcoe
Copy link
Contributor

bcoe commented Sep 26, 2020

If we are changing our minds and changing rmdir itself then the flag should be called force (not strict) for the sake of consistency!!!! And to keep things backwards compatible it has to be true by default (Though I sympathize with @Trott that I do not like this default).

@ruyadorno @CxRes force works for me -- edit: or anything better folks come up with.

How would you (and do you plan to?) expose all the features of POSIX rm in node? My sense is that it has to be done. Do you plan to this within rmdir? or is there a plan for a rm equivalent function as well? Because it has implications here!!!

I'd like to suggest we call this out of scope for the current goal, which is getting rmdir, recursive over the finish line.

This won't prevent us from adding additional APIs in the future.

@gengjiawen
Copy link
Member

@gengjiawen what if we compromised by adding a force option. This would fit the rm -rf analogy well

I agree with this (with a flag). But I want the default behavior to be tolerant on ENOENT (#35250 (comment)). I think pragmatism is mostly important in this case.

Maybe we can start a user poll in twitter to get more comprehensive community feedback ?

@CxRes
Copy link

CxRes commented Sep 26, 2020

I'd like to suggest we call this out of scope for the current goal, which is getting rmdir, recursive over the finish line.

This won't prevent us from adding additional APIs in the future.

(Sorry, in advance, for the upcoming bold but I feel strongly about what I am saying below but it is meant with the best of intentions):

But it does affect how we will add additional API in the future, hence imperative that it is considered right now. We have discussed two options:

  • Fixing rmdir as proposed here will continue to delete files ie rmdir is not just rmdir. It is rmdir+ rm/rimraf. This will not be breaking now, with the odd default of force flag. However, this will make a new rm seem even more redundant and all recursive functionality will have to be added to rmdir in the future and with this odd force default.
  • The reason for my proposal was in fs: remove experimental language from rmdir recursive #35171 (based on what seemed like the consensus/compromise in that thread) was to keep rmdir as truly rmdir (for deleting directories only) and then add rm to provide rimraf (rm as the universal remove function). Of course, this is breaking today but leads to more intuitive semantics.

Clarity on this question today, taking the pain today, will mean we do not snowball the poorly considered API choice which we intend to fix as best we can! Once this is resolved either way, it is so for the life of node js!!!! As I have stated earlier, I do not want to drag this either but any decision must take into account the consequences. And I just want to make sure that everyone has fully weighed the future consequence (not just present) of both these choices before this is finalized.

@bcoe
Copy link
Contributor

bcoe commented Sep 26, 2020

I agree with this (with a flag). But I want the default behavior to be tolerant on ENOENT (#35250 (comment)). I think pragmatism is mostly important in this case.

@gengjiawen I don't feel strongly about default behavior. Where I do feel strongly, is that when force=false we should throw on both ENOENT and ENODIR.

This would make it clear that force=true behaves like rm -rf, and force=false is strict behavior (similar to .NET).

The reason for my proposal was in #35171 (based on what seemed like the consensus/compromise in that thread) was to keep rmdir as truly rmdir (for deleting directories only) and then add rm to provide rimraf (rm as the universal remove function). Of course, this is breaking today but leads to more intuitive semantics.

@CxRes you've done a good job of expressing your point of view, and I haven't gotten the impression that it's being dismissed.

Node.js collaborators such as @gengjiawen, are making the case that they would rather continue to see this functionality be part of rmdir -- which was the decision that the majority of Node.js collaborators made when this feature was initially added.


I am a -1 on introducing an additional recursive method (not based on this being a bad idea, but based on it revisiting a decision we reached consensus on; I don't feel the design we landed on innately bad).

I am deferring the decision about default behavior of rmdir to the TSC.

@CxRes
Copy link

CxRes commented Sep 26, 2020

@bcoe I did not want to suggest that I was being dismissed, and I apologize if that is the impression, my concern was primarily that implications of either decisions were being fully considered. It was unclear until your last post, in particular, that there is consensus in the community to add all recursive functionality in rmdir and there has been considerable back and forth over the three threads. With that settled, I would agree that the present proposal is best path ahead.

Once this feature lands, I do intend to open another issue to add all POSIX rm features into rmdir!

@bcoe
Copy link
Contributor

bcoe commented Sep 26, 2020

I did not want to suggest that I was being dismissed

@CxRes 👍

I strongly encourage the @nodejs/tsc to consider your point of view, which you've summarized well.

What's important to me, is that we take steps towards removing the --experimental flag from some form of the recursive behavior.

@bcoe
Copy link
Contributor

bcoe commented Oct 1, 2020

Based on the conversation with the TSC today (see: https://github.com/nodejs/TSC/pull/934/files#diff-331798040b102838053b8a643840013e):

 1. Let's mark the current API as stable, since it's used a lot
 2. Rename the current API to e.g., `rm` while keeping the current API as alias
    to the new one
 3. The alias gets deprecated
 4. Add a new function that has the stricter version

@iansu is going to make an effort to send a patch with the alias, I believe at that time we should warn about the deprecation in logs -- our hope is to the this landed in time for node@v15 (which means we'll need to try for approvals early next week).

In node@v16, we'll drop the "alias", potentially leaving the recursive flag to have stricter behavior, or introducing a new function -- this work won't happen until the next major, so we have a bit of time to make sure we get it right.

@CxRes
Copy link

CxRes commented Oct 2, 2020

Let me try and get this straight since the TSC language is so abstract (I'll use terrible function names so as not bias an eventual choice):

  1. We mark recursive flag of fs.rmdir as stable.
  2. We make a new function, say, fs.removeRecursiveWithForce with the same functionality as fs.rmdir(... , {... , recursive: true} ). We set fs.rmdir(... , {... , recursive: true} ) = fs.removeRecursiveWithForce(... , {...}) potentially in node@v15
  3. We deprecate recursive flag of fs.rmdir in node@v15 and drop it node@v16.
  4. We either bring back the recursive flag with whatever behaviour we like in node@v16 or create a new function called fs.removeSomethingRecursively with whatever behaviour we like.

Do I follow this correctly??? 😕

@BridgeAR
Copy link
Member

BridgeAR commented Oct 2, 2020

@CxRes that is correct while the exact versions for the deprecation (doc-only / runtime) and removal might be different.

@targos
Copy link
Member

targos commented Oct 2, 2020

Unfortunately I wasn't at the TSC meeting, but I find it weird to mark an API stable knowing that we are going to deprecate it.

@bcoe
Copy link
Contributor

bcoe commented Oct 3, 2020

Unfortunately I wasn't at the TSC meeting, but I find it weird to mark an API stable knowing that we are going to deprecate it.

@targos I found this compromise reasonable. The point of calling rmdir/recursive stable for the Node 15 release line, is to indicate that we will fix bugs, and that folks who've opted to use rmdir/recursive in their applications already (relying on the permissive behavior), need not refactor their codebase immediately to deal with the deprecation notice.

Meanwhile, folks in the @nodejs/tooling group are working to get a PR made for the new rm method ASAP (hopefully in review by Monday).

Edit: It's also worth noting that we're not deprecating the recursive flag, but its permissive behavior. I'd advocate that using the recursive flag in Node@16 will match .NET or Deno -- for many use cases, folks might not need to refactor their code when we become stricter.

Let me try and get this straight since the TSC language is so abstract

I think you've summarized the plan of action well.


I'm ultimately a strong 👍 for this approach, because it unblocks an argument that's been going in circles for months, in such a way that's minimally disruptive to users that have already adopted the recursive behavior of rmdir.

@iansu iansu mentioned this pull request Oct 4, 2020
4 tasks
@iansu
Copy link
Contributor Author

iansu commented Oct 4, 2020

Closing in favour of #35494

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. notable-change PRs with changes that should be highlighted in changelogs. tsc-agenda Issues and PRs to discuss during the meetings of the TSC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.