Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: port rimraf to core as shutil.rmtree #28208

Closed
wants to merge 8 commits into from

Conversation

iansu
Copy link
Contributor

@iansu iansu commented Jun 13, 2019

The @nodejs/tooling group has been working on porting rimraf into Node.js Core: nodejs/tooling#13. We've spoken with @isaacs and this is done with his cooperation: iansu#1 (comment). We've chosen to implement this in a new module called shutil rather than adding it to the existing fs.rmdir code. There seems to be interest in this as a similar PR was recently opened: #28171.

1. Why?

rimraf is one of the most popular third-party modules and is used in millions of projects. The Tooling Group thinks this kind of commonly used module should be included in Node.js Core and not require a third-party package.

2. Why not add a recursive option to fs.rmdir?

By adding a flag rather than a new function feature detection becomes difficult. People have requested a way to detect the recursive option of mkdir for example.

3. Why port rimraf directly?

The rimraf code contains a number of workarounds for bugs and edge cases in platforms like Windows and sunos. These are the result of many issues and bugfixes over the years and we don't want to lose this functionality.

4. Why shutil?

The name itself comes from Python. Something like rimraf is not a filesystem primitive, instead it's an algorithm that is built on top of filesystem primitives. On other platforms these kind of utilities are often grouped under a "shell" namespace, and the Tooling team can imagine several additional utilities added to this module in the future.

5. Why is this not implemented in C++?

Like the recursive mkdir operation, recursive rmdir is not atomic. Unlike recursive mkdir, the number of operations is not bounded by the directory depth; instead, it's one operation for each file and directory encountered. Recursive mkdir cannot run its operations in parallel, either; it must create directories in serial, in descending order.

Recursive rmdir, on the other hand, would benefit from removing files in parallel--userland rimraf takes this strategy to remove files quickly as libuv's fs thread pool can handle it. There's no precedent, AFAIK, for running batch operations across multiple libuv threads in Node.js core. The problem is compounded by the necessity to implement a synchronous version of the API; consider execSync's implementation.

Several attempts have been made at implementing recursive rmdir in C++, but none so far have executed these operations using multiple threads. Because of this, a serial C++ implementation is significantly slower (@boneskull benchmarked his own implementation against userland rimraf) than a parallel, JS implementation.

In addition, a C++ implementation would need to consider the workarounds for various filesystems to provide a builtin on-par with userland rimraf. So far, no C++ implementation has attempted this.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
  • add rimraf license to combined license

@nodejs-github-bot nodejs-github-bot added the build Issues and PRs related to build files or the CI. label Jun 13, 2019
@boneskull
Copy link
Contributor

not sure why this is labeled build

Copy link
Contributor

@boneskull boneskull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need the license from rimraf copied over

@Trott
Copy link
Member

Trott commented Jun 13, 2019

not sure why this is labeled build

The bot added that label because the PR modifies node.gyp.

@boneskull
Copy link
Contributor

@iansu I ticked the "tests are included" checkbox, since some tests are included, but unsure if you intended to add more?

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs documentation. I assume it's not present because this is a work-in-progress and I will label it as such. If that's wrong, please comment.

@Trott Trott added wip Issues and PRs that are still a work in progress. fs Issues and PRs related to the fs subsystem / file system. labels Jun 13, 2019
@Trott
Copy link
Member

Trott commented Jun 13, 2019

I'm a-gonna ping @nodejs/tsc on this one because I'm pretty sure there will be strong opinions and lots of conversation. (Any links to existing previous conversations in the tooling group that might help us avoid rehashing stuff that's already been worked out might be useful.)

@boneskull
Copy link
Contributor

nodejs/tooling#13 nodejs/tooling#23 and otherwise much discussion during meetings (in the minutes/ dir there)

@boneskull
Copy link
Contributor

to expand on "why the new module name", it's to clearly differentiate high-level fs operations (such as rimraf--mkdirp probably should have been done this way) from low-level, atomic operations (the entirety of fs, sans mkdirp)

@Trott
Copy link
Member

Trott commented Jun 13, 2019

Thanks for the detailed explanation in the description of the PR. Am I reading it correctly that the main reason to choose shutils.rmtree() over fs.rmtree() is the whole "not a primitive/atomic" thing? One might argue that we should put it where someone is most likely to look for it, and fs.rmtree() would be that place, regardless that this isn't a primitive/atomic. Maybe there are also other reasons for keeping it out of fs?

Also, if there are any comments that could be made that might pre-empt "why not rmTree() or removeTree() or rmdirRecursive()" bike-shedding, please make them. On the other hand, if the bike-shedding is unavoidable, that's OK. (Getting the API name right can be important, so "bike-shedding" may be a term I should avoid as it's kind of dismissive.)

@Trott
Copy link
Member

Trott commented Jun 13, 2019

Unfortunately, shutil is an existing module in the ecosystem: https://www.npmjs.com/package/shutil

Fortunately, it's not widely-used nor actively-maintained. So creating an internal shutil may not be disruptive.

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should just be part of fs, whether rmdir recursive or fs.rimraf or something.

@bcoe bcoe mentioned this pull request Jun 13, 2019
4 tasks
@sam-github sam-github removed the build Issues and PRs related to build files or the CI. label Jun 13, 2019
@bcoe
Copy link
Contributor

bcoe commented Jun 13, 2019

@Fishrock123, I'm in agreement with @sam-github (who commented on @zero1five's PR) and with @refack who encouraged using shutil, that the behavior of rimraf is a layer of abstraction above fs and should probably be in its own module.

I personally wish it could be in a node:rimraf standard module, as it feels like we're super close to being able to move forward with module namespaces as a home for an expanded API for Node.js and JavaScript ... maybe this is putting the cart in front of the horse though (and I don't want to hold up this work).

@@ -0,0 +1,310 @@
'use strict';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISC license and copyright is missing, isn't it?

It's ISC for a reason, so you of course already have permission to take the code, as long as the copyright and permission stick around. If it's easier for Node.js to relicense it as MIT, that's probably doable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure about the correct way to incorporate them so that's why they're missing. It's on the todo list though.

Copy link
Contributor

@silverwind silverwind Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be enough to add it to LICENSE like this, no need to litter files with licensing comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to vendor rimraf could we do it similarly to how we've vendored other modules. Please look at the experimental fetch PR for an example of how to do this (including the license)

#27979

@iansu
Copy link
Contributor Author

iansu commented Jun 13, 2019

Documentation and incorporating the rimraf license are on the todo list. There are some tests already but I'd like to port whatever we can from rimraf as well to cover as much as possible.

@addaleax addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label Jun 13, 2019
@boneskull
Copy link
Contributor

I'm less opinionated on whether this should be in its own module, or some as-of-yet-unknown other namespace, or live in fs. I am much more interested in just getting the functionality in.

@Fishrock123
Copy link
Contributor

Just because it isn't just a fs operation at an os level doesn't mean that it shouldn't logically be grouped with other file system operations in order to be less confusing to users...

I am much more interested in just getting the functionality in.

That's cool and all but it may be hard or impossible to move later.

@sam-github
Copy link
Contributor

sam-github commented Jun 13, 2019

@iansu you might want to wait a bit before doing all the cleanup, it would be a shame to spend the time if it doesn't land. Not that I am predicting whether it will land or not, I don't know.

I personally am an unapologetic fan of node+npm. I've used multiple languages with "batteries included" standard libraries. There is a short-term pleasure from having so much functionality immediately availlable, but I have watched them struggle to keep both backwards compatible and relevant over time. I think treating a registry (or federation of registries it that is what happens) as a semvered source of features is a strength of node, not a weakness. I prefer for node core to focus on runtime features that enable the ecosystem, rather than absorbing the ecosystem. I won't try to block this if its what people want, but for the record, I prefer it not to be.

This doesn't do anything better than the external rimraf package (it's not faster, it's not avoiding binary addons, etc)., and unless I misread, it appears to have less features, because it lacks globbing. I expect bug reports on that lack almost immediately. But we can vendor in glob, too, is this the way we want to go?

The sole benefit to node users is to avoid having to do npm i rimraf and to transfer the maintenance burden of rimraf from https://github.com/isaacs/rimraf to node core. Not even transfer, rimraf will still need to be maintained indefinitely, "duplicate" would be a better description.

I understand why non-node developers unfamiliar with an easy-to-use and powerful dependency management tool don't understand why node has a small builtin library, but I genuinely don't understand why node developers would be adverse to using npm/yarn to get features. At the tooling meeting in Berlin, it was pointed out that it forces the introduction of npm early when tutoring people on node.js, but I struggle to find that compelling. Shouldn't a dependency manager be introduced early? Is there really that many zero-dependency projects out there?

@silverwind
Copy link
Contributor

silverwind commented Jun 14, 2019

Might be something for a future optimization but we can likely eliminate the extra stat calls by using the withFileTypes option of readdir.

What does shutil stand for? Shell Utilities?

@iansu
Copy link
Contributor Author

iansu commented Jun 14, 2019

@joyeecheung Thanks for pointing that out. I'm definitely in favour of using that if we're going to end up adding a recursive option to rmdir.

@jasnell
Copy link
Member

jasnell commented Jun 14, 2019

FWIW, according to the LICENSE file, we already incorporate ISC-licensed stuff in the code base. (Still worth checking, I suppose....)

Ah right right. Sorry, scratch what I've said about ISC, I was mentally confusing it with the Artistic License 2 that npm uses. ISC is perfectly fine but the other technical issues still stand :)

@iansu
Copy link
Contributor Author

iansu commented Jun 14, 2019

@jasnell One issue at a time... 🙂

@ljharb
Copy link
Member

ljharb commented Jun 14, 2019

New core modules should NOT be added, pending the outcome of #21552.

Separately this is a filesystem action - conceptually imo it belongs on fs, whether it’s a separate function or an option.

@ForsakenHarmony
Copy link

The link to rimraf in the original post is broken btw

Also I agree on not getting another core module

The recurse option would be nice, alternatively something like rmdir{Tree,Recurse,...} analogous to the Sync postfix (copy being the same that came to mind when I first saw this because I needed a recursive version at the time)

Copy link
Contributor

@MylesBorins MylesBorins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm similarly -1 on a new module... I think this makes a ton of sense as an option to fs.rmdir. We could still vendor rimraf to do so and use that code path when the option is passed.

I'd prefer to see vendored external code done explicitly, similar to how we've vendored eslint. Please take a look at my fetch PR for an example of how to vendor including updating the license and providing a script for future updates

#27979

@@ -0,0 +1,310 @@
'use strict';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to vendor rimraf could we do it similarly to how we've vendored other modules. Please look at the experimental fetch PR for an example of how to do this (including the license)

#27979

@iansu
Copy link
Contributor Author

iansu commented Jun 18, 2019

I'm not opposed to moving this into fs but I personally think rmdir is the wrong place for it. The shell version of rmdir does not delete files. Making the Node version do that might surprise some people.

@targos
Copy link
Member

targos commented Jun 18, 2019

What about we call it rm or remove and like the shell version let it remove files or directories, with an recursive option?

@iansu
Copy link
Contributor Author

iansu commented Jun 18, 2019

@targos unlink is essentially the Node version of rm I think. If we were to add a new function with a recursive option we would also need to add a non-recursive version which is just unlink. That's why I like something like rmtree because it's (maybe?) obvious from the name that it's recursive.

@iansu
Copy link
Contributor Author

iansu commented Jun 18, 2019

@MylesBorins The idea of vendoring rimraf is interesting. We did make a couple of changes to rimraf though, mainly removing globbing and replacing arg checks using assert with throwing the proper exceptions.

I think long term the thinking was that rimraf would continue to exist to provide globbing and a CLI but would use the underlying Node functionality on supported versions.

@silverwind
Copy link
Contributor

I like the idea of a fs.rm. We would be somewhat in line with the shell commands that way. If we strictly follow shell, recursive should be off by default. In a similar vein, we could later introduce fs.cp.

Regarding fork vs. vendoring: I think we want to fork given the reasons above as well as to open the way for further optimization (like removing stat calls).

@bcoe
Copy link
Contributor

bcoe commented Jun 21, 2019

Regarding Small Core

@sam-github I care deeply about keeping encouraging a vibrant npm ecosystem.

I'm in @sindresorhus' camp however, that there's a small minority of utility modules (rmraf, mkdirp, fetch, maybe glob) that would make sense in the standard toolbox:

a tooling author almost always needs these dependencies (which has lead to dependency counts in the millions if you look on GitHub):

  • rimraf, 2,800,000.
  • mkdirp, 2,790,000.
  • glob, 2,900,000.

I don't think these numbers are a measure of popularity (in the same way that the community knows about a module like React, or WebPack) it's a measure of the fact that these modules are core building blocks for JavaScript infrastructure.

Given that @isaacs is supportive of moving rimraf into core, I'd advocate it's a good idea; I think we should take on this maintenance burden, it's valuable to the community.

Regarding Naming Things

It sounds like folks are more in favor of an fs.rmtree than a shutil module, why don't we start with:

  • fs.rmtree.
  • fs.rmtreeSync.

And we can revisit whether we add a recursive setting to rmdir in the future.

@iansu
Copy link
Contributor Author

iansu commented Jun 21, 2019

@bcoe I'm in favour of moving ahead with fs.rmtree and fs.rmtreeSync. We can also provide a promisified version like other functions in fs.

@sindresorhus
Copy link

And we can revisit whether we add a recursive setting to rmdir in the future.

I don't think Node.js should have multiple ways to do the exact same thing. So we should go with either a recursive option or fs.rmtree. I would still prefer the option, but I'm totally fine with fs.rmtree if that's what it takes to get it into core.


There should be one-- and preferably only one --obvious way to do it. - Zen of Python.

@MylesBorins
Copy link
Contributor

I'm still very bullish on making it an option. I also have significant reservations about doing a fork over vendoring. There obviously are extra bits that we may not want in core... but it is a much larger maintenance burden for us to keep that up to date and fix bugs, whereas if we simply vendor we can just work upstream and upgrade as neccessary.

@iansu
Copy link
Contributor Author

iansu commented Jul 2, 2019

I spent some time this weekend working on vendoring rimraf, which mostly involved shamelessly copying @MylesBorins node-fetch PR. I ran into some issues though, mainly the fact that rimraf has a dependency on glob. We could also vendor glob but then it has several other dependencies, and so on it goes.

Is there a strategy for handling this kind of thing with other vendored packages? Another option would be to patch rimraf after we pull it in to remove glob. Before I move ahead I'm curious to know what people think about either of these approaches.

I should also mention that I'm planning to move rimraf into fs.rimraf and drop the new top-level module shutill. That was another point of contention with this PR so hopefully that satisfies those concerns.

@sindresorhus
Copy link

Another option would be to patch rimraf after we pull it in to remove glob.

👍

I should also mention that I'm planning to move rimraf into fs.rimraf

I don't think rimraf is a good name (I'm not alone). It should either be the recursive option or fs.rmtree as discussed previously. (Even fs.rmrf would be better than rimraf).

@bcoe
Copy link
Contributor

bcoe commented Jul 2, 2019

I don't think rimraf is a good name (I'm not alone). It should either be the recursive option or fs.rmtree

@sindresorhus python uses rmtree, as does ruby, I think this is a reasonable compromise (even if underneath the hood the vendored package is rimraf).

Another option would be to patch rimraf after we pull it in to remove glob.

Let's work with @isaacs' and see if we can support this in an elegant way; I'm wondering if we could refactor rmraf in such a way that we can maintain a no-glob tag (npm i rimraf@no-glob), but still be pretty much on the mainline module.

A good place to get @isaacs' attention might be in this slack.

@iansu
Copy link
Contributor Author

iansu commented Jul 2, 2019

I personally prefer the name rmtree. I wasn't sure if we had to keep it the same if we were vendoring rimraf.

Not having to make a patch and keep it updated is appealing. I will try reaching out to @isaacs on Slack.

@iansu
Copy link
Contributor Author

iansu commented Aug 14, 2019

@isaacs published a new version of rimraf yesterday with glob as an optional dependency. Thanks @isaacs and @bcoe for making that happen! ❤️

I've updated this PR to vendor rimraf and wrap it as fs.rmtree and fs.rmtreeSync.

If people are happy with this approach I'll be working on porting some of the existing rimraf tests over with help from my Node.js Mentorship Program mentee @mpaktiti.

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with vendoring in rimraf, but I think it should be a one time thing.

  • rimraf doesn't seem to require a lot of maintenance. It doesn't look like any of the changes in ~2 years would require changes to the vendored code.
  • The source code is only 300-400 lines of JavaScript, which I expect the core team could easily maintain. If bugs are fixed in the rimraf module, I'm sure they would be cherry-picked.
  • My biggest complaint - ongoing vendoring of rimraf would give a slightly inconsistent user experience, mainly around error handling. We've put in a lot of effort to rolling out a new error system, which wouldn't be part of the vendored code. We could also eliminate dead code, such as the handling of the glob dependency.

If no one wants to do the porting of the rimraf codebase to better fit into core, I'll happily take a shot at it.

@iansu
Copy link
Contributor Author

iansu commented Aug 18, 2019

@cjihrig porting rimraf to Node core was actually the approach I originally took in this PR. There was some criticism of that (I think rightly so). It's mostly in this thread but I think the main arguments were:

  • Taking on the maintenance burden of rimraf was undesirable (even if it is small)
  • rimraf is going to have to exist for a long time so two versions of the code will need to be maintained, bug fixes ported, etc.

As far as this being a separate function or an option to rmdir I'm not strongly opinionated. My only argument against adding it as option is that in the unix world rmdir is not the function that recursively deletes files and directories. I could see that creating some (possibly dangerous) confusion.

@iansu
Copy link
Contributor Author

iansu commented Aug 23, 2019

Since #29168 has been merged I'm going to close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. semver-minor PRs that contain new features and should be released in the next minor version. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.