Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta] realpath issues in v6 #7726

Closed
7 tasks
saghul opened this issue Jul 14, 2016 · 54 comments
Closed
7 tasks

[meta] realpath issues in v6 #7726

saghul opened this issue Jul 14, 2016 · 54 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. meta Issues and PRs related to the general management of the project.

Comments

@saghul
Copy link
Member

saghul commented Jul 14, 2016

Overview

There are a number of problems related to the realpath change in #3594. This issue aims to serve a single entry point to those, with the hope that we have a clear picture and are able to address them one way or another.

Please: Do not report new issues here, this is a collection of the currently existing ones.

Current state

Bad. Some of these issues have PRs addressing a specific issue, but none fix all of them.

Original reasoning

Following the issue trail I can only see performance as the argument for creating uv_fs_realpath and using it in node:

Way forward(?)

After scratching my head for a while I have no other suggestion than to revert back to the JS implementation. Since we changed the API to remove the cache argument, which was needed for performance, I suggest we keep the cache private, thus avoiding yet another API change.

I'll update the libuv documentation so nobody tries to bring this up again in Node v103.x, or at least not without validating it solves all the aforementioned problems.

Suggestions / alternatives are more than welcome, but at this point it's all or nothing, we either have a plan for fixing all the problems or we revert, IMHO.

@saghul saghul added fs Issues and PRs related to the fs subsystem / file system. ctc-agenda meta Issues and PRs related to the general management of the project. labels Jul 14, 2016
@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

/cc @nodejs/collaborators

@ronkorving
Copy link
Contributor

Are the proposed fixes that are currently open workarounds or proper solutions? Also, could you link to them just like you did the issues so we can have a look into them?

@mscdex
Copy link
Contributor

mscdex commented Jul 14, 2016

Why was this tagged ctc-agenda again?

@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

@mscdex I haven't gone through the last discussions the CTC had about this, but IIRC the plan was to keep the current implementation and fix it. I argue it cannot. Feel free to remove it if you feel otherwise.

@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

@ronkorving they are linked on the issues themselves AFAIK.

@trevnorris
Copy link
Contributor

I have a fix for the linux ELOOP issue. It makes sense to me that we keep that implementation around. Performance is noticeably better (been taking advantage of this recently in an app that's getting the realpath of > 50,000 files as quickly as possible).

@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

@trevnorris Unfortunately, that's not enough. It's unacceptable IMHO that Windows people can't even run a hello world file if it's on one of those weird devices or network shares. I'm not opposed to some _realpath method somewhere, for those who want it though.

@addaleax
Copy link
Member

@saghul What would you think of implementing #7559 in more general way that would cover all of the Windows problems listed above? It is currently based on bailing out to the original JS implementation when encountering an error in the libuv-based realpath (currently, only upon certain kinds of errors, but it’s definitely written in a way that would make it easy to add more conditions).

That way forward has come up there and in #7175; it should work consistently and retain the performance gains in the most common (by far) cases.

@rvagg
Copy link
Member

rvagg commented Jul 14, 2016

Thanks for this @saghul, it's helpful to have a roll-up like this and I'm very much inclined to view it in the same way as you at this stage. Without satisfactory resolution for Windows users we're stuck in a corner, regardless of progress papering over ELOOP.

I see two alternatives here:

  1. Revert entirely (I'd actually +1 the re-addition of cache, it shouldn't have been removed without a deprecation cycle in the first place, especially when it was replaced with an argument of the same type!)
  2. Revert for Windows and apply @trevnorris' fixes for everything else. Or something like that, perhaps as per win, fs: fix realpath behavior on substed drives #7559 as @addaleax says above.

What is the libuv stance on its new realpath implementation? Given that this issue is an admission that it's essentially broken (for now), is it going to stay around? Does it make sense for libuv to continue shipping something that is known broken for Windows?

@addaleax
Copy link
Member

Given that this issue is an admission that it's essentially broken (for now), is it going to stay around?

Or maybe the other way around: Would you accept a port of whatever solution ultimately ends up in Node to libuv? It’s something I’d definitely be interested in if you think it makes sense.

@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

What is the libuv stance on its new realpath implementation? Given that this issue is an admission that it's essentially broken (for now), is it going to stay around? Does it make sense for libuv to continue shipping something that is known broken for Windows?

Given our versioning scheme, we can't remove it. Now, I also wouldn't consider it broken (on Unix): macOS and some BSDs have this systemwide 32 symlinks limit and nobody complains about it except the Node ecosystem, so one could argue Node was abusing this. And I would agree with that. This can be fixed, however, by taking the realpath(3) implementation in OpenSSH-portable (for example) and removing the limit. On Linux we could even copy the musl implementation, which is really simple and elegant.

But! Windows.

Or maybe the other way around: Would you accept a port of whatever solution ultimately ends up in Node to libuv? It’s something I’d definitely be interested in if you think it makes sense.

Certainly.

On Windows there is no realpath(3), so we are trying to emulate it, albeit without much success. It's possible that a way to solve all this is found later on, in which case we could go back to libuv's implementation, but know what to test for. That's partially the reason why I suggest we keep the current API, so swapping the implementation is possible. Also, those who know they are not affected by any of these shortcomings could use some "node-uvrealpath" addon and monkeypatch fs.realpath for their purposes.

@saghul
Copy link
Member Author

saghul commented Jul 14, 2016

Revert entirely (I'd actually +1 the re-addition of cache, it shouldn't have been removed without a deprecation cycle in the first place, especially when it was replaced with an argument of the same type!)

The cache allowed for some very bizarre and unexpected path substitutions, IIRC that was the reason for its removal. If nobody misses the cache now, I argue we are ok without it. Since the JS version needs a cache, however, we will have to have one, but not as part of the API, IMHO.

@trevnorris
Copy link
Contributor

@saghul I'm curious, the code to transverse the filesystem in the JS implementation is fairly straight forward. What if that same method was ported to C and used for the windows implementation in libuv?

@saghul
Copy link
Member Author

saghul commented Jul 16, 2016

That's a possibility indeed. I guess the cache part would be a bit problematic. As in: have it or not? Store it where? etc.

@trevnorris
Copy link
Contributor

@saghul I'd say for initial implementation we not have the cache (meaning accepting a cache from the user). From there we can do benchmarks and see how it affects performance. As for libuv generating a cache, isn't that technically a race condition? Cache coming from the user means they are aware how the paths will be resolved, but if a path changes in between two calls then internal cache could screw with us.

@saghul
Copy link
Member Author

saghul commented Jul 19, 2016

@trevnorris Sure, no cache is fine. realpath(3) as implemented by glibc doesn't have one either, for example.

Now, if we are going to have a full-custom realpath implementation for Windows, we might as well do that on Unix, fixing the ELOOP problems and making the Linux version faster as byproducts.

At any rate, my intent with this issue is to reach a stable state ASAP, and I think a revert is still the best idea for now. We now know where we failed short, so back to the drawing board. If we do it as I suggested (no API changes, internal cache) swapping it out when we have a solid alternative would not imply any API changes, it would just be a performance boost.

One of the Windows issues is particularly hairy: #7559. Some people have used substed drives to make paths shorter and have some tools not choke on that. We'd have to agree on what we do there.

@trevnorris
Copy link
Contributor

I wouldn't mind doing a full revert for the moment if the libuv fix and update in node isn't going take until v7. That delay is my only worry for the revert.

@saghul
Copy link
Member Author

saghul commented Jul 20, 2016

I don't know when a libuv fix will be available, or if at all.

@rvagg
Copy link
Member

rvagg commented Jul 20, 2016

Yes please, I'm still in favour of a full revert, cache arg included, before we head in to v6 LTS then we can revisit in master for v7.

@evanlucas
Copy link
Contributor

I'm +1 on a full revert as well at this point.

@saghul
Copy link
Member Author

saghul commented Jul 20, 2016

@rvagg Bringing back a public cache would mean we might need to remove it again. I have not seen any issue about that cache being removed. At the risk of repeating myself: if we keep the cache hidden we can change the guts without making API changes. Hopefully.

@bzoz
Copy link
Contributor

bzoz commented Jul 20, 2016

+1 for full revert

BTW, do we really need/want to remove the cache? I would guess it is useful in some situations, and passing it as an argument is a nice way to give user control over it.

@saghul
Copy link
Member Author

saghul commented Jul 20, 2016

@bzoz if we bring back the cache argument we'd change the API once again. I'd be interested in knowing on which non-contrived situations it's actually useful. It's removed now, and nobody complained about it AFAIK.

@bzoz
Copy link
Contributor

bzoz commented Jul 20, 2016

@saghul It seems it was added to increase module loader performance: 9bed5dc. @isaacs or @ry should probably know if we really need this.

@addaleax
Copy link
Member

It seems it was added to increase module loader performance

I would say “yes”, the module loader is the (possibly only) place where a cache is necessary, and it would be needed in order to do a full revert; maybe we could reintroduce that as options.cache.

Although I think a full revert sounds okay to me, I’d still like to hear if there’s anything that would remain broken when taking the following path:

@jasnell
Copy link
Member

jasnell commented Jul 26, 2016

I'm sorry for coming into this thread a bit late, just getting back from vacation and catching up... At this point I am inclined to revert the changes to realpath entirely and move the new impl to a new method... fs.lrealpath() for instance (short for libuv real path). We can then bring fs.realpath() through a proper deprecation cycle.

@saghul
Copy link
Member Author

saghul commented Jul 26, 2016

@jasnell lrealpath is too similar to the stat -> lstat, chown -> lchown convention. At this point I'd not expose it. If someone tryly wants it they can write a simple addon exposing it.

@saghul
Copy link
Member Author

saghul commented Jul 26, 2016

@orangemocha thanks, I updated the list at the top. What a mess :-S

@jasnell
Copy link
Member

jasnell commented Jul 26, 2016

The key challenge with that is that there may be developers who have since
come to depend on the new impl. I'd much rather not drop it entirely.
Exposing both is not problematic if we have a clear deprecation path.

On Tuesday, July 26, 2016, Saúl Ibarra Corretgé notifications@github.com
wrote:

@jasnell https://github.com/jasnell lrealpath is too similar to the
stat -> lstat, chown -> lchown convention. At this point I'd not expose it.
If someone tryly wants it they can write a simple addon exposing it.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7726 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAa2eY4kdbzjxhiejF4r_jeOvD05635xks5qZj69gaJpZM4JMKMc
.

@MylesBorins
Copy link
Contributor

@jasnell is there a clear way that we can signal to individuals all the problems associated with this implementation? Should we implement the fallback to the original implementation on failure?

@jasnell
Copy link
Member

jasnell commented Jul 26, 2016

We document the differences and leave it at that. Fallbacks get messy.

On Tuesday, July 26, 2016, Myles Borins notifications@github.com wrote:

@jasnell https://github.com/jasnell is there a clear way that we can
signal to individuals all the problems associated with this implementation?
Should we implement the fallback to the original implementation on failure?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7726 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAa2eTNGjKpKyB8kOmP82FPWPg_q2cJtks5qZkFSgaJpZM4JMKMc
.

@saghul
Copy link
Member Author

saghul commented Jul 26, 2016

The key challenge with that is that there may be developers who have since come to depend on the new impl. I'd much rather not drop it entirely.

There is nothing on the new implementation that can be depended upon. Unless being broken is such thing. I say this as someone who has followed this since the start (I reviewed the libuv PR) and came to the conslusion that we currently have no other reasonable choice.

Exposing both is not problematic if we have a clear deprecation path.

There is no clear path. Exposing something with the intention to deprecate it sounds like a waste of time IMHO.

@saghul
Copy link
Member Author

saghul commented Jul 26, 2016

@thealphanerd please let's not have 2 implementations each broken in a different way, it's just a recipe for dissaster.

I was hoping the last CTC call would have suggested some clear course of action, but it seems we are all as confused as we originally were.

There is no alternative to reverting which fixes all the problems without adding back the original implementation, it's the only viable way forward at this point. We can revisit later. Git archeology suggests the only reason for replacing the implementation was performance. If users could live with it until Node v6 they can certainly live with it a bit longer.

@orangemocha
Copy link
Contributor

I agree that reverting is necessary. The new implementation simply has caused too many breakages, some of which are not just acceptable behavior changes but rather nasty bugs.

Fallbacks get messy, and do double the surface area for bugs and other behavioral differences. It doesn't sound like a desirable approach IMO.

On Windows, the list of bugs is so bad that we have no choice but reverting to the JS implementation. If you think that the Unix libuv implementation is in a better state, we could consider a Windows-specific revert to JS, and keep the libuv implementation for Unix. Note that this doesn't increase the bug surface area with respect to the current implementation, because the libuv implementation is different on Unix vs Windows. It does however leave room for inconsistencies between platforms, so my preference would be to do a full revert to JS on all platforms, and consider ways to improve performance later on.

@saghul
Copy link
Member Author

saghul commented Jul 27, 2016

@orangemocha Agreed, I reached the same conclusion. When I mentioned the bug surface I meant from Node's perspective, considering what libuv internally does an implementation detail. At any rate, it's good to see we are on the same page.

@orangemocha
Copy link
Contributor

Here's a PR to revert: #7899

@jasnell
Copy link
Member

jasnell commented Jul 27, 2016

There is one additional consideration for reverting the realpath implementation in that the new implementation allows an encoding option to be passed to get the path back as a Buffer rather than a string. While it's unlikely that has extensive use by this point in time, it's still technically a major change to pull that back out.

@addaleax
Copy link
Member

Btw, something that would probably be incredibly helpful to have after the revert are regression tests for all the issues we’ve been encountering… /cc @nodejs/platform-windows

@saghul
Copy link
Member Author

saghul commented Jul 28, 2016

@addaleax Yes indeed! Alas, most of the Windows issues depend on "weird" system configurations, so the tests will need to make strong assumptions about the environment.

The ELOOP part can be easily tested, but that's a minor thing overall.

@piscisaureus
Copy link
Contributor

Given that the ultimate purpose of realpath()ing module paths is to avoid loading the same module twice, I wonder if there is a completely different solution to this; what about instead of using the path as the cache key, use the [std_dev, st_ino] pair?

@addaleax
Copy link
Member

addaleax commented Aug 3, 2016

Another purpose of realpath()ing is to allow npm link and similar features of package managers/directory structures; #5950 was an attempt to call realpath() only for the main module, and it turned out to be a pretty bad idea because of that.

@Jeff-Lewis
Copy link

Thanks @saghul and everyone for addressing this.

@saghul
Copy link
Member Author

saghul commented Aug 15, 2016

Time to close this one up. The revert made by @bzoz will be part of #8070, so Node 6.4.0. Thanks everyone who helped get this mess untangled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. meta Issues and PRs related to the general management of the project.
Projects
None yet
Development

No branches or pull requests