Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: optimize realpath using uv_fs_realpath() #3594

Closed
wants to merge 1 commit into from

Conversation

jhamhader
Copy link
Contributor

If cache is not used, use the native uv_fs_realpath() which is faster
then the JS implementation by a few orders of magnitude.

See #2680

@mscdex mscdex added the fs Issues and PRs related to the fs subsystem / file system. label Oct 30, 2015
@bnoordhuis
Copy link
Member

Cross-referencing libuv/libuv#531 which is still open and has some unresolved issues.

I'm not sure if this is a zero risk change. fs.realpathSync() is used by require(). If there are any discrepancies between the current implementation and realpath(3), that can cause regressions in the module loader.

@Fishrock123
Copy link
Contributor

I'm not sure if this is a zero risk change. fs.realpathSync() is used by require(). If there are any discrepancies between the current implementation and realpath(3), that can cause regressions in the module loader.

cc @bmeck who may have some insight

@bmeck
Copy link
Member

bmeck commented Oct 30, 2015

I would say this is probably safe on *nix, but not Windows for now

  • on *nix PATH_MAX will become the maximum string size, only important for
    virtual file systems
  • unsure if it will muck with filenames on windows due to the
    case-insensitive filesystem
  • fs: add uv_fs_realpath() libuv/libuv#531 needs to fix Windows XP to land
    still

On Fri, Oct 30, 2015 at 9:18 AM, Jeremiah Senkpiel <notifications@github.com

wrote:

I'm not sure if this is a zero risk change. fs.realpathSync() is used by
require(). If there are any discrepancies between the current
implementation and realpath(3), that can cause regressions in the module
loader.

cc @bmeck https://github.com/bmeck who may have some insight


Reply to this email directly or view it on GitHub
#3594 (comment).

@jhamhader
Copy link
Contributor Author

Updated realpath() and realpathSync() to use JS implementation when using Windows XP.

@Fishrock123
Copy link
Contributor

@jhamhader
Copy link
Contributor Author

Well, this depends on libuv/libuv#531

@saghul
Copy link
Member

saghul commented Dec 5, 2015

The libuv bits landed. I suggest this one is postponed until Windows XP is dropped so Node doesn't have to care about uv_fs_realpath not working on that platform.

@ChALkeR
Copy link
Member

ChALkeR commented Dec 14, 2015

@saghul It seems to me that 5.x is branched, and XP support will be dropped in 6.0 (current master): #3804. That given, I see no obstacles for relying on uv_fs_realpath in master.

@ChALkeR
Copy link
Member

ChALkeR commented Dec 14, 2015

@saghul Ah, on the other hand, perhaps XP support being dropped should be documented first.

@saghul
Copy link
Member

saghul commented Dec 14, 2015

@ChALkeR yeah, I was thinking about 6.0. It can be documented as a feature! "Removed baggage, aka Windows XP" :-P

On a more serious note, master has libuv 1.8.0 now, so is probably a good time to rebase and give it a go on the CI? @jhamhader

@jhamhader
Copy link
Contributor Author

The current code sends Windows XP to the original JS implementation of realpath().
Should the PR rely on the assumption that upon its merge, Windows XP will be dropped? Or are we to handle Windows XP?

@saghul
Copy link
Member

saghul commented Dec 14, 2015

@jhamhader my undestanding is that Node 6.0 would drop XP (see #3804) so IMHO you can rely on that.

@jhamhader
Copy link
Contributor Author

Ready for CI

@saghul
Copy link
Member

saghul commented Dec 21, 2015

@jhamhader
Copy link
Contributor Author

Fixing the tests to accommodate with Windows case insensitive paths.

@jhamhader
Copy link
Contributor Author

Fixed

@saghul
Copy link
Member

saghul commented Dec 21, 2015

@jhamhader
Copy link
Contributor Author

Fixed linter errors (my bad! :\ )

Any idea why Windows 10 fails with on test-debug-no-context.js:

  ...
not ok 40 test-debug-no-context.js
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: 3221225477 === 0
#     at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vs2015\label\win10\test\parallel\test-debug-no-context.js:15:10)
#     at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vs2015\label\win10\test\common.js:401:15)
#     at emitTwo (events.js:88:13)
#     at ChildProcess.emit (events.js:173:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)

@saghul
Copy link
Member

saghul commented Dec 22, 2015

No idea, sorry.

There is one thing that worries me a bit: we are only using the "fast" variant when the cache is not used, which means that we have to keep 2 implementations which basically do the same, right? Was the cache added for speed? If so, would it be ok to drop it now? (that would be semver major, I guess)

Thoughts @nodejs/collaborators ?

@mscdex
Copy link
Contributor

mscdex commented Dec 22, 2015

3221225477 means an access violation/segfault occurred. I saw that test failure the other week but haven't been seeing it lately (and I haven't seen it happen locally in my Windows VM). I haven't looked into the cause though.

@trevnorris
Copy link
Contributor

@saghul honestly the call to uv_fs_realpath() is probably going to be faster that going through the complicated cache mechanism. we should do some benchmarking, and if it's not (as I suspect) then let's just dump it.

EDIT: Would it not work to simply always make the call to node::RealPath() immediately?

jasnell added a commit that referenced this pull request Apr 26, 2016
The following significant (semver-major) changes have been made since the
previous Node v5.0.0 release.

* Buffer
  * New Buffer constructors have been added
    [#4682](#4682)
  * Previously deprecated Buffer APIs are removed
    [#5048](#5048),
    [#4594](#4594)
  * Improved error handling [#4514](#4514)
* Cluster
  * Worker emitted as first argument in 'message' event
    [#5361](#5361).
* Crypto
  * Improved error handling [#3100](#3100),
    [#5611](#5611)
  * Simplified Certificate class bindings
    [#5382](#5382)
  * Improved control over FIPS mode
    [#5181](#5181)
  * pbkdf2 digest overloading is deprecated
    [#4047](#4047)
* Dependencies
  * Reintroduce shared c-ares build support
    [#5775](#5775).
  * V8 updated to 5.0.71.31 [#6111](#6111).
* DNS
  * Add resolvePtr API to query plain DNS PTR records
    [#4921](#4921).
* Domains
  * Clear stack when no error handler
  [#4659](#4659).
* File System
  * The `fs.realpath()` and `fs.realpathSync()` methods have been updated
    to use a more efficient libuv implementation. This change includes the
    removal of the `cache` argument and the method can throw new errors
    [#3594](#3594)
  * FS apis can now accept and return paths as Buffers
    [#5616](#5616).
  * Error handling and type checking improvements
    [#5616](#5616),
    [#5590](#5590),
    [#4518](#4518),
    [#3917](#3917).
  * fs.read's string interface is deprecated
    [#4525](#4525)
* HTTP
  * 'clientError' can now be used to return custom errors from an
    HTTP server [#4557](#4557).
* Modules
  * Current directory is now prioritized for local lookups
    [#5689](#5689)
  * Symbolic links are preserved when requiring modules
    [#5950](#5950)
* Net
  * DNS hints no longer implicitly set
    [#6021](#6021).
  * Improved error handling and type checking
    [#5981](#5981),
    [#5733](#5733),
    [#2904](#2904)
* OS X
  * MACOSX_DEPLOYMENT_TARGET has been bumped up to 10.7
    [#6402](#6402).
* Path
  * Improved type checking [#5348](#5348).
* Process
  * Introduce process warnings API
    [#4782](#4782).
  * Throw exception when non-function passed to nextTick
    [#3860](#3860).
* Readline
  * Emit key info unconditionally
    [#6024](#6024)
* REPL
  * Assignment to `_` will emit a warning.
    [#5535](#5535)
* Timers
  * Fail early when callback is not a function
    [#4362](#4362)
* TLS
  * Rename 'clientError' to 'tlsClientError'
    [#4557](#4557)
  * SHA1 used for sessionIdContext
    [#3866](#3866)
* TTY
  * Previously deprecated setRawMode wrapper is removed
    [#2528](#2528).
* Util
  * Changes to Error object formatting
    [#4582](#4582).
* Windows
  * Windows XP and Vista are no longer supported
    [#5167](#5167),
    [#5167](#5167).
jasnell added a commit that referenced this pull request Apr 26, 2016
The following significant (semver-major) changes have been made since the
previous Node v5.0.0 release.

* Buffer
  * New Buffer constructors have been added
    [#4682](#4682)
  * Previously deprecated Buffer APIs are removed
    [#5048](#5048),
    [#4594](#4594)
  * Improved error handling [#4514](#4514)
* Cluster
  * Worker emitted as first argument in 'message' event
    [#5361](#5361).
* Crypto
  * Improved error handling [#3100](#3100),
    [#5611](#5611)
  * Simplified Certificate class bindings
    [#5382](#5382)
  * Improved control over FIPS mode
    [#5181](#5181)
  * pbkdf2 digest overloading is deprecated
    [#4047](#4047)
* Dependencies
  * Reintroduce shared c-ares build support
    [#5775](#5775).
  * V8 updated to 5.0.71.31 [#6111](#6111).
* DNS
  * Add resolvePtr API to query plain DNS PTR records
    [#4921](#4921).
* Domains
  * Clear stack when no error handler
  [#4659](#4659).
* File System
  * The `fs.realpath()` and `fs.realpathSync()` methods have been updated
    to use a more efficient libuv implementation. This change includes the
    removal of the `cache` argument and the method can throw new errors
    [#3594](#3594)
  * FS apis can now accept and return paths as Buffers
    [#5616](#5616).
  * Error handling and type checking improvements
    [#5616](#5616),
    [#5590](#5590),
    [#4518](#4518),
    [#3917](#3917).
  * fs.read's string interface is deprecated
    [#4525](#4525)
* HTTP
  * 'clientError' can now be used to return custom errors from an
    HTTP server [#4557](#4557).
* Modules
  * Current directory is now prioritized for local lookups
    [#5689](#5689)
  * Symbolic links are preserved when requiring modules
    [#5950](#5950)
* Net
  * DNS hints no longer implicitly set
    [#6021](#6021).
  * Improved error handling and type checking
    [#5981](#5981),
    [#5733](#5733),
    [#2904](#2904)
* OS X
  * MACOSX_DEPLOYMENT_TARGET has been bumped up to 10.7
    [#6402](#6402).
* Path
  * Improved type checking [#5348](#5348).
* Process
  * Introduce process warnings API
    [#4782](#4782).
  * Throw exception when non-function passed to nextTick
    [#3860](#3860).
* Readline
  * Emit key info unconditionally
    [#6024](#6024)
* REPL
  * Assignment to `_` will emit a warning.
    [#5535](#5535)
* Timers
  * Fail early when callback is not a function
    [#4362](#4362)
* TLS
  * Rename 'clientError' to 'tlsClientError'
    [#4557](#4557)
  * SHA1 used for sessionIdContext
    [#3866](#3866)
* TTY
  * Previously deprecated setRawMode wrapper is removed
    [#2528](#2528).
* Util
  * Changes to Error object formatting
    [#4582](#4582).
* Windows
  * Windows XP and Vista are no longer supported
    [#5167](#5167),
    [#5167](#5167).
@isaacs
Copy link
Contributor

isaacs commented Apr 27, 2016

One concern: if node's fs.realpath will start throwing on systems that lack PATH_MAX, then now we have to handle that, which is pretty inconvenient.

How bad would it be to fall back to the JS implementation if libuv can't do realpath safely?

@isaacs
Copy link
Contributor

isaacs commented Apr 27, 2016

Re the issue in node-glob, I consider throwing at different levels to be no big deal, and will adjust the tests accordingly. Bottom line, it'll eventually raise an error if it's following symlinks through a loop, which is the point.

The important thing is that node's fs.realpath is safe, predictable, and fast. Imo, "safe, predictable, fast" is the right priority order. Removing realpath from systems that used to have it is pretty disruptive, and reduces predictability. Having a buffer overflow vuln on those systems is unsafe, so even worse.

@addaleax
Copy link
Member

@isaacs libuv/libuv#843 makes libuv not even build on systems without PATH_MAX, fwiw. And no such platform has been found so far.

@MylesBorins
Copy link
Contributor

it is worth mentioning that @bnoordhuis brought up some concerns, especially regarding freeBSD not having to have PATH_MAX defined

There is a possibility that the commit could be reverted.

@Fishrock123
Copy link
Contributor

I would prefer to attempt to work around edge cases rather than revert outright. The speed improvement here is actually really important to some users. (iirc that originally brought up this possibility!)

@trevnorris
Copy link
Contributor

trevnorris commented Apr 27, 2016

@Fishrock123 You're correct that performance was how this whole thing came about. Here's the original issue posted by @joliss about issue with the Broccoli build tool because of the excessive slowness: nodejs/node-v0.x-archive#7902 (original post where I first suggested using realpath(3) where supported nodejs/node-v0.x-archive#7902 (comment)).

The new performance gains are much better, but still not where they could be. Haven't had time to take an in depth look, but definitely agree with @Fishrock123 that we should take a look at fixing the issue instead of doing a full revert because of them.

@jasnell
Copy link
Member

jasnell commented Apr 27, 2016

+1... While reverting should be an option on the table if it proves
necessary, we should seek to find an alternative before doing so.
On Apr 27, 2016 4:14 PM, "Trevor Norris" notifications@github.com wrote:

@Fishrock123 https://github.com/Fishrock123 You're correct that
performance was how this whole thing came about. Here's the original issue
posted by @joliss https://github.com/joliss about issue with the
Broccoli build tool because of the excessive slowness:
nodejs/node-v0.x-archive#7902
nodejs/node-v0.x-archive#7902 (original post
where I first suggested using realpath(3) where supported nodejs/node-v0.x-archive#7902
(comment)
nodejs/node-v0.x-archive#7902 (comment)).
At the bottom is the link to the PR, and its 7 month journey, to finally
landing.

The new performance gains are much better, but still not where they could
be. Haven't had time to take an in depth look, but definitely agree with
@Fishrock123 https://github.com/Fishrock123 that we should take a look
at fixing the issue instead of doing a full revert because of them.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3594 (comment)

MylesBorins pushed a commit to MylesBorins/node-glob that referenced this pull request Jun 6, 2016
fs.realpath has been optimized to utilize `uv_fs_realpath()`, a new api
in libuv. The libuv api has slightly different behavior. There are two
particular issues that are causing problems for glob

* `ev_fs_realpath()` throwing before `fs.readdir` ELOOP
* The removal of the cache

In the past glob was relying on `fs.readdir` to throw when handling
recursive symbolically linked directories. It is now possible that the
call to libuv can throw before. The behavior is currently different on
osx and linux causing the test suite to fail on all of the flavors of
linux we are running in CI.

This commit adds an extra check for 'ELOOP' and sets appropriately.

The commit includes an extra check to make sure that `real` is truthy
Without that extra check the test suite was reporting garbage data in the results.
This was only happening with the async implementation.

This commit has not done anything regarding the removal of the cache
As long as the cache doesn't include a value called "encoding"
everything will be fine.

There has not been any benchmarking done on this change.

ref: libuv/libuv#531
ref: nodejs/node#3594
saghul added a commit to saghul/node that referenced this pull request Jul 11, 2016
Fixes: nodejs#5737
Fixes: nodejs#4643
Fixes: nodejs#4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: nodejs#3594
PR-URL: nodejs#5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Jul 11, 2016
Fixes: #5737
Fixes: #4643
Fixes: #4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: #3594
PR-URL: #5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Jul 11, 2016
Fixes: #5737
Fixes: #4643
Fixes: #4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: #3594
PR-URL: #5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
Fixes: #5737
Fixes: #4643
Fixes: #4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: #3594
PR-URL: #5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@saghul saghul mentioned this pull request Jul 14, 2016
7 tasks
MylesBorins pushed a commit that referenced this pull request Jul 14, 2016
Fixes: #5737
Fixes: #4643
Fixes: #4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: #3594
PR-URL: #5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Jul 14, 2016
Fixes: #5737
Fixes: #4643
Fixes: #4291
Fixes: nodejs/node-v0.x-archive#8960
Refs: #3594
PR-URL: #5994
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.