Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

fs.realpath 70x slower than native #7902

Closed
joliss opened this issue Jul 5, 2014 · 24 comments
Closed

fs.realpath 70x slower than native #7902

joliss opened this issue Jul 5, 2014 · 24 comments

Comments

@joliss
Copy link

joliss commented Jul 5, 2014

The fs.realpath function is 70x slower than native C realpath. On my system, fs.realpath takes 32 µs, while C realpath takes 0.45 µs.

This is a real problem in the Broccoli build tool, where we need to resolve symlinks in hot code paths. Resolving 1000 symlinked files - not an unusual case - would take 45 ms, slowing down the build considerably. [1]

As for a solution: I haven't looked at the fs.js source in detail, but it seems we might be able to call the realpath function in the C standard library, where available, instead of using our own implementation.

Benchmark code for Node:

var fs = require('fs')

var start = Date.now()
var n = 10000
for (var i = 0; i < n; i++) {
  if (fs.realpathSync('.') === 'dummy') throw new Error('never happens')
}
console.log(((Date.now() - start) * 1000 / n) + ' us') // => 32 us on Node 0.11.13

Benchmark code for C:

#include <limits.h> /* PATH_MAX */
#include <stdio.h>
#include <stdlib.h>

// Adapted from http://stackoverflow.com/a/1563237/525872

int main(void) {
  char buf[PATH_MAX + 1]; /* not sure about the "+ 1" */
  int i;
  for (i = 0; i < 1000000; i++) {
    char *res = realpath(".", buf);
    if (res) {
      // printf("This source is at %s.\n", buf);
    } else {
      perror("realpath");
      exit(EXIT_FAILURE);
    }
  }
  return 0;
}

Run with gcc -std=gnu99 realpath-benchmark.c -o realpath-benchmark && time ./realpath-benchmark. This yields 0.45 µs per iteration on my Linux system.

[1] We cannot work around this by using naïve string concatenation, because path_resolution(7) requires that we resolve symlinks in all path components. Here is a gist to show why this matters.

@joliss
Copy link
Author

joliss commented Jul 5, 2014

Cc: @isaacs, @rsms, @piscisaureus, who seem to have written most of the realpath code.

By the way, there was a comment (now deleted) in the fs.js realpath source: "Not using realpath(2) because it's bad. See: http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html" Regarding that comment and blog post: I wonder if it's acceptable to fail on extremely long path names, as long as we don't cause buffer overflows, if we get an order-of-magnitude performance improvement in return?

@valpackett
Copy link

+1.

I think node should use the native realpath when the path is shorter than PATH_MAX and the custom implementation otherwise. 

BTW, it's only a 13x difference on my MacBook:
197 µs -- node 0.10.29
15.3 µs -- clang 503.0.40 or gcc 4.8.3

@rsms
Copy link

rsms commented Jul 8, 2014

Chiming in here: @arrelid and I spent about a day designing and a couple of days building the asynchronous version of realpath.

My personal opinion:

  • We should keep realpath as-is (unless there's a need for perf improvement and a solid suggestion.)
  • We should modify realpathSync to optimistically use libc realpath, and upon failure fall back to a JS implementation.

Hitting PATH_MAX seems unlikely in the majority of scenarios. If you do hit PATH_MAX, lots and lots of apps and services will fail for you.

@joliss
Copy link
Author

joliss commented Jul 9, 2014

Hm, my guess would be that under most circumstances the synchronous C version is still faster than the asynchronous JS version, since most of the time is spent on CPU. I might certainly be wrong, but it might be worth benchmarking. (I'm not writing the patch here, so please just ignore me if you disagree - it's up to you.)

We should modify realpathSync to optimistically use libc realpath, and upon failure fall back to a JS implementation.

👍

@rsms
Copy link

rsms commented Jul 9, 2014

@joliss You are indeed wrong 😄 — realpath will spend 99% on I/O and a very small amount of time instructing a CPU. In an essence, what realpath does is to query the file system until it has found the canonical path to a particular file, where "canonical" means a hard-link leaf in a "real" (no symlinks) directory tree. Basically a bunch of stat, lstat and readlink kernel calls that reads the file system.

So clearly calling realpathSync on an "unfortunate" path could freeze up your entire NodeJS process for N time where N is anywhere from a couple of microseconds to minutes or even hours (i.e. if part of the path needed to be resolved is on a slow network drive or the local hard drive is busy doing other work.

An asynchronous (or "dispatched" if you will) approach to I/O is a core philosophy on which NodeJS was based.

@bnoordhuis
Copy link
Member

realpath will spend 99% on I/O and a very small amount of time instructing a CPU

I think a lot is going to depend on whether the kernel's dentry cache is hot or not. In the benchmarks that @joliss posted, the dentry for "." is loaded once and then hit over and over again. There's almost no real I/O taking place, nearly all wall clock and CPU time is spent on string operations and marshalling the struct stat into a JS object.

That's easy to optimize for but probably not representative of a real workload. It would be more interesting to see what performance is like when you hit a large range of paths starting from a cold cache (echo 3 > /proc/sys/vm/drop_caches as root on Linux.)

@rsms
Copy link

rsms commented Jul 9, 2014

@bnoordhuis Good point. I assumed the tests were run on uncached paths. Otherwise—as you point out—all the code is doing is essentially javascript object key lookups.

Theoretically, realpath should use a minuscule amount of CPU when resolving a path and spend basically all of its time waiting to be woken up by the kernel.

@joliss
Copy link
Author

joliss commented Jul 16, 2014

I assumed the tests were run on uncached paths.

Yes, my use case here is a CLI tool (Broccoli, for building JavaScript browser apps), where stuff tends to come out of the cache.

The problem we have is with the CPU usage of fs.realpathSync. Even calling it on moderate numbers of files, it eats enough CPU to take up a significant chunk of the rebuild time (we're aiming for ~200 ms total). If we used the C syscall, this problem would basically go away,

@rsms
Copy link

rsms commented Jul 18, 2014

I recommend you submit a patch (pull req or patch to the mailing list) where realpathSync takes the libc naïve approach of assuming <=PATH_MAX.

@trevnorris
Copy link

Here's the flamegraph for fs.realpathSync('.'): https://i.cloudup.com/n9pPZFuyc0.svg

I'll take a look into it sometime.

@jasnell
Copy link
Member

jasnell commented Jun 22, 2015

@trevnorris ... any further updates on this?

@trevnorris
Copy link

Haven't tried latest v0.12, but io.js v2.3.0 now runs this in ~9us. Still not near optimal, but better. I'll take a quick look.

@trevnorris
Copy link

Hm. Difference my be from my box. The native code runs at 120ns/op, while the script on io.js v2.3.0 runs in 3430ns/op (which is less than the 9us I was getting before a small change to the test). Either way, that is significant overhead. Going to take a peek at the code.

@trevnorris
Copy link

That method is substantial. Not really sure what's going on. It's outside the realm of trivial fix, and not exactly sure why we aren't just linking to realpath(3) on supported platforms.

@jasnell I'll let you decide whether to keep this open. Did a comparison on the tip of v0.12 branch and it runs in the same amount of time as io.js v2.3.0. So what I thought was a performance improvement was from a faster hd.

@joliss
Copy link
Author

joliss commented Jun 23, 2015

I agree that linking to realpath(3) (where supported) would be awesome.

@jasnell
Copy link
Member

jasnell commented Jun 24, 2015

Ok, I'm going to mark this as a defer but keep it open. The fix would likely be made over in io.js or the converged repo then backported here.

@stefanpenner
Copy link

@jasnell is the issue safe here, or should we open a new one on a non-archive repo ? I just want to prevent it from being forgotten.

@jasnell
Copy link
Member

jasnell commented Sep 1, 2015

Either way. It's safe here. The only downside is that it's going to be as visible sitting here as a new issue would be over in nodejs/node.

@stefanpenner
Copy link

The only downside is that it's going to be as visible sitting here as a new issue would be over in nodejs/node.

i dont see a downside (or was that the point)

@jasnell
Copy link
Member

jasnell commented Sep 1, 2015

lol.. I mean not going to be as visible sitting here... there aren't that many people looking over these older issues.

@trevnorris
Copy link

IIRC there wasn't any opposition to using realpath(3) directly on supported systems. If the issue is opened just ping me and I'll keep it on the list.

stefanpenner added a commit to stefanpenner/node-symlink-or-copy that referenced this issue Sep 1, 2015
@joliss has already identified nodes performance issues with realpath [here](nodejs/node-v0.x-archive#7902)

As per @krisseldens suggestion, we can likely just use readlinkSync here instead.

real life example: https://github.com/ember-cli/stress-app
before: 8668.970928ms
after: 7152.094867ms

18% incremental build improvement

As realpathSync uses readlinkSync internally, this also reduces calls to readlinkSync
before: 17826 (134.230726ms)
after:   6171 ( 93.061130ms)
@thefourtheye
Copy link

Can this be done in uv?

@stefanpenner
Copy link

@trevnorris should I reopen one different repo?

@jasnell
Copy link
Member

jasnell commented Sep 3, 2015

Opening a new issue or PR over in nodejs/node would likely be good.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants