File list performance with symlinks on Windows #633
AFAIK, this is technically incorrect. See this comment here: https://github.com/BurntSushi/same-file/blob/master/src/win.rs#L19-L58
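For context on that caveat: on Windows, a file's identity (volume serial number plus file index) is only guaranteed stable while at least one handle to the file is open, so comparing cached raw IDs can go wrong once all handles are closed. A minimal sketch of the handle-based comparison same_file provides (the paths are placeholders):

```rust
use same_file::Handle;
use std::io;

fn main() -> io::Result<()> {
    // A Handle keeps the underlying file open, which pins its
    // (volume serial, file index) identity for the Handle's lifetime.
    let dir = Handle::from_path(r"C:\some\dir")?;
    let via_link = Handle::from_path(r"C:\some\link")?;
    // Handles compare equal iff they refer to the same underlying file.
    println!("same underlying directory: {}", dir == via_link);
    Ok(())
}
```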
Maybe the cache could keep the directories open to avoid the filesystem reusing the index numbers. That shouldn't be too expensive, because for each 'cursor' (not sure if it's all threads) in the traversal you'd only need to keep the parent directories open, which should be few.
I have an attempt at fixing this here: BurntSushi/walkdir@5bcc5b8. It's in the directory traversal library, so I still need to pull it into ripgrep.
Broadly speaking, this commit is an attempt to fix this issue: BurntSushi/ripgrep#633 It was reported that symlink checking was taking a long amount of time, and that one possible way to fix this was to reduce number of times a file descriptor is opened. In this commit, we amortize opening file descriptors by keeping a file handle open for each ancestor in the directory tree. We also open a handle for the candidate file path at most once, instead of once every iteration. Note that we only perform this optimization on Windows, where opening a file handle seems inordinately expensive. In particular, this now causes us to potentially open more file descriptors than the limit set by the user, which only happens when following symbolic links. We document this behavior.
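A rough sketch of the amortization described in that commit; `Ancestor` and `check_loop` are illustrative names, not walkdir's actual API:

```rust
use same_file::Handle;
use std::io;
use std::path::{Path, PathBuf};

/// One entry per ancestor of the walker's current position. Keeping the
/// handle open both avoids re-opening the directory on every loop check
/// and prevents the OS from recycling its file index in the meantime.
struct Ancestor {
    path: PathBuf,
    handle: Handle,
}

/// Report an error if `child` is the same directory as any ancestor,
/// i.e. if following the symlink would loop.
fn check_loop(ancestors: &[Ancestor], child: &Path) -> io::Result<()> {
    // Open the candidate once, instead of once per ancestor.
    let child_handle = Handle::from_path(child)?;
    for ancestor in ancestors {
        if child_handle == ancestor.handle {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!(
                    "symlink loop: {} points back to {}",
                    child.display(),
                    ancestor.path.display()
                ),
            ));
        }
    }
    Ok(())
}
```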
This should be fixed in ripgrep 0.7.0. I didn't actually test that it resolves the performance problem, but the symlink loop checking should be opening far fewer file handles. We can definitely re-open this if it's still a problem (and having an easy-to-use test corpus to try this on would be a huge help, e.g., what steps can I take with `cnpm` to reproduce this?).
Unfortunately, I had to partially revert the optimization for the parallel iterator, which is used by default in ripgrep. The problem is that hanging on to file handles in the parallel iterator can cause file descriptor limits to be exceeded quite easily, since many directories are being traversed in parallel. I say "partially" revert because we are at least keeping the same child handle open while walking up the ancestor path, but unfortunately, we are still opening a handle to each ancestor in that loop. If ripgrep is run with a single thread (`-j1`), then the full optimization applies.
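To make that trade-off concrete, here is a hypothetical sketch of the parallel walker's check after the partial revert (not ripgrep's actual code): the candidate's handle is opened once and reused, but each ancestor handle is opened and dropped inside the loop, keeping the number of simultaneously open descriptors small.

```rust
use same_file::Handle;
use std::io;
use std::path::Path;

fn check_loop_parallel(child: &Path) -> io::Result<()> {
    // Opened once and reused for every comparison.
    let child_handle = Handle::from_path(child)?;
    // `ancestors()` yields `child` itself first, so skip it.
    for ancestor in child.ancestors().skip(1) {
        // Opened afresh each iteration and dropped at the end of it.
        if child_handle == Handle::from_path(ancestor)? {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("symlink loop at {}", ancestor.display()),
            ));
        }
    }
    Ok(())
}
```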
Great, thanks @BurntSushi! When testing it I see it panicking after a few (maybe unrelated) errors:
@chrmarti Thanks! A few questions:
This is from running 1aec4b1. A simple reproducible case on my machine is:
The full output with this and
BTW, would it be an option to say that duplicate/missing results are unlikely and not fatal when you don't keep the files open while comparing ids? Maybe that could be enabled by a command line flag?
@chrmarti That thought has crossed my mind, and it's a tempting one. It's a tough balance to strike, and I'd rather not make something like this configurable.
@chrmarti It looks like your comment containing the panic isn't actually related to symlinks, but rather, the length of the path name. As an example, I tried running this in
This told me the directory couldn't be found. I then removed the
This showed me the contents of the directory, which included a
My understanding is that the
I did a little more investigation in the
And now running against
So I compared ripgrep's time to traverse the directory with parallelism enabled (because the single-threaded walker has the bug you found) with the time it takes for
So here's my summary:
cc @roblourens |
w.r.t. long file names, @retep998 claims that I can enable a switch in a manifest to turn on long file name support in Windows 10, but I don't understand how to do that. That would certainly be easier than doing path conversion manually. |
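(For what it's worth: on Windows 10 version 1607 and later, an application can declare itself long-path aware via its manifest, and the `LongPathsEnabled` registry policy must also be turned on for that to take effect. The manual alternative is prefixing absolute paths with `\\?\`; a hypothetical helper, not ripgrep's actual code:)

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: prefix an absolute path with `\\?\` so that
/// Windows file APIs accept it beyond the legacy 260-character
/// MAX_PATH limit. Extended-length paths must be absolute, use
/// backslashes, and contain no `.`/`..` components.
fn to_extended_length(path: &Path) -> PathBuf {
    if path.as_os_str().to_string_lossy().starts_with(r"\\?\") {
        return path.to_path_buf(); // already extended-length
    }
    let mut s = OsString::from(r"\\?\");
    s.push(path.as_os_str());
    PathBuf::from(s)
}

fn main() {
    let p = Path::new(r"C:\very\deep\node_modules\tree");
    assert_eq!(
        to_extended_length(p).to_string_lossy(),
        r"\\?\C:\very\deep\node_modules\tree"
    );
}
```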
@BurntSushi Makes sense, thanks for the detailed explanation!
Wow. I'm guessing this is because of duplicates in the node_modules tree. Common dependencies are probably symlinked in multiple times, everywhere they are required. |
This fixes a bug in walkdir that happened on Windows when following symlinks. It was triggered when opening a handle to the symlink failed. In particular, this resulted in the two stacks in the walkdir iterator getting out of sync. At some point, this tripped a panic when popping from one stack would be fine but popping from the other failed because it was empty. We fix this by pushing to both stacks only when both pushes would succeed. This bug was found via ripgrep. See: BurntSushi/ripgrep#633 (comment)
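A hypothetical sketch of the invariant that commit restores; `DirList` and `read_dir_list` are stand-ins for walkdir's internal state, not its real API:

```rust
use same_file::Handle;
use std::io;
use std::path::Path;

// Stand-ins for the walker's real per-directory state.
struct DirList; // pending entries of one directory level
fn read_dir_list(_path: &Path) -> io::Result<DirList> {
    Ok(DirList)
}

/// The walker keeps two parallel stacks that must grow and shrink in
/// lockstep; the fix is to perform both fallible operations before
/// mutating either stack, so a failure can't leave them out of sync.
fn push_dir(
    work: &mut Vec<DirList>,
    handles: &mut Vec<Handle>,
    path: &Path,
) -> io::Result<()> {
    let list = read_dir_list(path)?;
    let handle = Handle::from_path(path)?;
    // Only now, with both successes in hand, push to both stacks.
    work.push(list);
    handles.push(handle);
    Ok(())
}
```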
@chrmarti All right, I've fixed the panic. I'm going to mark this issue as fixed since I think all issues that I'm aware of are either fixed (the panic is fixed and I think symlink performance is reasonable) or tracked elsewhere (e.g., #364 for long path names). I'd be happy to revisit the symlink performance issue if there's compelling evidence that we can do measurably better. A good example there might be showing the execution of another tool that can 1) successfully traverse the same directory tree while following symlinks and 2) do so measurably faster than ripgrep.
I am using VS Code and the latest version has a performance bug which seems to be related to this one. I downgraded to rg version
@warpdesign Thanks for the tip! Unfortunately, it doesn't really help; we need more details. Could you please open a new bug and fill in the issue template? Please remember that this is the ripgrep repo, and bugs should be reported against ripgrep using `rg` directly, not through VS Code.
I'm investigating a report that listing files (`--files`) on Windows when following symlinks (`--follow`) takes much more time than without symlinks.

It appears that in some cases `cnpm` (an `npm` replacement) uses symlinks to organize the `node_modules` folder, and that this then slows down ripgrep. (microsoft/vscode#35659 (comment) has an example `package.json` that, when installed using `cnpm i`, reproduces the problem.)

Looking at the trace in Process Monitor, I see many files being opened for reading metadata in `DirEntryRaw::from_link` and to detect loops in same_file's `Handle::from_path`.

One area where this might be improved is where `check_symlink_loop()` loops over the parent directories of each symlinked directory, causing `is_same_file` to open the same parent directory once for each of its symlinked subdirectories. Maybe a directory's id could be cached for the duration of its content being traversed?

/cc @roblourens
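A sketch of what that caching suggestion could look like; the names are hypothetical, and (per the first reply above) the cache holds the open `Handle` itself rather than a raw file ID, since an ID is only stable while some handle to the file stays open:

```rust
use same_file::Handle;
use std::collections::hash_map::{Entry, HashMap};
use std::io;
use std::path::{Path, PathBuf};

/// Caches one open handle per parent directory for as long as its
/// contents are being traversed, so loop checks don't re-open the same
/// parent once per symlinked subdirectory.
struct DirIdCache {
    cache: HashMap<PathBuf, Handle>,
}

impl DirIdCache {
    fn new() -> DirIdCache {
        DirIdCache { cache: HashMap::new() }
    }

    /// Returns the cached handle for `dir`, opening it on first use.
    fn handle(&mut self, dir: &Path) -> io::Result<&Handle> {
        match self.cache.entry(dir.to_path_buf()) {
            Entry::Occupied(e) => Ok(e.into_mut()),
            Entry::Vacant(e) => Ok(e.insert(Handle::from_path(dir)?)),
        }
    }

    /// Called when `dir`'s traversal finishes, releasing the handle.
    fn evict(&mut self, dir: &Path) {
        self.cache.remove(dir);
    }
}
```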