Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rg.exe takes too much cpu #43847

Closed
warpdesign opened this issue Feb 16, 2018 · 36 comments
Closed

rg.exe takes too much cpu #43847

warpdesign opened this issue Feb 16, 2018 · 36 comments
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug search Search widget and operation issues verified Verification succeeded
Milestone

Comments

@warpdesign
Copy link
Contributor

warpdesign commented Feb 16, 2018

I post this issue as a follow up to #35659

In my workspace, the latest VSCode (1.20 or 1.21 insider) takes too much cpu. Using the task manager, I can see that VSCode spawns two rg.exe processes that use around 45% cpu each, making my computer unusable.

Note that these processes appear as soon as I press ctrl + p.

I have a few symlinks in my project, maybe 4-6, don't know exactly.

As I said in #35659, adding search.followSymlinks: false doesn't fix the problem.

Using the TaskManager, I could see the full command line that's being used for rg.exe:

"c:\Program Files\Microsoft VS Code Insiders\resources\app\node_modules\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !**/bower_components -g !**/node_modules -g !**/onmap-shared --no-ignore-parent -- .

I tried to run the exact same command from a command prompt (after cd to my project's rootdir) and it worked, took about 1 minute with no error.

During the execution of the command, the cpu usage is very high and never goes below 85%.

I am Using Windows 10 (Inisders) 64bit. A colleague of mine got the same cpu usage problem with stable Windows 10 64 bit (Fall Creators Update), so we can rule out Windows Insider version.

Let me know if you need more information.

@jens1o
Copy link
Contributor

jens1o commented Feb 16, 2018

And somehow it starts to index at the launch of vscode which is not necessary in my point of view as I use it very rarely...

@chrmarti
Copy link
Collaborator

@warpdesign How many files do you have? (In Powershell you could pipe the output of the above rg to | Measure-Object –Line to count.)

Is this on a local NTFS drive?

@chrmarti
Copy link
Collaborator

@jens1o That is likely triggered by an extension with a workspaceContains activation trigger. QuickOpen does not index after startup.

@jens1o
Copy link
Contributor

jens1o commented Feb 16, 2018

@jens1o That is likely triggered by an extension with a workspaceContains activation trigger. QuickOpen does not index after startup.

Okay, that sounds reasonable then. I didn't knew you are using rg for file-extension lookups, too.

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

@chrmarti It is a local NTFS drive.

I tried to run the exact same rg.exe command in a powershell prompt but I am getting an error from powershell: is there something special to do to execute normal executables with lots of parameters in a powershell prompt?

"c:\Program Files\Microsoft VS Code Insiders\resources\app\node_modules\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !**/bower_components -g !**/node_modules -g !**/onmap-shared --no-ignore-parent -- . | Measure-Object -Line
At line:1 char:102
+ ... resources\app\node_modules\vscode-ripgrep\bin\rg.exe" --files --hidde ...
+                                                             ~~~~~
Unexpected token 'files' in expression or statement.
At line:1 char:1
+ "c:\Program Files\Microsoft VS Code Insiders\resources\app\node_modul ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The '--' operator works only on variables or on properties.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : UnexpectedToken

This powershell stuff is so unfriendly and un-intuitive it makes me think vi is not so bad...

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

I finally used | find /v /c "" which should be the equivalent and gave me 108582 files.

What has the number of files to do with the cpu usage?

Shouldn't the rg.exe processes run with a lower multitask priority?

@chrmarti
Copy link
Collaborator

@warpdesign Could you check what rg.exe and the rest of your system does with Process Monitor: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

What I do with Process Monitor is:

  • Stop capturing events and clear the existing ones.
  • Prepare VS Code so I can hit Ctrl+P.
  • Start capturing events in Process Monitor.
  • Ctrl+P in VS Code.
  • After 20s or so (it will collect a lot of events) stop capturing events in Process Monitor.

Please then share what you found.

@warpdesign
Copy link
Contributor Author

There are lots of file access like these:

prcmonitor

I can send the huge log if you need it.

@chrmarti
Copy link
Collaborator

There are a lot of REPARSE results for the various ignore files, are these symlinks? Some of these later result in a NAME NOT FOUND, could these be dangling symlinks? Are there many of these?

You mentioned ripgrep 0.6.0 didn't have that performance issue, does that also have these REPARSE results for the ignore files?

/cc @BurntSushi

@BurntSushi
Copy link

The various checks on .rgignore, .ignore, .git, .git/info/exclude, etc., are all necessary to determine whether those files exist, and if so, ripgrep picks up their patterns. These checks have been there since ripgrep was first released.

I don't understand the significance of a REPARSE result though.

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

Like I said, there aren't much symlinks, but there are lots of files in these symlinks.

So in the screenshot, there is only one symlink:

editor/app/resources which points to e:\data\resources\240117\resources.

So all REPARSE are following the same symlink. Behind that symlink there are lots of files/directories but there aren't more symlinks to resolve.

But if it's the symlink access that takes cpu time and it's being resolved for each file/directory, maybe this could explain the cpu usage?

I don't understand the NAME NOT FOUND errors: the parent directory exists (but the .rgignore that rg.exe attempts to load of course doesn't exist), and again, it's all resolving the same resource symlink: there is only one symlink involved.

One more hint: could it be a path_length limitation problem?

@chrmarti
Copy link
Collaborator

@BurntSushi My understanding is that REPARSE is the result when the file has a reparse point which is a generic hook in NTFS that is also used for symlinks. (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365503%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396)

@warpdesign
Copy link
Contributor Author

Also, DIR command shows the symlink as <SYMLINKD>: I know you may also create junctions but I am not sure what the difference between the two is.

@BurntSushi
Copy link

BurntSushi commented Feb 19, 2018

At least from ripgrep's point of view, crawling a directory with 100,000 entries in it isn't going to be especially snappy, at least from my experience running similar searches on Windows systems. So there might be an expectation adjustment that is at play here. I do not have any opinion on "lower multitask priority." That is beyond my domain. :)

It would also help to know what version of ripgrep is being used here, since there have been changes in this area that could impact performance over the last couple releases. If you weren't observing this problem with ripgrep 0.6.0 but are observing it with ripgrep 0.7.1, then that is an interesting data point, and it might be worth comparing the process monitor output for each tool. But beyond that, I would personally need a reproducible case to debug it further.

Another important debug point here is the ability to run these commands in a shell. I would suggest that someone help you figure out how to do that. ripgrep may be emitting important details about this particular search that will help diagnose the problem.

One more hint: could it be a path_length limitation problem?

ripgrep can't read files that have paths over the limit, and this is a known bug. It will emit an error for each.

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

At least from ripgrep's point of view, crawling a directory with 100,000 entries in it isn't going to be especially snappy, at least from my experience running similar searches on Windows systems. So there might be an expectation adjustment that is at play here. I do not have any opinion on "lower multitask priority." That is beyond my domain. :)

Why would a large directory use more cpu? I would understand it would take more time to be indexed, but not why it uses more cpu. Does it mean the more files, the more cpu will be used?

Jetbrain's Idea doesn't has this cpu usage problem and can index everything just fine, without putting my machine on its knees :)

Another important debug point here is the ability to run these commands in a shell. I would suggest that someone help you figure out how to do that.

I can run these commands in a cmd shell. What I couldn't is run these commands in a powershell terminal: for some reason it returns an error.

All tests were done using ripgrep that's bundled with latest insider build:

ripgrep 0.7.1 (rev 85cd3f0a6e)
-SIMD -AVX

@chrmarti
Copy link
Collaborator

@warpdesign I'm trying to reproduce the REPARSE events you see, but a single symbolic link to a large directory doesn't do that for me.

To establish if this is a regression, could you run the following in PowerShell when in your workspace folder with ripgrep shipped with VS Code 1.20 and also with ripgrep 0.6.0 to compare against:

(Measure-Command { & 'C:\Program Files\Microsoft VS Code\resources\app\node_
modules\vscode-ripgrep\bin\rg.exe' --files --hidden  --case-sensitive -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !**/bower_components -g !**/node_modules -g !**/onmap-shared --no-ignore-parent -- . | Measure-Object –Line | Out-Default }).TotalSeconds

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

I tried running the command of my first message using ripgrep that's bundled with VSCode 1.18.1 (that's ripgrep 0.6.0) and it doesn't seem to follow the symlink, so my worspace is indexed in a second or two and without any cpu usage problem.

No wonder it's faster in version 0.6.0 :)

@BurntSushi
Copy link

@warpdesign It sounds like what you're saying is that ripgrep 0.7.1 is following a symbolic link even though you aren't giving it the --follow flag? If so, I can't think of why that might happen. It might be worth giving ripgrep 0.7.1 the -j1 flag and see if that changes things. Another thing to try is ripgrep 0.8.0 to see if this got fixed. (It isn't just a shot in the dark either, both ripgrep 0.7 and ripgrep 0.8 have changes around the symbolic link handling on Windows.)

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

@BurntSushi If I understand correctly, ripgrep isn't supposed to follow symlinks unless the --follow option is used?

@warpdesign
Copy link
Contributor Author

Then I think we know where the regression comes from:

  • version 0.7.1 always follows symlinks, even when the --follow option isn't specified
  • version 0.6.0 only follows symlinks if --follow is used

If I run the same command line with 0.6.0 with the --follow option, it behaves like 0.7.0: using all the cpu.

@BurntSushi
Copy link

As I said above, I don't immediately know how that's happening. I need a reproducible case. It would also be helpful to know whether this happens with ripgrep 0.8.

@chrmarti
Copy link
Collaborator

@warpdesign I don't see 0.7.1 (as shipped with VS Code 1.20) follow symbolic links nor hard links (junctions) without --follow. Could you explain a little more about your folder structure: What is the folder you open VS Code on / run rg.exe from? Are there any links in that folder's parent folders? Where in that folder are the links you mention?

@warpdesign
Copy link
Contributor Author

I created a very small test-case;

Directory of E:\Docs\Dev\ripgrep

19/02/2018  14:38    <DIR>          .
19/02/2018  14:38    <DIR>          ..
19/02/2018  14:28    <DIR>          bar
19/02/2018  14:31                 5 file.txt
19/02/2018  14:32    <DIR>          foo
19/02/2018  14:38    <SYMLINKD>     resources [..\ripgrep_resources]
               1 File(s)              5 bytes
               5 Dir(s)  55 270 326 272 bytes free

E:\Docs\Dev\ripgrep>"c:\Program Files\Microsoft VS Code Insiders\resources\app\node_modules\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !**/bower_components -g !**/node_modules -g !**/onmap-shared --no-ignore-parent -- .
file.txt
foo\file.txt
foo\bar\file.txt
resources\file.txt
resources\foo\file.txt

As you can see, ripgrep goes through resources which is a symlink, and the --follow option hasn't been specified.

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 19, 2018

Here is the ouput of rg.exe 0.6.0 for comparison:

E:\Docs\Dev\ripgrep>"c:\Program Files (x86)\Microsoft VS Code\resources\app\node_modules\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !**/bower_components -g !**/node_modules -g !**/onmap-shared --no-ignore-parent -- .
file.txt
foo\file.txt
foo\bar\file.txt

@warpdesign
Copy link
Contributor Author

Where can I get rg 0.8.0?

@chrmarti
Copy link
Collaborator

Thanks @warpdesign, I can reproduce it with the insiders build!

@roblourens 0.7.1 in insiders always follows symlinks while 0.7.1 in 1.20 correctly doesn't.

Using -j1 with the 0.7.1 in insiders does not follow the symlink.

@chrmarti
Copy link
Collaborator

@chrmarti
Copy link
Collaborator

chrmarti commented Feb 19, 2018

@BurntSushi Using your Windows 64bit MSVC builds 0.7.1 works correctly, 0.8.0 follows the symlink even without --follow and 0.8.0 then works correctly when adding -j1.

@chrmarti
Copy link
Collaborator

I have filed BurntSushi/ripgrep#820 to track the performance regression in ripgrep.

@roblourens Can you check why the ripgrep 0.7.1 build we use in 1.21-insiders has picked up the regression from 0.8.0?

@chrmarti
Copy link
Collaborator

Also filed a second issue for the original issue of following symlinks: BurntSushi/ripgrep#824

@1600
Copy link

1600 commented Feb 22, 2018

can confirm rg.exe causes excessive cpu usage, and of course it turned the whole editor unusable.

@warpdesign
Copy link
Contributor Author

warpdesign commented Feb 22, 2018

I tried to set the priority of the rg.exe process to below normal using the task manager: it stills uses much of the cpu but since its priority is inferior, it doesn't slow down the whole computer which remains usable as usual. Only impact is that rg takes a little bit more time to parse all directories.

Maybe that would be a good workaround? As a user I'd rather have the indexing process take more time than having my whole computer unusable.

priority

@chrmarti chrmarti added the search Search widget and operation issues label Feb 23, 2018
@chrmarti chrmarti added this to the February 2018 milestone Feb 23, 2018
@chrmarti
Copy link
Collaborator

Rolling back to what we have in 1.20.1 since @roblourens is out with the following commits:

We can adapt ripgrep 0.8.1 with the fixes after VS Code 1.21 has shipped.

Thanks @BurntSushi and @warpdesign!

@chrmarti chrmarti added the bug Issue identified by VS Code Team member as probable bug label Feb 23, 2018
@roblourens
Copy link
Member

Did you go past the fix for #43284? It's easy to release a fix with the old rg and a fix for that if needed.

@roblourens
Copy link
Member

Never mind it looks like we still have that.

@dbaeumer dbaeumer added the verified Verification succeeded label Mar 2, 2018
@dbaeumer
Copy link
Member

dbaeumer commented Mar 2, 2018

Marking as verified. I saw the CPU spike for rg.exe last week as well and don't with current Insider build.

@vscodebot vscodebot bot locked and limited conversation to collaborators Apr 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue identified by VS Code Team member as probable bug search Search widget and operation issues verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

7 participants