Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ripgrep: can't search for a query containing a space, using the 'whole word' flag #23534

Closed
aeschli opened this issue Mar 29, 2017 · 9 comments
Closed
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug search Search widget and operation issues verified Verification succeeded
Milestone

Comments

@aeschli
Copy link
Contributor

aeschli commented Mar 29, 2017

Testing #23319

Version 1.11.0-insider
Commit 646d0f9
Date 2017-03-28T12:36:25.374Z

Linux

In the vscode repo, search with whole word enabled

  • 'Remove ' without ripgrep: 357 matches, with ripgrep 0 matches
  • 'Remove on' without ripgrep: 1 match, with ripgrep 0 matches
  • '${1:': with ripgrep: 21 matches, without ripgrep: 235 matches

I agree it's discussable what it 'whole word' means for a search term that is more than a word. But just wanted to bring it to your attention.

@aeschli aeschli changed the title Reipgrep: 'whole word' matches behave differently with whitespaces Ripgrep: 'whole word' matches behave differently with whitespaces Mar 29, 2017
@aeschli aeschli changed the title Ripgrep: 'whole word' matches behave differently with whitespaces Ripgrep: 'whole word' matches behave differently non-word characters Mar 29, 2017
@roblourens
Copy link
Member

If you enable the regex switch, then these work - the problem is actually that strings.escapeRegExpCharacters escapes the space, and ripgrep doesn't recognize that as a valid escape sequence. I don't think it should escape a space, but I won't change it for now.

@roblourens roblourens changed the title Ripgrep: 'whole word' matches behave differently non-word characters Ripgrep: can't search for a query containing a space, using the 'whole word' flag Mar 29, 2017
@roblourens roblourens added this to the April 2017 milestone Mar 29, 2017
@roblourens roblourens added the search Search widget and operation issues label Mar 29, 2017
@roblourens
Copy link
Member

roblourens commented Mar 29, 2017

Possibly March.

@bpasero Do you know why this function escapes whitespace (\s), and is it safe to have it not do that?

src/vs/base/common/strings.ts

export function escapeRegExpCharacters(value: string): string {
	return value.replace(/[\-\\\{\}\*\+\?\|\^\$\.\,\[\]\(\)\#\s]/g, '\\$&');
}

Maybe for march I can clone it without \s just for ripgrep, and remove it for everything else in April.

@BurntSushi
Copy link

@roblourens For ripgrep, I don't think you want to escape commas either. The full list of allowable escape characters isn't particularly well documented, but you can see them here: https://github.com/rust-lang/regex/blob/master/regex-syntax/src/parser.rs#L1225

@BurntSushi
Copy link

BurntSushi commented Mar 29, 2017

@roblourens Popping up a level, if you want to disable regex entirely, then you can pass the -F/--fixed-strings switch to ripgrep, which will always interpret the pattern as a literal. That way, you shouldn't have to fuss with escaping yourself.

@roblourens
Copy link
Member

roblourens commented Mar 29, 2017

Comma also triggers this issue. And that can be a special char in a regex, but only sometimes. JS takes it escaped or unescaped when you want to match a comma, so it seems like that could be removed too.

I use --fixed-strings when not searching for a regex, unless the 'match whole word' option is set. This is because the --word-regexp option only works on a regex, right? This issue only affects a query that has the 'match whole word' option enabled, and the regex option disabled.

@BurntSushi
Copy link

@roblourens Nope! -w works with -F as expected. I admit the name --word-regexp is perhaps bad, but I claim innocence: I snagged it from GNU grep.

@roblourens
Copy link
Member

Hm, I could have sworn there was a case where it didn't work right, because I went out of my way to do it that way. But now it seems to work fine like you say, so maybe I just don't need to escape those chars.

@roblourens roblourens modified the milestones: March 2017, April 2017 Mar 29, 2017
@roblourens roblourens added the bug Issue identified by VS Code Team member as probable bug label Mar 29, 2017
@roblourens
Copy link
Member

The number of matches using 'match whole word' is still less than before, because we are tricky about adding the \b flags when the query starts with something like a special character: https://github.com/Microsoft/vscode/blob/master/src/vs/base/common/strings.ts#L211

We can get the same effect by not using the --word-regexp flag and adding them ourselves, of course then we have to escape the regex chars correctly. I'll look at that next month.

@aeschli
Copy link
Contributor Author

aeschli commented Mar 31, 2017

Verified that searches or 'Remove ' and 'Remove on' work well, but '${1:' somehow got worse. I annotated #23624

@aeschli aeschli added the verified Verification succeeded label Mar 31, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue identified by VS Code Team member as probable bug search Search widget and operation issues verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

3 participants