-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ignore] Add extensive test for gitignore matching #551
Conversation
2bfebb1
to
5a8272a
Compare
The test data (gitignore rules and expected result) is based on the test repo at <https://github.com/behnam/gitignore-test>. This reveals a bug in gitignore matching: * Rules of form `<dir>/*` result in ignoring only first-level files, but no deep files. This is not correct, as `<dir>/*` matches the first-level directories under `<dir>`, resulting all to be ignored.
b6ec207
to
42a5dff
Compare
I expanded the tests to 64 rules and 158 assertions, out of which 30 is failing (marked with Most of these are blockers for migrating Cargo's include/exclude rules to I'm not sure if I'll find the time this week to dig deep into |
This is an interesting case. The bug doesn't manifest in ripgrep because the ignore rules are applied iteratively at each directory level. That is, if More generally, it's not actually clear what the right behavior is here. The fundamental issue is that ignoring files is based on globbing, and I think there's probably an impedance mismatch here. The implementation for resolving ignore rules fundamentally works via glob matching, but by doing this, it sacrifices correctness for some inputs because its primary use case doesn't need correctness for those inputs. A trivial resolution to this would be to match each ancestor of a path against the ignore rules, and if any one is ignored, then all children are subsequently ignored. This cannot be done by default because performance would suffer considerably. |
To drive this point home, if you use |
Thanks @BurntSushi for looking into this. Right, now that I think about the API as how the walker calls into it, it all makes sense. This problem of matching globs with a path AND its parents is actually how I got here. I initially submitted a patch to Cargo to apply the include/exclude patterns (the current glob-based rules) to both a file and its parents (rust-lang/cargo#4256), then we realized that we may be better off to not get into the details of matching in Cargo and use a library that supports matching a file path with a set of rules. (And, in this transition, make it better by also switching to gitignore rules instead of bare globs.) That said, I really like to add API to allow matching one path (usually file, but also could be a directory) with a gitignore ruleset. So that's what I did in the new commit here, which also allows all the tests to pass. I think it's a good thing to have this method here, because it's a common use case (when not using the built-in walker) and because we can use the helper methods here and the logic is pretty straightforward: start from the path and walk to the parents, return the first non- What do you think? |
By the way, the implementation is broken for UPDATE: It now works in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should add this method. I left a few comments mostly around the docs/naming. The implementation looks OK to me.
My only remaining concern is that this might produce matches when there shouldn't be a match (i.e., a false positive). I can't actually think of a specific test case for it though, which maybe means we're OK. I think the fact that we assume the given path is relative to the directory containing the corresponding .gitignore
file is what allows this to work. It's a very subtle point though.
ignore/src/gitignore.rs
Outdated
@@ -191,6 +191,40 @@ impl Gitignore { | |||
self.matched_stripped(self.strip(path.as_ref()), is_dir) | |||
} | |||
|
|||
/// Returns whether the given path (file or directory) or any of its parent | |||
/// directories matched a pattern in this gitignore matcher. | |||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the initial description, I think we should briefly explain what we mean by parent
. That is, it is a transformation on the path itself and doesn't try to walk up the file system. I'm not sure how to say that succinctly and clearly though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you mean. Added some more text here. Please see how that look?
ignore/src/gitignore.rs
Outdated
/// determined by a common suffix of the directory containing this | ||
/// gitignore) is stripped. If there is no common suffix/prefix overlap, | ||
/// then `path` is assumed to be relative to this matcher. | ||
pub fn matched_recursive<P: AsRef<Path>>(&self, path: P, is_dir: bool) -> Match<&Glob> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we think of another name? I feel like matched_recursive
doesn't really signal to the user what this is actually doing. But this is a super subtle method, so I'm not sure what to call it either. Maybe matched_any_parent
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Renamed to matched_path_or_any_parents()
. What do you think?
Thanks, @BurntSushi. Addressed the comments in the last commit. Also, updated the fn implementation to be more flat and easier to read. |
And, I have added more text about the path being expected to be under the root, and put a I think So, I think it's a fair assumption for this API, and we can always extend it later, if we need to. What do you think? |
00d9ec3
to
32b1ba7
Compare
Looks like the appveyor test is failing because of different path handling on windows systems. I'm looking into that part. |
This looks great. I think it's good to go once you figure out Windows. Thank you! If you get stuck let me know and I can take a look (but it might take me a while to get to it). |
Using I also made the walking loop even simpler. Please take another look, if you can: https://github.com/BurntSushi/ripgrep/pull/551/files#diff-6d0dbe450a74521a529608d3e56606b7R223 And, we're going to need a crate release to use this in Cargo. Want me to add a commit to bump the version? |
@behnam Great, thank you! LGTM. No need to add a version. My scripts do that! :) |
|
Thanks for the quick help here, @BurntSushi! |
The test data (gitignore rules and expected result) is based on the test
repo at https://github.com/behnam/gitignore-test.
This reveals a bug in gitignore matching:
<dir>/*
result in ignoring only first-level files, butno deep files. This is not correct, as
<dir>/*
matches the first-leveldirectories under
<dir>
, resulting all to be ignored.(UPDATE: Turns out the inline comments are not supported in gitignore, so I dropped the related tests.)