-
-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace walkdir with ignore for better path walking defaults #48
Replace walkdir with ignore for better path walking defaults #48
Conversation
Instead of building a parallel iterator through rayon, we could also use the parallel walker exposed by ignore. We will have to implement a visitor type to collect the analysis results though. I opted not to do this in the initial PR to limit the amount of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great to me! And yes, would be nice to use the included parallel facilities if we can in the future.
Co-authored-by: Benjamin Bouvier <public@benj.me>
How do we guarantee the code works? I have not manually tested this change. |
If it passes CI it's a good sign! The last run failed because the code isn't formatted anymore; running |
Oh and I forgot to say, you can run tests locally with |
uh it's not passing Clippy checks now, sorry 😅
|
@bnjbvr the tests should pass now but there's something to consider in the implementation. To make the |
walk_builder.git_ignore(false); | ||
|
||
if skip_target_dir { | ||
walk_builder.filter_entry(|entry| !is_target_dir(entry)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: the walk_builder only supports a single filter entry function.
// NOTE(mickvangelderen): This makes the skip_target_dir = false test pass, but I am not sure | ||
// this is how we want to implement this. Perhaps we should add an argument to toggle usage of | ||
// ignore files and modify the tests. | ||
walk_builder.git_exclude(false); | ||
walk_builder.git_global(false); | ||
walk_builder.git_ignore(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the skip_target_dir = false test pass, but I am not sure
this is how we want to implement this. Perhaps we should add an argument to toggle usage of
ignore files and modify the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's unfortunate, so I see a few options:
- just drop the test, and make it so that the default behavior is to ignore everything it's in gitignores et al. I think it's a bit too restrictive for users, though, so I wouldn't prefer this one
- either, in some cases it's justified that it's useful to look at ignored files, so we make it an option (set by a cli arg) to keep the previous behavior. Then we select either the previous iterator, or the new
build_toml_file_iterator
, and by default we ignore (so scanning ignored paths is opt-in, not opt-out). - alternatively, we do not use this iterator, and we implement some way to ignore specific directories (would help also for How to exclude crates not part of the root crate from analysis #49). A bit more cumbersome than just ignoring what's in gitignore. Also, orthogonal to the two previous options, so it could be this and (1 or 2).
So maybe adding the option to select one iterator or the other might be the nicest here, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have a look at cargo udeps
to see what excluding/filtering/ignoring options they provide, we may be able to draw some inspiration from a more mature project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey hey! sorry for the need to rebase, but now with argument parsing with argh
it might be simpler to add new options, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have a look at cargo udeps to see what excluding/filtering/ignoring options they provide
They just have a --exclude <pkgid>
flag, which is passed to cargo check
.
Take workspace-hack
as an example, it won't check workspace-hack
's dependencies, but other crate's dependency to workspace-hack
is still viewed as unused. 🤔
// NOTE(mickvangelderen): Instead of building a parallel iterator through rayon, we | ||
// could also use the parallel walker exposed by ignore. We will have to implement a | ||
// visitor type to collect the analysis results though. I opted not to do this in the | ||
// initial PR to limit the amount of changes. | ||
.par_bridge() | ||
.filter_map(|manifest_path| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not walking the file system in parallel while we could. We only process the files in parallel. Not sure if it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! sorry to not have gone back to this earlier, I wasn't sure if this was ready or not based on this comment:
I'd like to have a look at cargo udeps to see what excluding/filtering/ignoring options they provide, we may be able to draw some inspiration from a more mature project.
In any case, this likely needs a rebase. If you have the opportunity to do it, I can take a further look :)
Hi @mickvangelderen @bnjbvr! Any updates on this PR? |
No updates on my side, I was waiting for the PR's author. If we don't hear from them, say, in the next two weeks, I'll close this, unless someone wants to take the PR over (I'm still open to the idea of having that feature,though). |
The proposed change is pretty fun to try and implement so I might give it a try just because it seems there is interest in it now. My own motivation waned because my team did not end up using |
I reimplemented this change in #95 . |
The
ignore
crate (also by Burntsushi) has good defaults for walking the file system, like for example not recursing into ignored directories.On the implementation side, instead of building a parallel iterator through rayon, we could also use the parallel walker exposed by ignore. We will have to implement a visitor type to collect the analysis results though. I opted not to do this in the initial PR to limit the amount of changes.