Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace walkdir with ignore for better path walking defaults #48

Closed

Conversation

mickvangelderen
Copy link
Contributor

The ignore crate (also by Burntsushi) has good defaults for walking the file system, like for example not recursing into ignored directories.

On the implementation side, instead of building a parallel iterator through rayon, we could also use the parallel walker exposed by ignore. We will have to implement a visitor type to collect the analysis results though. I opted not to do this in the initial PR to limit the amount of changes.

Instead of building a parallel iterator through rayon, we
could also use the parallel walker exposed by ignore. We will have to implement a
visitor type to collect the analysis results though. I opted not to do this in the
initial PR to limit the amount of changes.
Copy link
Owner

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great to me! And yes, would be nice to use the included parallel facilities if we can in the future.

src/main.rs Outdated Show resolved Hide resolved
Co-authored-by: Benjamin Bouvier <public@benj.me>
@mickvangelderen
Copy link
Contributor Author

How do we guarantee the code works? I have not manually tested this change.

@bnjbvr
Copy link
Owner

bnjbvr commented Nov 2, 2022

If it passes CI it's a good sign! The last run failed because the code isn't formatted anymore; running cargo fmt should fix it.

@bnjbvr
Copy link
Owner

bnjbvr commented Nov 2, 2022

Oh and I forgot to say, you can run tests locally with cargo test as well.

@bnjbvr
Copy link
Owner

bnjbvr commented Nov 2, 2022

uh it's not passing Clippy checks now, sorry 😅

clippy --all --all-features -- -D warnings

@mickvangelderen
Copy link
Contributor Author

@bnjbvr the tests should pass now but there's something to consider in the implementation.

To make the target_dir_is_not_skipped_when_skip_target_dir_is_false test pass we need to both set skip_target_dir to false and disable the usage of git ignore files in the WalkBuilder. That defeats the purpose to introduce the ignore crate.

walk_builder.git_ignore(false);

if skip_target_dir {
walk_builder.filter_entry(|entry| !is_target_dir(entry));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the walk_builder only supports a single filter entry function.

Comment on lines +111 to +116
// NOTE(mickvangelderen): This makes the skip_target_dir = false test pass, but I am not sure
// this is how we want to implement this. Perhaps we should add an argument to toggle usage of
// ignore files and modify the tests.
walk_builder.git_exclude(false);
walk_builder.git_global(false);
walk_builder.git_ignore(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the skip_target_dir = false test pass, but I am not sure
this is how we want to implement this. Perhaps we should add an argument to toggle usage of
ignore files and modify the tests.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's unfortunate, so I see a few options:

  1. just drop the test, and make it so that the default behavior is to ignore everything it's in gitignores et al. I think it's a bit too restrictive for users, though, so I wouldn't prefer this one
  2. either, in some cases it's justified that it's useful to look at ignored files, so we make it an option (set by a cli arg) to keep the previous behavior. Then we select either the previous iterator, or the new build_toml_file_iterator, and by default we ignore (so scanning ignored paths is opt-in, not opt-out).
  3. alternatively, we do not use this iterator, and we implement some way to ignore specific directories (would help also for How to exclude crates not part of the root crate from analysis #49). A bit more cumbersome than just ignoring what's in gitignore. Also, orthogonal to the two previous options, so it could be this and (1 or 2).

So maybe adding the option to select one iterator or the other might be the nicest here, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have a look at cargo udeps to see what excluding/filtering/ignoring options they provide, we may be able to draw some inspiration from a more mature project.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey hey! sorry for the need to rebase, but now with argument parsing with argh it might be simpler to add new options, at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have a look at cargo udeps to see what excluding/filtering/ignoring options they provide

They just have a --exclude <pkgid> flag, which is passed to cargo check.

Take workspace-hack as an example, it won't check workspace-hack's dependencies, but other crate's dependency to workspace-hack is still viewed as unused. 🤔

Comment on lines +149 to +154
// NOTE(mickvangelderen): Instead of building a parallel iterator through rayon, we
// could also use the parallel walker exposed by ignore. We will have to implement a
// visitor type to collect the analysis results though. I opted not to do this in the
// initial PR to limit the amount of changes.
.par_bridge()
.filter_map(|manifest_path| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not walking the file system in parallel while we could. We only process the files in parallel. Not sure if it matters.

Copy link
Owner

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! sorry to not have gone back to this earlier, I wasn't sure if this was ready or not based on this comment:

I'd like to have a look at cargo udeps to see what excluding/filtering/ignoring options they provide, we may be able to draw some inspiration from a more mature project.

In any case, this likely needs a rebase. If you have the opportunity to do it, I can take a further look :)

@IgnisDa
Copy link

IgnisDa commented Oct 29, 2023

Hi @mickvangelderen @bnjbvr! Any updates on this PR?

@bnjbvr
Copy link
Owner

bnjbvr commented Oct 29, 2023

No updates on my side, I was waiting for the PR's author. If we don't hear from them, say, in the next two weeks, I'll close this, unless someone wants to take the PR over (I'm still open to the idea of having that feature,though).

@mickvangelderen
Copy link
Contributor Author

The proposed change is pretty fun to try and implement so I might give it a try just because it seems there is interest in it now. My own motivation waned because my team did not end up using cargo-machete and I simply prioritized other work.

@mickvangelderen
Copy link
Contributor Author

I reimplemented this change in #95 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants