Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Gazelle's filesystem traversal #1820

Closed
wants to merge 3 commits into from

Conversation

dzbarsky
Copy link
Contributor

What type of PR is this?

Uncomment one line below and remove others.

Performance

What package or component does this PR mostly affect?

All

What does this PR do? Why is it needed?

This PR improves Gazelle's filesystem walk in 2 ways:

  1. By using a parallel walk, resolving a long-standing TODO
  2. By adding an optional interface that plugins can implement to receive the directory contents, thus saving them from issuing duplicative syscalls to gather the same info

Which issues(s) does this PR fix?

Fixes #1819

Other notes for review

Copy link

google-cla bot commented May 31, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@dzbarsky dzbarsky force-pushed the fastwalk branch 2 times, most recently from 53e3725 to 1bfec6f Compare May 31, 2024 16:16
@dzbarsky
Copy link
Contributor Author

@linzhp please take a look?

@dzbarsky dzbarsky force-pushed the fastwalk branch 2 times, most recently from 782fe50 to ccd0e08 Compare May 31, 2024 16:43
@uhthomas
Copy link
Contributor

uhthomas commented Jun 13, 2024

Why not simply use filepath.WalkDir instead? Doesn't need a new dependency and achieves the same goal of being more efficient by not calling os.Lstat for each file or directory.

@DavidZbarsky-at
Copy link

Why not simply use filepath.WalkDir instead? Doesn't need a new dependency and achieves the same goal of being more efficient by not calling os.Lstat for each file or directory.

Because it doesn't parallelize the walk. This is faster

See also #1737

@uhthomas
Copy link
Contributor

Because it doesn't parallelize the walk. This is faster

I understand that, though I imagine a lot of the time savings will be from not calling os.Lstat. Do you have some figures to demonstrate the difference parallelisation makes here? Such, if WalkDir provides a significant improvement over Walk, but the difference between WalkDir and fastwalk is not as significant, then the maintenance burden of an external package may not be worth it.

@DavidZbarsky-at
Copy link

Because it doesn't parallelize the walk. This is faster

I understand that, though I imagine a lot of the time savings will be from not calling os.Lstat. Do you have some figures to demonstrate the difference parallelisation makes here? Such, if WalkDir provides a significant improvement over Walk, but the difference between WalkDir and fastwalk is not as significant, then the maintenance burden of an external package may not be worth it.

I don't have exact numbers offhand anymore; we wrote these patches months ago. FWIW WalkDir on its own is not trivial to use due to the complex iteration; that's part of the purpose of the Trie. Also in our testing the parallelization definitely made a significant different; between that and the Trie we reduced the runtime from 10s to 4.5s.

We do run gazelle as part of pre-commit hook so we are fairly perf-sensitive on this. I recognize that perhaps performance is not a primary motivation for the project; feel free to close out the PR if you aren't interested.

(BTW are you sure the os.Lstat is still an issue? I believe os.Readdir is already avoiding it since Go 1.16, perhaps you were thinking of ioutil.Readdir? I did measure some speedups when I contributed https://github.com/bazelbuild/bazel-gazelle/pull/1704/files)

@linzhp
Copy link
Contributor

linzhp commented Jun 19, 2024

I agreed with @uhthomas that we should avoid another external library before comparing it with filepath.WalkDir.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Gazelle's filesystem walk
4 participants