Unroll `collect_call_path` to speed up common cases #5792
@@ -115,7 +115,6 @@ pub(crate) fn check_physical_lines(
diagnostics.push(diagnostic);
}
}
} else {
Empty `else`, not related.
PR Check Results
Ecosystem: ✅ ecosystem check detected no changes.
Benchmark: Linux, Windows
@charliermarsh Could you please share the exact diff? I was inspired by this PR to experiment with some of the benchmarking in #5811, but couldn't reproduce this setup (it ended up giving irrelevant profiles), so I ended up running over CPython.
6fb6ff4 to 7504c09
@sbrugman You can get the same result with
Oh, that seems a lot better than what I did. I will post the diff regardless in a bit.
The return type of
That's a great speedup
Agreed.
## Summary

This PR naively unrolls `collect_call_path` to handle attribute resolutions of up to eight segments. In profiling via Instruments, it seems to be about 4x faster for a very hot code path (4% of total execution time on `main`, 1% here).

Profiled by running `RAYON_NUM_THREADS=1 cargo instruments -t time --profile release-debug --time-limit 10000 -p ruff_cli -o FromSlice.trace -- check crates/ruff/resources/test/cpython --silent -e --no-cache --select ALL`, with the linter modified to keep looping until the specified time limit (10 seconds) to increase the sample size.

Before:

<img width="1792" alt="Screen Shot 2023-07-15 at 5 13 34 PM" src="https://github.com/astral-sh/ruff/assets/1309177/4a8b0b45-8b67-43e9-af5e-65b326928a8e">

After:

<img width="1792" alt="Screen Shot 2023-07-15 at 8 38 51 PM" src="https://github.com/astral-sh/ruff/assets/1309177/d8829159-2c79-4a49-ab3c-9e4e86f5b2b1">
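For readers unfamiliar with the technique, the unrolling idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Expr` enum is a hypothetical, simplified stand-in for the Python AST, the result is a plain `Vec<&str>`, and the sketch unrolls only three segments (the PR handles up to eight). Each unrolled level matches one attribute access directly and returns as soon as it reaches a bare name; deeper chains fall back to a generic recursive walk.

```rust
// Hypothetical, simplified AST used only for this sketch (not Ruff's real types).
#[derive(Debug)]
pub enum Expr {
    Name(String),
    Attribute { value: Box<Expr>, attr: String },
    Other, // any expression that cannot be part of a dotted path
}

/// Generic fallback: recursively walk `a.b.c...` and collect the segments in order.
pub fn collect_call_path_recursive<'a>(expr: &'a Expr) -> Option<Vec<&'a str>> {
    fn inner<'a>(expr: &'a Expr, parts: &mut Vec<&'a str>) -> bool {
        match expr {
            Expr::Name(id) => {
                parts.push(id.as_str());
                true
            }
            Expr::Attribute { value, attr } => {
                if !inner(value, parts) {
                    return false;
                }
                parts.push(attr.as_str());
                true
            }
            Expr::Other => false,
        }
    }
    let mut parts = Vec::new();
    inner(expr, &mut parts).then_some(parts)
}

/// Unrolled variant (three levels here for brevity): the common short paths
/// are resolved with straight-line matches instead of recursion.
pub fn collect_call_path_unrolled<'a>(expr: &'a Expr) -> Option<Vec<&'a str>> {
    // Level 1: `a`
    let (value1, attr1) = match expr {
        Expr::Attribute { value, attr } => (&**value, attr.as_str()),
        Expr::Name(id) => return Some(vec![id.as_str()]),
        Expr::Other => return None,
    };
    // Level 2: `a.b`
    let (value2, attr2) = match value1 {
        Expr::Attribute { value, attr } => (&**value, attr.as_str()),
        Expr::Name(id) => return Some(vec![id.as_str(), attr1]),
        Expr::Other => return None,
    };
    // Level 3: `a.b.c`; anything deeper uses the generic walk.
    match value2 {
        Expr::Name(id) => Some(vec![id.as_str(), attr2, attr1]),
        Expr::Attribute { .. } => collect_call_path_recursive(expr),
        Expr::Other => None,
    }
}
```

Both functions should agree on every input; the unrolled one only changes how short paths are reached. Whether this shape is faster in a given codebase is something to confirm by profiling, as was done for this PR.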