Unroll `collect_call_path` to speed up common cases #5792

charliermarsh · 2023-07-16T00:49:23Z

Summary

This PR just naively unrolls collect_call_path to handle attribute resolutions of up to eight segments. In profiling via Instruments, it seems to be about 4x faster for a very hot code path (4% of total execution time on main, 1% here).

Profiling by running RAYON_NUM_THREADS=1 cargo instruments -t time --profile release-debug --time-limit 10000 -p ruff_cli -o FromSlice.trace -- check crates/ruff/resources/test/cpython --silent -e --no-cache --select ALL, and modifying the linter to loop infinitely up to the specified time (10 seconds) to increase sample size.

Before:

After:

charliermarsh · 2023-07-16T00:49:37Z

crates/ruff/src/checkers/physical_lines.rs

@@ -115,7 +115,6 @@ pub(crate) fn check_physical_lines(
                            diagnostics.push(diagnostic);
                        }
                    }
-                } else {


Empty else, not related.

github-actions · 2023-07-16T01:00:52Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      9.8±0.02ms     4.2 MB/sec    1.00      9.8±0.03ms     4.2 MB/sec
formatter/numpy/ctypeslib.py               1.00   1886.4±1.80µs     8.8 MB/sec    1.00  1888.0±12.67µs     8.8 MB/sec
formatter/numpy/globals.py                 1.00    205.5±1.79µs    14.4 MB/sec    1.00    205.3±0.37µs    14.4 MB/sec
formatter/pydantic/types.py                1.01      4.2±0.01ms     6.0 MB/sec    1.00      4.2±0.02ms     6.1 MB/sec
linter/all-rules/large/dataset.py          1.03     13.8±0.07ms     3.0 MB/sec    1.00     13.3±0.11ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      3.5±0.03ms     4.8 MB/sec    1.00      3.4±0.02ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.01    373.7±1.20µs     7.9 MB/sec    1.00    369.4±2.89µs     8.0 MB/sec
linter/all-rules/pydantic/types.py         1.02      6.1±0.01ms     4.1 MB/sec    1.00      6.0±0.01ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.00      7.0±0.01ms     5.8 MB/sec    1.00      6.9±0.01ms     5.9 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1413.3±2.96µs    11.8 MB/sec    1.00   1415.5±1.69µs    11.8 MB/sec
linter/default-rules/numpy/globals.py      1.00    147.4±0.29µs    20.0 MB/sec    1.00    147.6±0.67µs    20.0 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.1±0.01ms     8.3 MB/sec    1.00      3.1±0.01ms     8.2 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     14.4±0.61ms     2.8 MB/sec    1.27     18.3±0.58ms     2.2 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.9±0.17ms     5.8 MB/sec    1.17      3.3±0.13ms     5.0 MB/sec
formatter/numpy/globals.py                 1.00   311.8±14.48µs     9.5 MB/sec    1.14   354.2±17.15µs     8.3 MB/sec
formatter/pydantic/types.py                1.00      6.5±0.44ms     3.9 MB/sec    1.13      7.3±0.26ms     3.5 MB/sec
linter/all-rules/large/dataset.py          1.00     20.3±0.72ms     2.0 MB/sec    1.00     20.4±0.62ms  2044.0 KB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      5.3±0.21ms     3.2 MB/sec    1.02      5.3±0.24ms     3.1 MB/sec
linter/all-rules/numpy/globals.py          1.00   625.4±27.38µs     4.7 MB/sec    1.02   640.1±32.57µs     4.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      9.0±0.35ms     2.8 MB/sec    1.03      9.3±0.47ms     2.7 MB/sec
linter/default-rules/large/dataset.py      1.00     10.4±0.44ms     3.9 MB/sec    1.07     11.1±0.33ms     3.7 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.2±0.11ms     7.7 MB/sec    1.02      2.2±0.08ms     7.5 MB/sec
linter/default-rules/numpy/globals.py      1.00   263.3±11.99µs    11.2 MB/sec    1.01   266.7±15.57µs    11.1 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.7±0.14ms     5.5 MB/sec    1.00      4.7±0.21ms     5.4 MB/sec

sbrugman · 2023-07-16T19:19:24Z

and modifying the linter to loop infinitely

@charliermarsh Could you please share the exact diff? Was inspired by this PR to experiment with some of the benchmarking in #5811, but couldn't reproduce this setup (it ended up giving irrelevant profiles) so ended up with running over CPython

konstin · 2023-07-18T13:58:14Z

@sbrugman You can get the same result with cargo run --bin ruff_dev -- repeat --no-cache --repeat 10000 crates/ruff/resources/test/cpython (it's ruff check with an additional --repeat option)

charliermarsh · 2023-07-18T14:00:59Z

Oh that seems a lot better than what I did. I will post the diff regardless in a bit.

konstin · 2023-07-18T14:53:54Z

The return type of collect_call_path also seems like an opportunity for optimization: We generally either want to check against a specific length (e.g. 2) and don't need to save anything otherwise or iterate over the results (e.g. the widely used resolve_call_path), where we only want to iterate over elements from left to write (we could do this with an Iterator that is implemented recursively, but i'm not sure if the extra function calls aren't making this slower)

konstin

That's a great speedup

charliermarsh · 2023-07-18T15:29:52Z

The return type of collect_call_path also seems like an opportunity for optimization...

Agreed.

## Summary This PR just naively unrolls `collect_call_path` to handle attribute resolutions of up to eight segments. In profiling via Instruments, it seems to be about 4x faster for a very hot code path (4% of total execution time on `main`, 1% here). Profiling by running `RAYON_NUM_THREADS=1 cargo instruments -t time --profile release-debug --time-limit 10000 -p ruff_cli -o FromSlice.trace -- check crates/ruff/resources/test/cpython --silent -e --no-cache --select ALL`, and modifying the linter to loop infinitely up to the specified time (10 seconds) to increase sample size. Before: <img width="1792" alt="Screen Shot 2023-07-15 at 5 13 34 PM" src="https://github.com/astral-sh/ruff/assets/1309177/4a8b0b45-8b67-43e9-af5e-65b326928a8e"> After: <img width="1792" alt="Screen Shot 2023-07-15 at 8 38 51 PM" src="https://github.com/astral-sh/ruff/assets/1309177/d8829159-2c79-4a49-ab3c-9e4e86f5b2b1">

charliermarsh requested review from MichaReiser and konstin July 16, 2023 00:49

charliermarsh commented Jul 16, 2023

View reviewed changes

Unroll collect_call_path

7504c09

charliermarsh force-pushed the charlie/unroll branch from 6fb6ff4 to 7504c09 Compare July 17, 2023 02:32

konstin approved these changes Jul 18, 2023

View reviewed changes

charliermarsh merged commit c868def into main Jul 18, 2023

charliermarsh deleted the charlie/unroll branch July 18, 2023 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unroll `collect_call_path` to speed up common cases #5792

Unroll `collect_call_path` to speed up common cases #5792

charliermarsh commented Jul 16, 2023

charliermarsh Jul 16, 2023

github-actions bot commented Jul 16, 2023 •

edited

Loading

sbrugman commented Jul 16, 2023 •

edited

Loading

konstin commented Jul 18, 2023

charliermarsh commented Jul 18, 2023

konstin commented Jul 18, 2023

konstin left a comment

charliermarsh commented Jul 18, 2023

Unroll collect_call_path to speed up common cases #5792

Unroll collect_call_path to speed up common cases #5792

Conversation

charliermarsh commented Jul 16, 2023

Summary

charliermarsh Jul 16, 2023

Choose a reason for hiding this comment

github-actions bot commented Jul 16, 2023 • edited Loading

PR Check Results

Ecosystem

Benchmark

Linux

Windows

sbrugman commented Jul 16, 2023 • edited Loading

konstin commented Jul 18, 2023

charliermarsh commented Jul 18, 2023

konstin commented Jul 18, 2023

konstin left a comment

Choose a reason for hiding this comment

charliermarsh commented Jul 18, 2023

Unroll `collect_call_path` to speed up common cases #5792

Unroll `collect_call_path` to speed up common cases #5792

github-actions bot commented Jul 16, 2023 •

edited

Loading

sbrugman commented Jul 16, 2023 •

edited

Loading