
Benchmark .nth[_back](n) with inputs n #916

Conversation

Philippe-Cholet (Member)

I introduced specialization benchmarks in #786.
However, running the Combinations::nth benchmarks in #914 did not give meaningful feedback: it was globally faster, but we could not say by how much for each n.
So here I change the benchmarks to give information based on the value n we feed nth with.
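
For illustration only (this is my sketch, not the PR's actual benchmark code): the idea is to make n a Criterion input so each value gets its own reported measurement, e.g. combinations2/nth/4. The iterator length and the set of n values here are assumptions.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use itertools::Itertools;

fn bench_combinations_nth(c: &mut Criterion) {
    let mut group = c.benchmark_group("combinations2/nth");
    for n in [0usize, 1, 2, 4, 8] {
        // One reported measurement per value of `n`.
        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
            b.iter(|| {
                // Drain the whole iterator with repeated nth(n) calls.
                let mut it = (0..100u16).combinations(2);
                while let Some(x) = it.nth(n) {
                    criterion::black_box(x);
                }
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_combinations_nth);
criterion_main!(benches);
```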

Philippe-Cholet added this to the next milestone Apr 16, 2024

codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.43%. Comparing base (6814180) to head (d8f2e47).
Report is 51 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #916      +/-   ##
==========================================
+ Coverage   94.38%   94.43%   +0.05%     
==========================================
  Files          48       48              
  Lines        6665     6869     +204     
==========================================
+ Hits         6291     6487     +196     
- Misses        374      382       +8     


Philippe-Cholet (Member Author)

@kinto-b
Each Combinations::nth(n) call used to produce n + 1 heap-allocated vectors but now produces only 1, so allocations decrease by 1/(n+1) - 1 (computed in % below; see also the small sketch after the table).

                       before                           after                            changes
Same allocations for:
combinations1/nth/0    [1.2090 ms 1.2121 ms 1.2157 ms]  [1.1560 ms 1.1599 ms 1.1646 ms]  [-5.1082% -4.5026% -3.7919%]
combinations2/nth/0    [1.1405 ms 1.1447 ms 1.1496 ms]  [1.1473 ms 1.1494 ms 1.1519 ms]  [+0.5059% +0.9210% +1.3303%]
combinations3/nth/0    [1.1819 ms 1.1868 ms 1.1919 ms]  [1.1895 ms 1.1907 ms 1.1919 ms]  [-2.0142% -1.0343% -0.1986%]
combinations4/nth/0    [1.3242 ms 1.3395 ms 1.3561 ms]  [1.2893 ms 1.2914 ms 1.2945 ms]  [-5.7538% -4.5556% -3.4696%]
-50% allocations for:
combinations1/nth/1    [1.2196 ms 1.2239 ms 1.2284 ms]  [639.53 µs 643.26 µs 647.97 µs]  [-49.143% -48.354% -47.710%]
combinations2/nth/1    [1.1377 ms 1.1428 ms 1.1480 ms]  [600.34 µs 601.93 µs 603.75 µs]  [-47.541% -47.277% -47.008%]
combinations3/nth/1    [1.1770 ms 1.1817 ms 1.1876 ms]  [641.31 µs 642.99 µs 644.99 µs]  [-47.351% -46.549% -45.859%]
combinations4/nth/1    [1.2842 ms 1.2854 ms 1.2869 ms]  [720.70 µs 723.56 µs 726.84 µs]  [-43.637% -43.370% -43.065%]
-67% allocations for:
combinations1/nth/2    [1.2684 ms 1.2740 ms 1.2792 ms]  [463.04 µs 463.50 µs 463.98 µs]  [-63.556% -63.375% -63.200%]
combinations2/nth/2    [1.2179 ms 1.2229 ms 1.2279 ms]  [422.52 µs 423.28 µs 424.16 µs]  [-67.348% -66.251% -65.402%]
combinations3/nth/2    [1.2612 ms 1.2660 ms 1.2712 ms]  [450.01 µs 450.41 µs 450.84 µs]  [-64.412% -64.252% -64.098%]
combinations4/nth/2    [1.3132 ms 1.3188 ms 1.3237 ms]  [499.48 µs 500.65 µs 502.05 µs]  [-62.071% -61.902% -61.722%]
-80% allocations for:
combinations1/nth/4    [1.2246 ms 1.2296 ms 1.2363 ms]  [317.72 µs 318.69 µs 319.68 µs]  [-75.067% -74.630% -74.296%]
combinations2/nth/4    [1.1881 ms 1.1911 ms 1.1947 ms]  [276.03 µs 276.49 µs 277.07 µs]  [-76.927% -76.629% -76.256%]
combinations3/nth/4    [1.2288 ms 1.2343 ms 1.2416 ms]  [306.15 µs 306.67 µs 307.22 µs]  [-75.649% -75.488% -75.343%]
combinations4/nth/4    [1.2963 ms 1.3016 ms 1.3073 ms]  [335.58 µs 337.01 µs 339.08 µs]  [-74.351% -73.952% -73.533%]
-89% allocations for:
combinations1/nth/8    [1.2747 ms 1.2808 ms 1.2862 ms]  [209.95 µs 210.26 µs 210.60 µs]  [-83.432% -83.218% -82.982%]
combinations2/nth/8    [1.1510 ms 1.1534 ms 1.1559 ms]  [191.80 µs 192.31 µs 192.92 µs]  [-83.572% -83.443% -83.343%]
combinations3/nth/8    [1.2398 ms 1.2568 ms 1.2758 ms]  [204.40 µs 204.66 µs 204.96 µs]  [-83.773% -83.538% -83.274%]
combinations4/nth/8    [1.2997 ms 1.3029 ms 1.3062 ms]  [224.81 µs 225.83 µs 227.24 µs]  [-82.748% -82.660% -82.569%]

Benchmark changes closely follow allocation changes.
Nothing surprising here (heap allocations are the bottleneck), but it's definitely a better description than the global -65% the previous benchmark gave us.
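
As a sanity check, here is a tiny standalone sketch (mine, not part of the PR) evaluating the 1/(n+1) - 1 formula; it reproduces the -50%/-67%/-80%/-89% group headings above.

```rust
fn main() {
    // One allocated vector per nth(n) call instead of n + 1,
    // so the relative change in allocations is 1/(n + 1) - 1.
    for n in [0u32, 1, 2, 4, 8] {
        let change = 100.0 * (1.0 / f64::from(n + 1) - 1.0);
        println!("nth({n}): {change:.0}% allocations");
    }
}
```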

phimuemue (Member) left a comment


Thanks.

Could we argue similarly for start? (I.e., should we report results for each value of start separately?)

Philippe-Cholet (Member Author) commented Apr 16, 2024

@phimuemue I'm not sure about start.

Building (not running) those benchmarks was already slow (~2 minutes) and is probably a bit slower now.
So slow 😁 that I usually comment out the iterators I don't want to benchmark before running the command. Then the build is fast.
If we move start from the benchmark names to the inputs (where n now is), we end up with even more benchmarks (slower to build and to run).
I'm not sure considering start is meaningful at all, inside or outside. It felt like a good idea at the time, but is it? We have some iterators like combinations that lazily load things at the start, but is it worth considering those starts?

phimuemue (Member) commented Apr 16, 2024

> Building (not running) those benchmarks was already slow (~2 minutes) and is probably a bit slower now.

Some time ago I tried changing bench_specializations so that it accepts only one benchmark. On my machine, this cut compile times by 40%.

When I want to benchmark only a few iterators, I don't use iproduct, and I end up with an annoying, recurring warning that this simple fix solves.
Philippe-Cholet (Member Author)

I found a macro trick to select a subset of the benchmarks (by prefixing names with a lifetime, a dirty trick) and reduce build times.
But it complicates the macro so much that I'd rather occasionally and temporarily comment out parts of the declaration than maintain such a trick in the future.

@phimuemue What do you think about those starts? Should I delete them or do we merge this as is?

phimuemue (Member)

I would leave it in. I think the reason is that iterators may behave differently after they've been nexted several times (e.g. if they are similar to Chain).

We can remove it later once we are sure that it's not needed.
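
For what it's worth, a hedged illustration of that point (my example, not from the thread): for Chain, nth takes a different path depending on whether the front half is already exhausted, so measuring after some initial calls (the start) can exercise different branches.

```rust
fn main() {
    let mut it = (0..3).chain(10..20);
    // Fresh iterator: nth(5) must first drain the 3-element front half.
    assert_eq!(it.nth(5), Some(12));
    // The front half is now exhausted, so nth(5) only touches the back half.
    assert_eq!(it.nth(5), Some(18));
}
```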

Philippe-Cholet added this pull request to the merge queue Apr 16, 2024
Merged via the queue into rust-itertools:master with commit ed695af Apr 16, 2024
13 checks passed
Philippe-Cholet deleted the benchmark-nth-with-inputs branch Apr 16, 2024 17:49
kinto-b (Contributor) commented Apr 16, 2024

@Philippe-Cholet Ah thanks for sharing that. I am a bit confused by the fact that the 'after' times get smaller and smaller for larger and larger n. I'd have expected .nth(8) to be slower than .nth(1) no matter what. Or have I misunderstood what the times represent?

Philippe-Cholet (Member Author)

@kinto-b The benchmark roughly measures while let Some(_x) = it.nth(n) {} (plus some start shenanigans you can ignore). With larger n, fewer vectors are created by nth(n) over the entire iteration, roughly iterator_length / (n + 1) in total.
Said differently, .nth(0) creates every vector, .nth(1) creates half of them, .nth(8) creates 1/9 of them.
Fewer vectors created leads to smaller times.
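
To make the arithmetic concrete, a tiny sketch (with a hypothetical length of 100, not the benchmark's actual data) counting how many items such a drain loop yields for each n:

```rust
fn main() {
    for n in [0usize, 1, 8] {
        // Each nth(n) call consumes n + 1 items and yields one, so a
        // length-100 iterator yields 100, 50 and 11 items respectively.
        let mut it = 0..100;
        let mut yielded = 0;
        while let Some(_x) = it.nth(n) {
            yielded += 1;
        }
        println!("nth({n}): {yielded} items (one allocated vector each for Combinations)");
    }
}
```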
