microoptimization on rand(::AliasTable) #1831

Merged (2 commits) on Feb 5, 2024

Conversation

@adienes (Contributor) commented Jan 27, 2024

This doesn't necessarily improve all the CI benchmarks, but I think those are somewhat artificial. In all of the actual use cases I've tried, it's faster to use an ifelse and omit the @inbounds.
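For context, a minimal sketch of the two variants. This is a stand-in, not the package source: MyAliasTable, rand_statusquo, and rand_pr are illustrative names, and the field names accept/alias are taken from the discussion below.

```julia
using Random

# Minimal stand-in for the package's alias-table sampler, only to contrast
# the two variants discussed in this PR.
struct MyAliasTable
    accept::Vector{Float64}  # acceptance threshold per bucket
    alias::Vector{Int}       # alias index per bucket
end

# Status quo: ternary branch under @inbounds.
function rand_statusquo(rng::AbstractRNG, s::MyAliasTable)
    i = rand(rng, 1:length(s.alias))
    u = rand(rng)
    @inbounds r = u < s.accept[i] ? i : s.alias[i]
    return r
end

# This PR: branch-free ifelse (both arms are evaluated, but both are cheap),
# with bounds checks left enabled.
function rand_pr(rng::AbstractRNG, s::MyAliasTable)
    i = rand(rng, 1:length(s.alias))
    u = rand(rng)
    return ifelse(u < s.accept[i], i, s.alias[i])
end
```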

status quo:

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(10), 1)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):   7.132 ns … 33.909 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.679 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.606 ns ±  1.195 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              ▁ ▂▂ ▃▄▆▁▇▆▂▇▇▂▇█▃▆▅▅ ▃▃         
  ▁▁▁▁▁▁▁▁▁▁▂▂▂▁▂▂▂▃▃▃▄▄▃▄▆▆▅▇█▆██▇███████████████████▆██▄▅▅▃ ▄
  7.13 ns         Histogram: frequency by time        13.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(100), 1)))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  11.125 ns … 28.416 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.083 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.056 ns ±  0.507 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                  ▂    ▁ █▁▁▁▁ ▇               
  ▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▅▄▅▅▅▆█▇▇▇██████████▇▇▆▆▆▅▇▄▃▃▃▃▃ ▄
  11.1 ns         Histogram: frequency by time          14 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(100_000), 1)))
BenchmarkTools.Trial: 2899 samples with 1000 evaluations.
 Range (min … max):  12.042 ns … 49.917 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.917 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.013 ns ±  1.244 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                        ▂▃ ▅▇ ▇█▇ ▆▆ █▆ ▅▂ ▃                   
  ▂▁▂▁▁▂▁▂▃▁▂▄▁▄▄▁▆▆▁▇█▁██▁██▁███▁██▁██▁██▁██▁▇▆▁▅▄▁▅▃▁▃▃▁▂▂▂ ▄
  12 ns           Histogram: frequency by time        13.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

PR:

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(10), 1)))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  5.458 ns … 41.708 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.583 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.612 ns ±  0.571 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▂       █▁       ▂                          
  ▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▂ ▂
  5.46 ns        Histogram: frequency by time        5.75 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(100), 1)))

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  5.500 ns … 23.375 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.584 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.621 ns ±  0.487 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     █        ▂                               
  ▂▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂ ▂
  5.5 ns         Histogram: frequency by time        5.75 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand($rng, s) setup=(s = AliasTable(normalize(rand(100_000), 1)))
BenchmarkTools.Trial: 2874 samples with 1000 evaluations.
 Range (min … max):  6.000 ns … 42.375 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.125 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.140 ns ±  0.764 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  █       ▇                                   
  ▂▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▂ ▂
  6 ns           Histogram: frequency by time        6.29 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@codecov-commenter commented Jan 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c1705a3) 85.94% compared to head (af22151) 85.94%.

❗ Current head af22151 differs from pull request most recent head 7d8d6d9. Consider uploading reports for the commit 7d8d6d9 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1831      +/-   ##
==========================================
- Coverage   85.94%   85.94%   -0.01%     
==========================================
  Files         144      144              
  Lines        8658     8656       -2     
==========================================
- Hits         7441     7439       -2     
  Misses       1217     1217              


@devmotion (Member) commented:

From the docs of ifelse:

In some cases, using `ifelse` instead of an `if` statement can eliminate the branch in generated code and provide higher performance in tight loops.

Maybe something along these lines is the reason for the performance improvements? Did you compare the output of @code_llvm?
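As a toy illustration of that point (a sketch only; pick_branch and pick_ifelse are made-up names, not the AliasTable code), one can compare the generated IR of a branching version against an ifelse version in the REPL:

```julia
# For such tiny functions LLVM may emit a select in both cases; the difference
# tends to matter more when the branch guards memory loads, as in the table lookup.
pick_branch(u, x, y) = u < 0.5 ? x : y
pick_ifelse(u, x, y) = ifelse(u < 0.5, x, y)

# In the REPL:
# julia> @code_llvm pick_branch(0.3, 1, 2)
# julia> @code_llvm pick_ifelse(0.3, 1, 2)
```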

@adienes (Contributor, Author) commented Jan 31, 2024

That's certainly plausible. Here is the output of @code_llvm on my machine; left is the status quo, right is the PR.

I do not really know how to interpret it, though :)

[screenshot: side-by-side @code_llvm output, status quo (left) vs. PR (right)]

@Zentrik commented Feb 1, 2024

It looks like the ifelse does remove a branch in the LLVM IR; however, the lack of @inbounds adds some branching (probably cheap, since we never expect the index to be out of bounds). I suspect that combining ifelse and @inbounds would produce the fastest result.
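For reference, a sketch of that combined variant, using the hypothetical MyAliasTable stand-in and definitions from the earlier sketch (not the merged code):

```julia
# Zentrik's suggested combination: branch-free select and no bounds checks.
function rand_ifelse_inbounds(rng::AbstractRNG, s::MyAliasTable)
    i = rand(rng, 1:length(s.alias))
    u = rand(rng)
    @inbounds r = ifelse(u < s.accept[i], i, s.alias[i])
    return r
end
```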

@adienes (Contributor, Author) commented Feb 1, 2024

I did try with @inbounds here, and on my machine it is actually ever-so-slightly slower (but basically equivalent). Given the performance equivalence, I defaulted to the "safer" option.

However, I don't know whether that result is CPU-specific. Is it possible @inbounds would have a bigger impact on a different architecture?

@Zentrik commented Feb 1, 2024

That seems plausible to me.

I imagine the CPU is able to eliminate the cost of bounds checking whilst benchmarking due to branch prediction.

@adienes (Contributor, Author) commented Feb 1, 2024

Makes sense. Happy to add the @inbounds back in.

Anything else needed? I can do more careful benchmarking across a range of inputs, but honestly this seems like a pretty quick win.

@devmotion (Member) commented:

IMO we should only use @inbounds if it is checked that the indices are valid; but as the function is currently written, i is only guaranteed to be a valid index of s.alias, not of s.accept.

@adienes (Contributor, Author) commented Feb 1, 2024

That's true. Personally I don't mind omitting the @inbounds, but just to play devil's advocate: couldn't a new(probs::AbstractVector) inner constructor be added to AliasTable to enforce that only the constructor AliasTable(probs) is valid, so that alias and accept are guaranteed to have the same indices?

@devmotion (Member) commented:

It would definitely be less likely to run into BoundsErrors, but even with an inner constructor you are allowed to push! to s.alias and/or s.accept, so it would not be guaranteed that the indices are valid.
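To make the concern concrete (again with the hypothetical MyAliasTable stand-in from the earlier sketch, not the package type):

```julia
t = MyAliasTable(fill(0.5, 10), collect(1:10))
push!(t.alias, 3)     # now length(t.alias) == 11 but length(t.accept) == 10
i = length(t.alias)   # 11 is a valid index of t.alias ...
t.accept[i]           # ... but throws a BoundsError; under @inbounds this
                      # would be undefined behaviour instead of an error
```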

@adienes (Contributor, Author) commented Feb 5, 2024

Makes sense to me; so this PR improves safety as well as performance!

@devmotion (Member) left a review comment:

Thank you!

@devmotion merged commit 3d71a2b into JuliaStats:master on Feb 5, 2024 (11 of 14 checks passed)
@vancleve commented Mar 4, 2024

Cool, I had thought that @inbounds would improve performance, but apparently not significantly.

Seems I should close JuliaStats/StatsBase.jl#630 then?
