-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a faster and safer implementation of alias_sample!
#927
Conversation
ap = Vector{Float64}(undef, n) | ||
alias = Vector{Int}(undef, n) | ||
make_alias_table!(wv, sum(wv), ap, alias) | ||
at = AliasTable(wv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there is no reason to keep make_alias_table!
(defined above) around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the public API of StatsBase, no, there's no reason. However there are some usages downstream of that internal function. Most notably, Distributions.jl, which would break if we remove make_alias_table!
before we merge JuliaStats/Distributions.jl#1848. Once Distributions.jl no longer depends on this, I think it's worth removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems Distributions might be the only more widely used package that relies on these internals? I was aware of potential problems with using an internal function when writing ConsistencyResampling, and would be fine with breaking it (and probably switching to AliasTables
) 😛 But given the impact on Distributions, maybe it would be safer to deprecate make_alias_table!
and remove it in a future breaking release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would even be acceptable to deprecate it and remove it in a future non-breaking release once Distributions has a few releases out that don't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me but let's wait a bit so others have time to chime in.
Bump, is there anyone in particular or some channel we should ping for feedback? |
AFAIK there's no standard rule to follow. I just wanted to wait a few days in case someone has additional comments, but it seems nobody objected to the PR. IMO the PR is a clear improvement so I'll merge and leave further improvements or comments for future PRs. |
Safer:
Before
After
Faster (benchmarks from #695 (comment))
Before
After
Closes #630 (AliasTables.jl uses
@inbounds
in sampling and contains a correctness proof that relies only on local information and basic properties of unsigned integers)Closes #916 by making
alias_sample!
much faster than even usingifelse
would. See https://aliastables.lilithhafner.com/dev/#Implementation-details for how.See also: JuliaStats/Distributions.jl#1848
cc @devmotion