Making rand gradients work for more distributions #123
Comments
Ok, I understand this a bit more and was able to get correct results. I'll make a PR if this sounds like something useful for Turing.
Great! This definitely sounds useful. It would be even better to add an adjoint for ChainRules instead of Zygote (ChainRules is the new way of defining forward- and reverse-mode rules for different AD backends and is already used by Zygote).
I think a more general ChainRules adjoint may be blocked by JuliaDiff/ChainRulesCore.jl#68. The adjoint is peculiar and relies on running AD on the incomplete gamma function, and it looks like there's currently no way of doing that without assuming a specific AD system. So I think it has to be for a specific package for now; it can be generalized once ChainRules supports it.
Yeah, I've run into this issue before. But if the only blocker is that the implementation needs the derivative of the incomplete gamma function, wouldn't it be even better to add the adjoint for the incomplete gamma function to https://github.com/JuliaDiff/ChainRules.jl/blob/master/src/rulesets/packages/SpecialFunctions.jl instead of relying on a specific AD backend?
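For reference, a minimal sketch of what such a rule could look like, written for a hypothetical scalar helper that returns the regularized lower incomplete gamma P(a, x) (built on `SpecialFunctions.gamma_inc`) rather than for `gamma_inc` itself. The ∂/∂a part below is only a finite-difference placeholder, since that derivative has no simple closed form and a real rule would need a series or quadrature implementation:

```julia
# Sketch only, not an existing ChainRules rule; `_gamma_inc_lower` is a hypothetical helper.
# NoTangent() is the current ChainRulesCore non-differentiable marker (older versions
# used DoesNotExist(), as in the snippet further down this thread).
using ChainRulesCore, SpecialFunctions

# Regularized lower incomplete gamma P(a, x) = γ(a, x) / Γ(a).
_gamma_inc_lower(a, x) = first(gamma_inc(a, x, 0))

function ChainRulesCore.rrule(::typeof(_gamma_inc_lower), a, x)
    p = _gamma_inc_lower(a, x)
    function _gamma_inc_lower_pullback(Δp)
        # ∂P/∂x has the closed form x^(a-1) * exp(-x) / Γ(a).
        ∂x = exp((a - 1) * log(x) - x - loggamma(a)) * Δp
        # ∂P/∂a: central finite difference as a stand-in for a proper rule.
        h = cbrt(eps(one(float(a))))
        ∂a = (_gamma_inc_lower(a + h, x) - _gamma_inc_lower(a - h, x)) / (2h) * Δp
        return NoTangent(), ∂a, ∂x
    end
    return p, _gamma_inc_lower_pullback
end
```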
Well, I'm just learning this stuff, so I'm very open to a better way of handling this.

```julia
# Requires Distributions, Random (for AbstractRNG), ZygoteRules, Zygote (for `gradient`),
# and ChainRulesCore (for `DoesNotExist`). `_gamma_inc_lower(α, y)` is a helper assumed
# to compute the regularized lower incomplete gamma P(α, y), i.e. the CDF of Gamma(α, 1).
using Distributions, Random, ZygoteRules, ChainRulesCore
using Zygote: gradient

ZygoteRules.@adjoint function Distributions.rand(rng::AbstractRNG, d::Gamma{T}) where {T<:Real}
    z = rand(rng, d)
    function rand_gamma_pullback(c)
        # Implicit reparameterization: with u = P(α, z/θ) held fixed,
        # dz/dα = -θ * (∂P/∂α) / (∂P/∂y) and dz/dθ = z/θ.
        y = z / d.θ
        ∂α, ∂y = gradient(_gamma_inc_lower, d.α, y)
        return (
            DoesNotExist(),            # no gradient w.r.t. the RNG
            (α = (-d.θ * ∂α / ∂y) * c,
             θ = y * c),
        )
    end
    return z, rand_gamma_pullback
end
```
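Not part of the original comment, but one way to sanity-check an adjoint like this (assuming `_gamma_inc_lower` is defined and Zygote can differentiate it) is to compare the reported ∂z/∂α against a finite-difference derivative of the quantile function at the fixed CDF level of the drawn sample, which is what the implicit reparameterization derivative should equal:

```julia
# Hypothetical smoke test for the adjoint above.
using Distributions, Random, Zygote

α, θ, ε = 2.5, 1.3, 1e-5
z, back = Zygote.pullback(rand, MersenneTwister(1), Gamma(α, θ))
_, ∂d = back(1.0)                    # ∂d should be the named tuple (α = ..., θ = ...)

# With u = F(z; α, θ) held fixed, dz/dα is the derivative of the quantile function
# in α, approximated here by central differences.
u = cdf(Gamma(α, θ), z)
fd = (quantile(Gamma(α + ε, θ), u) - quantile(Gamma(α - ε, θ), u)) / (2ε)
(∂d.α, fd)                           # the two numbers should agree closely
```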
I don't know if this is exactly the same issue; I was trying to use autodiff in an optimizer whose objective function uses the Gamma distribution, but it chokes at
Or is this not expected to work at all?
It seems this is caused by a call of
Trying to compute gradients of the `rand` function with respect to parameters for certain distributions will produce incorrect results, because some of these functions use branching or iterative algorithms and AD can't take into account how the parameters affect control flow.

A simple demonstration of this is just trying to estimate d/dθ E[x] by estimating E[d/dθ x].
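For concreteness, here is a minimal sketch of that check (not from the original issue), assuming Zygote as the AD backend: average the per-draw gradients obtained by differentiating through `rand` and compare against the analytic derivative of the mean.

```julia
# Estimate E[d/dθ x] by differentiating individual draws through `rand` with Zygote,
# then compare against the analytic d/dθ E[x].
using Distributions, Random, Statistics, Zygote

function mean_pathwise_grad(make_dist, θ; n = 10_000, rng = MersenneTwister(1))
    # `something(..., NaN)` treats a missing gradient as NaN rather than erroring.
    mean(something(Zygote.gradient(t -> rand(rng, make_dist(t)), θ)[1], NaN) for _ in 1:n)
end

mean_pathwise_grad(μ -> Normal(μ, 1.0), 0.5)  # ≈ 1, matching d/dμ E[x] = 1
mean_pathwise_grad(α -> Gamma(α, 2.0), 3.0)   # wrong, NaN, or an error; true value is θ = 2
```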
`Normal` of course works: d/dμ E[x] = d/dμ μ = 1, and the AD-based estimate of E[d/dμ x] agrees (which works for any values of μ, σ).

`Gamma` will not return a gradient for some values, and returns incorrect results for others. E.g. d/dα E[x] = d/dα αβ = β, yet the AD-based estimate does not match.

`Beta` is similar: d/dα E[x] = d/dα α/(α+β) = β/(α+β)^2, yet the estimate is again wrong.

It's well known that some distributions (e.g. Gamma, Beta, Dirichlet) don't lend themselves easily to this kind of pathwise gradient, which makes them infrequently used as surrogate posteriors for VI, but there have been some papers on trying to work around this using numerical approximations and other techniques. See for example:
I'd love to help improve the `rand` situation, but I'm still getting my bearings with this code, so I was hoping for some pointers. My vague thought was that there might be a `TuringGamma`, `TuringBeta`, etc. that implement alternative `rand` functions that are correctly differentiated (a rough skeleton of this idea is sketched below). Is there a nicer approach, or is this the best option?

Second, for distributions where there is no viable way to AD `rand`, is there something better that can be done than report incorrect numbers? Should the remedy be in Distributions?

(Related issue is #113)
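As a purely illustrative aside (not part of the original issue), the wrapper-type idea above might look roughly like this; `TuringGamma` here is just the placeholder name from the question, and the point of routing sampling through a dedicated function is that a custom AD rule (like the Zygote adjoint earlier in the thread, or a ChainRules `rrule`) can be defined for the wrapper's own types:

```julia
# Hypothetical skeleton of a wrapper distribution with an AD-friendly sampling path.
using Distributions, Random

struct TuringGamma{T<:Real} <: ContinuousUnivariateDistribution
    α::T
    θ::T
end

# Plain sampling delegates to the standard Gamma sampler; an AD rule would be
# attached to `_differentiable_rand_gamma` to supply the parameter gradients.
_differentiable_rand_gamma(rng::AbstractRNG, α, θ) = rand(rng, Gamma(α, θ))

Base.rand(rng::AbstractRNG, d::TuringGamma) = _differentiable_rand_gamma(rng, d.α, d.θ)
```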