Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test against Enzyme #318

Merged
merged 22 commits into from
Jul 11, 2024
Merged

Test against Enzyme #318

merged 22 commits into from
Jul 11, 2024

Conversation

mhauru
Copy link
Member

@mhauru mhauru commented Jun 26, 2024

Currently failing, using this to run CI and catch issues. Uses code from #240.

@mhauru mhauru self-assigned this Jun 26, 2024
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@mhauru
Copy link
Member Author

mhauru commented Jun 26, 2024

CI currently fails with

julia: /buildworker/worker/package_linux64/build/src/llvm-late-gc-lowering.cpp:872: std::vector<int, std::allocator<int> > LateLowerGCFrame::NumberAllBase(State&, llvm::Value*): Assertion `Tracked.size() == BaseNumbers.size()' failed.

Running the tests locally I also see the following

  Expression: ≈(Enzyme.gradient(Enzyme.Reverse, f, x), finitediff, rtol = rtol, atol = atol)
  Enzyme compilation failed due to illegal type analysis.
  Current scope:
  ; Function Attrs: mustprogress willreturn
  define internal fastcc "enzyme_type"="{[-1]:Float@double}" double @preprocess_julia__solve__70_20557({ { [3 x double] }, [4 x double], { i64, i8 } } addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(72) "enzyme_type"="{[-1]:Pointer, [-1,0]:Float@double, [-1,8]:Float@double, [-1,16]:Float@double, [-1,24]:Float@double, [-1,32]:Float@double, [-1,40]:Float@double, [-1,48]:Float@double, [-1,56]:Integer, [-1,57]:Integer, [-1,58]:Integer, [-1,59]:Integer, [-1,60]:Integer, [-1,61]:Integer, [-1,62]:Integer, [-1,63]:Integer, [-1,64]:Integer}" "enzymejl_parmtype"="4511001616" "enzymejl_parmtype_ref"="1" %0) unnamed_addr #99 !dbg !10218 {
  [blahblahblah]

This is with Enzyme 0.12.19. @wsmoses, could you let me know if these are issues familiar to you, or if I should work to make a minimal reproduction of them?

@wsmoses
Copy link
Collaborator

wsmoses commented Jun 26, 2024

What is the Julia version of the first error. That looks like a Julia compiler bug that was subsequently fixed

@wsmoses
Copy link
Collaborator

wsmoses commented Jun 26, 2024

The latter issue is not known please open an issue with a mwe

@mhauru
Copy link
Member Author

mhauru commented Jun 26, 2024

The first one is on Julia 1.6, so the compiler bug it probably is then. I'll make CI run only on latest release and retry, and try to make an MWE of the latter.

@mhauru
Copy link
Member Author

mhauru commented Jun 26, 2024

Illegal type analysis error issue here: EnzymeAD/Enzyme.jl#1573

@willtebbutt willtebbutt mentioned this pull request Jun 26, 2024
@mhauru
Copy link
Member Author

mhauru commented Jun 26, 2024

The Assertion failed: (vec.back() == currentBlock) issue: EnzymeAD/Enzyme.jl#1574

@wsmoses
Copy link
Collaborator

wsmoses commented Jun 27, 2024

@mhauru like I mentioned on the closed Enzyme issue, the illegal type analysis is thrown due to the presence of an unknown bithack in Roots.jl , rather than risk an incorrect answer: https://github.com/JuliaMath/Roots.jl/blob/4263c7683423af209af1879bd9cd223b0ba55acd/src/Bracketing/bisection.jl#L135

It looks like you all defined a custom rule for tapir/etc for find_alpha for it, which I presume is a reasonable place for a custom rule for it here

@willtebbutt @mhauru @yebai mind importing the same rule in Enzyme. Feel free to use Enzyme's chain rule import macro if you like: https://github.com/EnzymeAD/Enzyme.jl/blob/c4068fc96a9b574fd08593178611b153cb71ac5b/ext/EnzymeChainRulesCoreExt.jl#L126 If you test with both forward and reverse you'll need to import both forward and reverse rules

@devmotion
Copy link
Member

It looks like you all defined a custom rule for tapir/etc for find_alpha for it, which I presume is a reasonable place for a custom rule for it here

Even if Enzyme would not throw an error, one really should define a custom rule for find_alpha (it's very simple) instead of differentiating through the root finding algorithm.

More generally, similar to the ForwardDiff support IMO it would be good to add support for Enzyme to Roots.jl such that the implicit function theorem is always exploited.

@wsmoses
Copy link
Collaborator

wsmoses commented Jun 27, 2024

Sure, but I definitely appreciate that Enzyme early errors rather than risking bad behavior here!

Since it looks like this repo already adds custom rules for this, can that be added here to start and after tests are validated possibly upstream to roots.jl

mhauru and others added 2 commits June 27, 2024 16:44
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@mhauru
Copy link
Member Author

mhauru commented Jun 27, 2024

Thanks @wsmoses, I added an import of the existing ChainRule for find_alpha. This uncovered a new issue, filed here: EnzymeAD/Enzyme.jl#1577

ext/BijectorsEnzymeExt.jl Outdated Show resolved Hide resolved
ext/BijectorsEnzymeExt.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
@@ -3,6 +3,7 @@ const AD = get(ENV, "AD", "All")

function test_ad(f, x, broken=(); rtol=1e-6, atol=1e-6)
finitediff = FiniteDifferences.grad(central_fdm(5, 1), f, x)[1]
et = eltype(finitediff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Shouldn't Enzyme return the correct types automatically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In forward mode it returns tuples, and if the gradient is empty, the result is a Tuple{}. This resulted in comparing an empty Float64[] to an empty Union{}[], which failed. See EnzymeAD/Enzyme.jl#1584

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the gradient should never be empty? Such a test would be quite useless, so I assume we don't run into this special case here? So maybe a simple collect (without specifying the element type) would be sufficient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a corner case, but we ran into it here, when d==1:

for d in [1, 2, 5]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest not testing AD in the case d = 1 - or just checking that the gradient is empty. We already handle this case in a special way in e.g.

test_bijector(b, x; test_not_identity=d != 1, changes_of_variables_test=false)
test_bijector(bvec, x; test_not_identity=d != 1, changes_of_variables_test=false)
.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passes now though, and I don't really see a downside to testing it? Good to know for instance that nothing crashes even if you hit this corner case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I also would prefer to not remove the test completely (even though I think it's of very limited use)

just checking that the gradient is empty

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the current test is effectively doing, because when d = 1 finitediff returns an empty array. The eltype thing just makes sure that the check becomes Float64[] == Float64[], rather than Union{}[] == Float64[]. We could put in a specific case for d == 1 in the test file, but this seems like more work to me, because you need to make it cater to different AD backends and specify manually that the result should be empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something differently - removing the eltype/collect completely and only add a special case to this weird test of the CorrBijector since we already have special cases for d = 1 there anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's how I understood you, but that would require something like adding another argument to test_ad called expect_empty that would skip comparing to finitediff and instead check that the gradient has length 1 (for all AD backends) and setting that argument to d == 1, which to me seems more complicated, with a bunch of if statements, compared adding the one-liner enforcing eltype. I can do it if you prefer it, I just don't see the benefit.

test/ad/utils.jl Outdated Show resolved Hide resolved
mhauru and others added 2 commits July 8, 2024 18:04
Co-authored-by: Hong Ge <3279477+yebai@users.noreply.github.com>
mhauru and others added 3 commits July 9, 2024 10:18
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@wsmoses
Copy link
Collaborator

wsmoses commented Jul 9, 2024

since it looks like all of this passes, can this get merged? @devmotion @yebai @mhauru

test/ad/utils.jl Outdated
@@ -47,7 +47,7 @@ function test_ad(f, x, broken=(); rtol=1e-6, atol=1e-6)
end
end

if AD == "All" || AD == "Enzyme"
if (AD == "All" || AD == "Enzyme") && VERSION >= v"1.10"
if :EnzymeReverse in broken
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Julia compiler bug on 1.6 only applies to [batched] fwd mode. It shouldn't be hit for reverse mode if you want to losen the tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, changed that. Is forward_broken = :EnzymeForward in broken || :Enzyme in broken || VERSION <= v"1.6" correct, or did the compiler bug get fixed only later?

@mhauru
Copy link
Member Author

mhauru commented Jul 10, 2024

@wsmoses, one of the tests started failing when I enabled reverse mode Enzyme checks for Julia 1.6. Could you please check the CI logs and let me know if this is a known issue with Julia 1.6 and what the appropriate version bound would be: https://github.com/TuringLang/Bijectors.jl/actions/runs/9871431348/job/27259337160?pr=318

@wsmoses
Copy link
Collaborator

wsmoses commented Jul 10, 2024

Hm yeah I haven't seen that one before. Open an issue and will investigate.

In interim, maybe we just enable for 1.10+ for now and update bounds once that's fixed

@yebai
Copy link
Member

yebai commented Jul 10, 2024

@wsmoses When assertation fails (like here), can we throw an error so Julia can handle it, instead of aborting/exiting? In my view, abouting a Julia session should only be used in unrecoverable scenarios. Related: EnzymeAD/Enzyme.jl#1625

@mhauru
Copy link
Member Author

mhauru commented Jul 11, 2024

Reported the latest issue here: EnzymeAD/Enzyme.jl#1629 It gets triggered on Julia 1.6 and 1.7, but not 1.8.

@@ -93,7 +93,7 @@ function transform(t::Transform, x)
res = with_logabsdet_jacobian(t, x)
if res isa ChangesOfVariables.NoLogAbsDetJacobian
error(
"`transform` not implemented for $(typeof(f)); implement `transform` and/or `with_logabsdet_jacobian`.",
"`transform` not implemented for $(typeof(t)); implement `transform` and/or `with_logabsdet_jacobian`.",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has nothing to do with this PR, I just spotted the typo while working on this PR and didn't feel like making a separate one character PR.

@mhauru
Copy link
Member Author

mhauru commented Jul 11, 2024

If we are happy with Enzyme only being tested on 1.10, this could be ready to merge. I'm the owner, so someone else needs to review.

@yebai yebai merged commit b79c425 into master Jul 11, 2024
27 checks passed
@yebai yebai deleted the mhauru/enzyme branch July 11, 2024 22:04
@wsmoses
Copy link
Collaborator

wsmoses commented Jul 11, 2024

or so Julia can handle it, instead of aborting/exiting? In my view, abouting a Julia session should only be used in unrecoverabl

Oh yeah totally agreed, PR's welcome!

@yebai
Copy link
Member

yebai commented Jul 12, 2024

or so Julia can handle it, instead of aborting/exiting? In my view, abouting a Julia session should only be used in unrecoverabl

Oh yeah totally agreed, PR's welcome!

@mhauru can you help create an Enzyme issue for this? Thanks!

@mhauru
Copy link
Member Author

mhauru commented Jul 12, 2024

Done: EnzymeAD/Enzyme.jl#1634

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants