Sporadic test failure in LinearAlgebra/matmul.jl #44635
Looks like we have an rr trace there, so should be quite debuggable.
Just to clarify, what I'm seeing in the backtrace is:
Do we think that the 4->3 call is wrong because it should have dispatched to the one in multidimensional.jl instead?
Just on a regular master build, that's the optimized IR for that call, which all seems correct, so I guess we just need to look at the rr trace and see what's different about it.
Hmm, I would have expected the test failure to trigger the breakpoint here: https://github.com/JuliaLang/julia/blame/master/stdlib/Test/src/Test.jl#L655, but I don't see it in the trace.
Ah, that's because it was inside a
I'm a bit short on time, so unless somebody else wants to go digging, I say let's get that merged and wait for another rr trace that includes that change, so it'll be easier to find.
We have this captured in https://buildkite.com/julialang/julia-master/builds/10284#c8607f85-b173-49f9-a0fe-c1d1586d6ccf
So the last generic call before the error compiles something. The arguments are:
I don't really know how
Hmm:
Ah, nevermind. That's just a printing bug in
In #44635, we observe that occasionally a call to `view(::SubArray, ::Colon, ...)` dispatches to the wrong function. The relevant part of the post-inlining IR is:

```
│   │ %8 = (isa)(I, Tuple{Colon, UnitRange{Int64}, SubArray{Int64, 2, UnitRange{Int64}, Tuple{Matrix{Int64}}, false}})::Bool
└───│ goto #3 if not %8
2 ──│ %10 = π (I, Tuple{Colon, UnitRange{Int64}, SubArray{Int64, 2, UnitRange{Int64}, Tuple{Matrix{Int64}}, false}})
│   │ @ indices.jl:324 within `to_indices` @ multidimensional.jl:859
│   │┌ @ multidimensional.jl:864 within `uncolon`
│   ││┌ @ indices.jl:351 within `Slice` @ indices.jl:351
│   │││ %11 = %new(Base.Slice{Base.OneTo{Int64}}, %7)::Base.Slice{Base.OneTo{Int64}}
│   │└└
│   │┌ @ essentials.jl:251 within `tail`
│   ││ %12 = Core.getfield(%10, 2)::UnitRange{Int64}
│   ││ %13 = Core.getfield(%10, 3)::SubArray{Int64, 2, UnitRange{Int64}, Tuple{Matrix{Int64}}, false}
│   │└
│   │ @ indices.jl:324 within `to_indices`
└───│ goto #5
    │ @ indices.jl:324 within `to_indices` @ indices.jl:333
    │┌ @ tuple.jl:29 within `getindex`
3 ──││ %15 = Base.getfield(I, 1, true)::Function
│   │└
│   │ invoke Base.to_index(A::SubArray{Int64, 3, Array{Int64, 3}, Tuple{Vector{Int64}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, false}, %15::Function)::Union{}
```

Here we expect the `isa` at `%8` to always be true [1]. However, we seemingly observe that the branch is not taken, and we instead end up in the fallback `to_index`, which (correctly) complains that the colon should have been dereferenced to an index.

After some investigation of the relevant rr trace, it turns out that the va tuple we compute in codegen gets garbage collected before the call to `emit_isa`. The resulting use-after-free read happens to make `emit_isa` think that the isa condition is impossible, so it folds the branch away. The fix is simply to add the relevant GC root.

It's a bit unfortunate that this wasn't caught by the GC verifier. It would in principle have been capable of doing so, but it is currently disabled for C++ sources. It would be worth revisiting this in the future to see if it can be made to work.

Fixes #44635.

[1] The specialization heuristics decided to widen `Colon` to `Function`, which doesn't make much sense here, but regardless, it shouldn't crash.
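To make the failure mode concrete, here is a minimal toy sketch in C++ of why a missing GC root can flip the result of a type check. It is not Julia's codegen: `ToyHeap`, `Obj`, `Rooted`, `isa_possible`, and both `check_*` functions are hypothetical names invented for illustration, and the toy collector poisons freed objects instead of deallocating them so the stale read stays well-defined. The `Rooted` guard is only loosely analogous to a `JL_GC_PUSH1`/`JL_GC_POP` pair in Julia's C sources.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a GC-managed value (hypothetical; not Julia's jl_value_t).
struct Obj {
    std::string type_name;
    bool alive = true;
};

// Toy heap: collect() "frees" every object that is not currently rooted.
// Freed objects are poisoned rather than deallocated, so reading them
// afterwards is well-defined C++ and the effect can be observed safely.
class ToyHeap {
public:
    Obj *alloc(const std::string &type_name) {
        objects_.push_back(std::make_unique<Obj>());
        objects_.back()->type_name = type_name;
        return objects_.back().get();
    }
    void push_root(Obj **slot) { roots_.push_back(slot); }
    void pop_root() {
        assert(!roots_.empty());
        roots_.pop_back();
    }
    void collect() {
        for (auto &o : objects_) {
            bool rooted = false;
            for (Obj **slot : roots_)
                if (*slot == o.get()) rooted = true;
            if (!rooted && o->alive) {
                o->alive = false;
                o->type_name = "<freed>"; // poison the payload
            }
        }
    }
private:
    std::vector<std::unique_ptr<Obj>> objects_;
    std::vector<Obj **> roots_;
};

// RAII root guard: registers a local slot as a root for its lifetime.
class Rooted {
public:
    Rooted(ToyHeap &heap, Obj **slot) : heap_(heap) { heap_.push_root(slot); }
    ~Rooted() { heap_.pop_root(); }
    Rooted(const Rooted &) = delete;
    Rooted &operator=(const Rooted &) = delete;
private:
    ToyHeap &heap_;
};

// Stand-in for the type check: can this tuple type match the expected one?
bool isa_possible(const Obj *tuple_type) {
    return tuple_type->type_name == "Tuple{Colon, UnitRange{Int64}}";
}

// Buggy pattern: the freshly computed tuple type is not rooted across a
// point where collection can run, so the check reads a dead object.
bool check_without_root(ToyHeap &heap) {
    Obj *tt = heap.alloc("Tuple{Colon, UnitRange{Int64}}");
    heap.collect(); // stands in for any allocation that may trigger GC
    return isa_possible(tt); // stale read: wrongly reports "impossible"
}

// Fixed pattern: root the value before anything that may collect.
bool check_with_root(ToyHeap &heap) {
    Obj *tt = heap.alloc("Tuple{Colon, UnitRange{Int64}}");
    Rooted root(heap, &tt);
    heap.collect();
    return isa_possible(tt);
}
```

With a fresh `ToyHeap`, `check_without_root` returns `false` and `check_with_root` returns `true`, even though both check the same type. In the real bug the stale read was genuine undefined behavior, which is why the failure was sporadic rather than deterministic as it is in this sketch.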
We sporadically (<5%) get a test failure (example: https://buildkite.com/julialang/julia-master/builds/10097#eee6dda2-d1bf-4118-aa04-5056c642eb4f) for which the first stacktrace is
I was going to guess that it depends on what other tests might have run on the same node, but in this case it appears to be the first test run on that node. I am therefore at a bit of a loss.