signal (4): Illegal instruction, related to Julia 1.8 #47685

Closed
dehann opened this issue Nov 23, 2022 · 9 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@dehann
Contributor

dehann commented Nov 23, 2022

Hi, I tried to mention this in a related, recently closed issue, but it was likely missed. I'm still getting a similar Illegal instruction error on Julia 1.8.3 (after the #46882 fix).

The first line is the last printout to come from Julia-land before dropping down with the Signal (4) Illegal instruction:

WHAT IS GOING ON
Unreachable reached at 0x7f13965605ec

signal (4): Illegal instruction
in expression starting at REPL[10]:1
_writeG2oVertexes at /home/dehann/.julia/dev/RoME/src/services/g2oParser.jl:289
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
do_call at /home/dehann/software/julia/src/interpreter.c:126
eval_value at /home/dehann/software/julia/src/interpreter.c:215
eval_stmt_value at /home/dehann/software/julia/src/interpreter.c:166 [inlined]
eval_body at /home/dehann/software/julia/src/interpreter.c:612
jl_interpret_toplevel_thunk at /home/dehann/software/julia/src/interpreter.c:750
jl_toplevel_eval_flex at /home/dehann/software/julia/src/toplevel.c:906
jl_toplevel_eval_flex at /home/dehann/software/julia/src/toplevel.c:850
ijl_toplevel_eval_in at /home/dehann/software/julia/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_63686 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
#967 at ./client.jl:419
jfptr_YY.967_57682 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
jl_f__call_latest at /home/dehann/software/julia/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_29970 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
true_main at /home/dehann/software/julia/src/jlapi.c:575
jl_repl_entrypoint at /home/dehann/software/julia/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 166096316 (Pool: 166040379; Big: 55937); GC: 71
Illegal instruction (core dumped)

The code is a simple case of dispatch:

function _writeG2oLine(::Pose3, io, dfg::AbstractDFG, label, i, solveKey)
  println("WHAT IS GOING ON")
  return nothing
end

function _writeG2oVertexes(io,  dfg,  varIntLabel,  solveKey)
  for (label,i) in pairs(varIntLabel)
    vartype = getVariableType(dfg, label)
    _writeG2oLine(vartype, io, dfg, label, i, solveKey)
    println("NEVER SEEN")
  end
  return nothing
end

Notice how the print statement in the inner function runs, but then this big error occurs at the return nothing statement. Execution never reaches the later NEVER SEEN print line. I'm a little confused.
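
For readers without RoME installed, the shape of the pattern can be sketched in isolation. This is only an illustration of the dispatch structure, not a reproducer of the crash: the types DemoPose2/DemoPose3 and the functions write_line/write_vertexes are hypothetical stand-ins, and the output strings are placeholders.

```julia
# Hypothetical stand-ins; the real Pose2/Pose3 come from RoME and are not reproduced here.
abstract type DemoVariable end
struct DemoPose2 <: DemoVariable end
struct DemoPose3 <: DemoVariable end

# Fallback method for variable types that have no dedicated writer.
write_line(v, io, i) = error("unsupported variable type $(typeof(v))")
# One method per supported type, selected by dispatch on the first argument.
write_line(::DemoPose2, io, i) = println(io, "POSE2 ", i)
write_line(::DemoPose3, io, i) = println(io, "POSE3 ", i)

function write_vertexes(io, vars)
    for (i, v) in enumerate(vars)
        # The element type is abstract, so the method is chosen at run time.
        write_line(v, io, i)
    end
    return nothing
end

buf = IOBuffer()
write_vertexes(buf, DemoVariable[DemoPose2(), DemoPose3()])
output = String(take!(buf))
```

On a healthy Julia build, the loop body runs to completion for every element, which is the behavior the original code expected.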


EDIT:

$ julia -O3
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.3 (2022-11-14)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 6 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 6

Also note that the Julia v1.8.3 I'm using here is freshly compiled from source. I was hoping 1.8.3 would fix the issue, but it doesn't seem like it.


EDIT2: this was actually quite a tiring exercise, and I ended up going with the following workaround:

# dispatching to a function like this does not work in Julia 1.8 in this case:
somefnc(::MyType, args...) = ...

# using workaround
fixdfnc = getfield(MyModule, Symbol(:somefnc, typeof(mytype).name.name))
fixdfnc(args...)
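
A self-contained sketch of that workaround, under the assumption that each type gets its own plainly named function which is then looked up by name. The module DemoExport and the writeline* names are hypothetical and only mirror the idea above.

```julia
module DemoExport   # hypothetical module, standing in for MyModule

struct Pose2 end
struct Pose3 end

# One separately named function per type, instead of methods of one generic function.
writelinePose2(i) = "pose2 line $i"
writelinePose3(i) = "pose3 line $i"

function writeline(v, i)
    # nameof(typeof(v)) is the type's name as a Symbol, e.g. :Pose3;
    # Symbol(:writeline, :Pose3) concatenates to :writelinePose3.
    fnc = getfield(@__MODULE__, Symbol(:writeline, nameof(typeof(v))))
    return fnc(i)
end

end # module

result = DemoExport.writeline(DemoExport.Pose3(), 7)
```

The lookup trades dispatch on the argument type for a name-based lookup, so it throws an UndefVarError (rather than a MethodError) when no function exists for a type.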

Also, not sure if this is related, but I found another dispatch issue on Julia 1.8: when trying to add a dispatch from a downstream module, the multiple dispatch breaks down in some cases (used to work before 1.8). For example, module First fnc(::MyType) end; and then overloading fails: module Second import First.fnc; First.fnc(::AnotherType) end.
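
Spelled out as runnable code, the cross-module case looks roughly like the sketch below. It shows the intended behavior on a working build, not the failure; the type definitions and return strings are filled in for illustration and are assumptions.

```julia
module First
struct MyType end
fnc(::MyType) = "First.fnc(::MyType)"
end

module Second
import ..First: fnc      # import (not using) so that fnc can be extended here
struct AnotherType end
fnc(::AnotherType) = "method added from Second"
end

a = First.fnc(First.MyType())
b = First.fnc(Second.AnotherType())
n = length(methods(First.fnc))   # both methods belong to the same generic function
```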

Originally posted by @dehann in #46871 (comment)

@KristofferC
Sponsor Member

KristofferC commented Nov 23, 2022

What is Pose3 etc? How is the error reproduced?

@giordano giordano added the needs more info Clarification or a reproducible example is required label Nov 23, 2022
@vtjnash
Sponsor Member

vtjnash commented Nov 24, 2022

Despite the poor error report, I confirmed that running that package's tests fails on v1.8.3 and passes on master, so it might be fixed by one of these:

$ git log --oneline --cherry-pick  origin/release-1.8...HEAD -- src/subtype.c
51d1a56229 Make sure `UnionAll` is handled by `subtype_unionall` (#46978)
a9d3f7bb35 Propagate var's offset to its `ub` to avoid invalid bounds setting. (#46603)
24f53313ee Avoid erasing `envout` during `exists_subtype` if there's remaining `Runions`. (#46302)
2abedf6d4d Avoid set var's lb if intersect return a Vararg with free length. (Null or a local type var)
1268582291 Make `bound_var_below` return `NULL` if the input typevar is not valid.
b7f47d5394 Avoid set `var`'s bounds if `offset != 0`
38829a0ac4 Always return the shorter `Vararg` length.
f29a013366 Add tuple length offset when we intersect 2 `Vararg`'s length.
e6d2624adb Skip subtype check if `intersect_invariant` calls `set_vat_to_const`.
01a4a30a68 Avoid setting `offset` when we intersect `Vararg`'s eltype.

If you check out RoME 4419a39acb8162b706d2ad3bcc2f4ebf3b5d3c14 and reapply the bad code, this could perhaps be bisected for backporting with a git bisect start HEAD origin/release-1.8 -- src/subtype.c && git bisect run sh -c "make clean && make -j8 && ! ./julia --project=RoME -e 'using Pkg; Pkg.test(\"RoME\")'", so I may give that a shot

diff --git a/src/services/g2oParser.jl b/src/services/g2oParser.jl
index ae94713..8215f69 100644
--- a/src/services/g2oParser.jl
+++ b/src/services/g2oParser.jl
@@ -246,22 +246,22 @@ function stringG2o!(dfg::AbstractDFG,
   error("unknown factor type $fnc")
 end
 
-# _writeG2oLine(
-#   _, 
-#   io,
-#   dfg::AbstractDFG, 
-#   label, 
-#   i,
-#   solveKey
-# ) = (close(io); error("exportG2o does not support $vartype, open an issue if you would like support"))
-
-function _writeG2oLinePose2(io, dfg::AbstractDFG, label::Symbol, i::Int, solveKey::Symbol)
+_writeG2oLine(
+  _, 
+  io,
+  dfg::AbstractDFG, 
+  label, 
+  i,
+  solveKey
+) = (close(io); error("exportG2o does not support $vartype, open an issue if you would like support"))
+
+function _writeG2oLine(::Pose2, io, dfg::AbstractDFG, label::Symbol, i::Int, solveKey::Symbol)
   # println("trying VERTEX_SE2")
   (x,y,θ) = getPPESuggested(dfg, label, solveKey)
   write(io, "VERTEX_SE2 $i $x $y $θ\n")
 end
 
-function _writeG2oLinePose3(io, dfg::AbstractDFG, label::Symbol, i::Int, solveKey::Symbol)
+function _writeG2oLine(::Pose3, io, dfg::AbstractDFG, label::Symbol, i::Int, solveKey::Symbol)
   # println("WHAT IS GOING ON")
   Xc = getPPESuggested(dfg, label, solveKey)
   p = getPoint(Pose3, Xc)
@@ -274,10 +274,7 @@ end
 
 function _doG2oLoop(io, dfg, label, i, solveKey)
   vartype = getVariableType(dfg, label)
-  typename = string(typeof(vartype).name.name)
-  # FIXME, HACK, WTF https://github.com/JuliaLang/julia/issues/46871#issuecomment-1318035929
-  fnc = getfield(RoME, Symbol(:_writeG2oLine, typename))
-  fnc(io, dfg, label, i, solveKey)
+  _writeG2oLine(vartype, io, dfg, label, i, solveKey)
 end
 
 function _writeG2oVertexes(

@vtjnash
Sponsor Member

vtjnash commented Nov 25, 2022

tried to bisect, but I don't think this seems right :/

$ JULIA_PKG_PRECOMPILE_AUTO=1 git bisect run bash -c "rm -rf usr && make -j || exit 125; ./julia --project=RoME -e 'using Pkg; Pkg.test(\"RoME\")'"

$ git bisect bad
4857cd2e60de994074a92be0c2ed3510f360f25c is the first bad commit
commit 4857cd2e60de994074a92be0c2ed3510f360f25c
Author: Philip Tellis <philip.tellis@gmail.com>
Date:   Mon Feb 28 15:52:39 2022 -0500

    Fix hyperlinks in 1.8 news (#44371)
    
    (cherry-picked from 675911a6ea263a4b0ec3654df9c84aec7285f027)

 NEWS.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

$ git bisect log
# bad: [0434deb161e17103eabdd7d82b17a1cd6b410572] set VERSION to 1.8.3 (#47556)
# good: [7a1c20e6dea50291b364452996d3d4d71a6133dc] Eagerly do boundscheck when indexing CartesianIndices with CartesianIndices (#42235)
git bisect start 'origin/release-1.8' '7a1c20e6dea50291b364452996d3d4d71a6133dc'
# bad: [853dff91907161f0a925ade1a6eaac86f4884218] fixup! fixup! Update LLVM to 13.0.1+1
git bisect bad 853dff91907161f0a925ade1a6eaac86f4884218
# bad: [94b7154a089b17bca235bc346a8b8d41f4f71e79] inference: override `InterConditional` result with `Const` carefully (#44668)
git bisect bad 94b7154a089b17bca235bc346a8b8d41f4f71e79
# bad: [ed0eb780ca6d0ef4ac233f8adeff7d8510b23275] Fix htable cleanup (#44446)
git bisect bad ed0eb780ca6d0ef4ac233f8adeff7d8510b23275
# good: [e23618dc347071b4a62ce4d4b4e1b4a8f2d53281] fix `BLAS.spr!` docstring (#44303)
git bisect good e23618dc347071b4a62ce4d4b4e1b4a8f2d53281
# bad: [acbb1f6f74d3df27344a1a40468ee6ffae3f646c] [OpenBLAS_jll] Update to v0.3.20 (#44321)
git bisect bad acbb1f6f74d3df27344a1a40468ee6ffae3f646c
# good: [f9c76f0bcb2a26c8d96d878d1c480bf48b0460a0] Add Pkg 1.8 news (#44370)
git bisect good f9c76f0bcb2a26c8d96d878d1c480bf48b0460a0
# bad: [5d3ebe0fa574b4ff47ee6a939f064331bc398f77] Update LBT to 5.0.1 for source build (#44258)
git bisect bad 5d3ebe0fa574b4ff47ee6a939f064331bc398f77
# bad: [3d8dadb9d8f7d8df1735337313970d02d4e0bfa9] Add NEWS on precompilation (#44325)
git bisect bad 3d8dadb9d8f7d8df1735337313970d02d4e0bfa9
# bad: [4857cd2e60de994074a92be0c2ed3510f360f25c] Fix hyperlinks in 1.8 news (#44371)
git bisect bad 4857cd2e60de994074a92be0c2ed3510f360f25c
# first bad commit: [4857cd2e60de994074a92be0c2ed3510f360f25c] Fix hyperlinks in 1.8 news (#44371)

@qiaojunfeng

I have also seen this kind of crash in my code, with Julia 1.8.3.
But I don't really understand it and cannot produce an MWE from my code :(

Not sure if this is helpful: if I run the code with the VS Code debugger, set a breakpoint before the line where it crashes, and step over line by line, the code finishes normally. Otherwise, if I run the script from the CLI or copy-paste it into the REPL, it crashes consistently.

Unreachable reached at 0x7fb9c317c16d

signal (4): Illegal instruction
in expression starting at ... 
...
unknown function (ip: 0x7fb9c317c1a2)
_jl_invoke at /cache/build/default-amdci5-6/julialang/julia-release-1-dot-8/src/gf.c:2365 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-6/julialang/julia-release-1-dot-8/src/gf.c:2547
jl_apply at /cache/build/default-amdci5-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
do_call at /cache/build/default-amdci5-6/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-6/julialang/julia-release-1-dot-8/src/interpreter.c:215
...

@dehann
Contributor Author

dehann commented Nov 26, 2022

Despite the poor error report
If you checkout RoME 4419a39 ... and reapply the bad code

Hi @vtjnash, yeah, apologies for the poor error report. It's been quite difficult to get down to a good MWE. The RoME test might also be hard to work with in this case (it's a package that I help maintain). I'd rather try to find a better example than have you work through the RoME tests -- it has a lot of numerical tests.

Let me understand better where you think the problem might be in Julia and I will try to get a better MWE. The comment just above from qiaojunfeng helps confirm that there may actually be a bug here. Thanks for the pointers above!

@vtjnash
Sponsor Member

vtjnash commented Nov 28, 2022

Looked a lot closer, and realized the bisect probably actually landed on #43990, but accidentally blamed the next commit (it reproduces most of the time, but not 100% reliably). I thought this type of problem would get fixed by #46171 or #46375, but those were backported. This problem is what motivated the complete rewrite of this algorithm in #46920 (which was not backported, since it would be slightly difficult due to the addition of invoke-specific optimizations that are only on master). Looks like this is probably fixed on master though, presumably by that PR.

-- reference notes to follow

julia> using RoME

julia> Base.return_types(RoME.getVariableType, (RoME.DistributedFactorGraphs.GraphsDFGs.GraphsDFG{IncrementalInference.SolverParams, RoME.DistributedFactorGraphs.DFGVariable{T} where T<:RoME.DistributedFactorGraphs.InferenceVariable, RoME.DistributedFactorGraphs.DFGFactor{T, N} where N where T}, Symbol))
1-element Vector{Any}:                            
 Union{Circular, Position} # bad
# want Any

julia> which(RoME.getVariableType, (RoME.DistributedFactorGraphs.GraphsDFGs.GraphsDFG{IncrementalInference.SolverParams, RoME.DistributedFactorGraphs.DFGVariable, RoME.DistributedFactorGraphs.DFGFactor}, Symbol)).specializations[1]
MethodInstance for DistributedFactorGraphs.getVariableType(::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, ::Symbol).specializations[1]
MethodInstance for DistributedFactorGraphs.getVariableType(::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, ::Symbol) # desired

julia> ans.cache                                                                                                                                                                                                                                                                                                                                      
Core.CodeInstance(0x0000000000008326, 0x000000000000832a, Union{Circular, Position}, #undef, nothing, 0x00000509, 0x00000509, nothing, false, false, 0x01, Ptr{Nothing} @0, Ptr{Nothing} @0) # good

Core.CodeInstance(0x000000000000832b, 0xffffffffffffffff, Any, #undef, nothing, 0x00000109, 0x00000109, nothing, false, false, 0x01, Ptr{Nothing} @0, Ptr{Nothing} @0) # good

Core.CodeInstance(0x0000000000007fb1, 0xffffffffffffffff, Union{Circular, Position}, #undef, nothing, 0x000001aa, 0x000001aa, nothing, false, false, Ptr{Nothing} @0, Ptr{Nothing} @0, 0x01) # bad
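
For anyone wanting to run the same kind of check on their own code, here is a minimal sketch of querying inference with Base.return_types. The function unstable is a hypothetical example unrelated to RoME; the point is only that the reported type must cover everything the call can actually return. In the notes above, the "bad" result is narrower than the desired Any.

```julia
# Hypothetical function whose return type depends on the value of x.
unstable(x) = x > 0 ? 1 : 1.0

# Ask inference what a call with an Int argument can return.
# return_types gives one entry per matching method; there is exactly one here.
inferred = only(Base.return_types(unstable, (Int,)))

# Every value actually returned must be an instance of the inferred type.
# Compiled code that relies on a too-narrow answer can reach code paths
# the compiler has marked unreachable.
covers_all = unstable(1) isa inferred && unstable(-1) isa inferred
```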

@vtjnash
Sponsor Member

vtjnash commented Nov 28, 2022

It also appears to be somewhat traceable to IncrementalInference, which explicitly precompiles this, then intentionally invalidates all of that work. That is not a good design plan, though Julia is supposed to handle it without crashing like this.
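
As a rough illustration of what "invalidates" means here (a toy sketch run at top level, not what IncrementalInference actually does; f and g are hypothetical): adding a more specific method after a caller has been compiled forces Julia to discard the compiled caller and recompile it.

```julia
g(x) = 1
f(x) = g(x)

before = f(0)      # compiles f(::Int) against the only existing method of g

# A more specific method of g appears later. Compiled code for f(::Int) that
# assumed the old method is now stale and has to be invalidated.
g(x::Int) = 2

after = f(0)       # recompiled; picks up the new method
```

When this works as designed, the only cost is recompilation time. The crash in this issue is a case where stale inference results survived that process.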

@qiaojunfeng
Copy link

qiaojunfeng commented Nov 28, 2022

Thank you @vtjnash!
I compiled the master branch c8ea33d and my code works fine now.

One quick question: the version of master branch is 1.10.0-DEV.62, would it be possible that these commits are backported to a 1.8.x version? Or we have to wait until 1.10?

qiaojunfeng added a commit to qiaojunfeng/Wannier.jl that referenced this issue Nov 29, 2022
Note due to bugs in Julia v1.8.3, the tests fail.
JuliaLang/julia#47685

The tests work fine with a master branch julia
JuliaLang/julia@c8ea33d
@vtjnash vtjnash removed the needs more info Clarification or a reproducible example is required label Nov 29, 2022
@brenhinkeller brenhinkeller added the bug Indicates an unexpected problem or unintended behavior label Dec 1, 2022
qiaojunfeng added a commit to qiaojunfeng/Wannier.jl that referenced this issue Jan 10, 2023
julia >= 1.8.4 fixs the crash due to
JuliaLang/julia#47685
@LilithHafner
Member

Should be fixed by #47741
