Unexpected Illegal instruction error #46871

Closed
gkemlin opened this issue Sep 23, 2022 · 4 comments · Fixed by #46882
Labels
bug Indicates an unexpected problem or unintended behavior types and dispatch Types, subtyping and method dispatch


gkemlin commented Sep 23, 2022

Hi,

For a few days now we have been struggling with an "illegal instruction" error that we don't understand in the DFTK.jl package. Here are the details of what happens:

  • output of versioninfo()
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
  • minimal working example
using Brillouin
using Bravais
using DFTK
using Unitful
using UnitfulAtomic
using StaticArrays

a = 10.26  # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
                   [1 0 1.];
                   [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [ones(3)/8, -ones(3)/8]

model = model_LDA(lattice, atoms, positions)
kcoords, klabels, kpath = DFTK.high_symmetry_kpath(model; kline_density=40u"bohr") # this line causes the illegal instruction
  • stacktrace of the error
Unreachable reached at 0x7f53e7225b24

signal (4): Illegal instruction
in expression starting at REPL[13]:1
#high_symmetry_kpath#701 at /home/kemling/.julia/packages/DFTK/iIhO4/src/postprocess/band_structure.jl:48
high_symmetry_kpath##kw at /home/kemling/.julia/packages/DFTK/iIhO4/src/postprocess/band_structure.jl:16
unknown function (ip: 0x7f53e722617a)
unknown function (ip: 0x7f54797cb1cc)
unknown function (ip: 0x7f54797caa87)
unknown function (ip: 0x7f54797cb8fb)
unknown function (ip: 0x7f54797cc76e)
unknown function (ip: 0x7f54797eae63)
unknown function (ip: 0x7f54797eb976)
unknown function (ip: 0x7f54797eb976)
ijl_toplevel_eval_in at /usr/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7f546035ff44)
unknown function (ip: 0x7f54603604b4)
unknown function (ip: 0x7f54603606fc)
unknown function (ip: 0x7f54603947e9)
unknown function (ip: 0x7f5460394dc3)
unknown function (ip: 0x7f5460394ddf)
unknown function (ip: 0x7f5460a8b6a9)
unknown function (ip: 0x7f5460a8b74b)
jl_f__call_latest at /usr/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7f5460ae1586)
unknown function (ip: 0x7f5460aeb7d7)
unknown function (ip: 0x7f5460aec478)
unknown function (ip: 0x7f5460aec5a8)
unknown function (ip: 0x7f54798156ef)
jl_repl_entrypoint at /usr/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
main at julia (unknown line)
unknown function (ip: 0x7f5479d802cf)
__libc_start_main at /usr/bin/../lib/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 68617344 (Pool: 68588115; Big: 29229); GC: 54
[1]    572598 illegal hardware instruction (core dumped)  julia
  • the bug happens in the high_symmetry_kpath function from DFTK. What I don't understand is that if I copy-paste the exact body of this function inline (see this gist for the code to run), everything works fine.

  • at first the stacktrace led us to think that this came from Brillouin.jl, but after discussing it with the developers (Illegal instruction error thchr/Brillouin.jl#21) we concluded that it is more likely a Julia bug.

  • the same behavior is observed with the nightly version of Julia, whose versioninfo() output is

Julia Version 1.9.0-DEV.1428
Commit 15f8fadb5b6 (2022-09-22 11:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Any insight into the origin of this bug? Thanks in advance for your help :-)

@KristofferC KristofferC added this to the 1.9 milestone Sep 23, 2022
@JeffBezanson JeffBezanson added the bug Indicates an unexpected problem or unintended behavior label Sep 23, 2022
Sponsor Member

vtjnash commented Sep 23, 2022

Smaller MWE:

julia> typeintersect(DirectBasis, StaticArray{Tuple{3}, <:StaticArray{Tuple{3}, <:Real, 1}, 1})
Union{} # this should be DirectBasis{3}

julia> DirectBasis{3} <: StaticArray{Tuple{3}, <:StaticArray{Tuple{3}, <:Real, 1}, 1}
true

leading to this error

julia> methods(Brillouin.KPaths.extended_bravais, (Int64, String, DirectBasis, Any))
# 0 methods for generic function "extended_bravais" from Brillouin.KPaths

n.b.

julia> Base.show_supertypes(stdout, Base.unwrap_unionall(DirectBasis))
DirectBasis{D} <: AbstractBasis{D, Float64} <: StaticArray{Tuple{D}, SArray{Tuple{D}, Float64, 1, D}, 1} <: AbstractArray{SArray{Tuple{D}, Float64, 1, D}, 1} <: Any
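
The inconsistency in the smaller MWE above can be phrased as an invariant: whenever A <: B holds, typeintersect(A, B) must not be Union{}. Method matching also relies on type intersection, which is probably why a wrong Union{} result makes an existing method look like it is missing. Below is a minimal sketch using only Base types, so it illustrates the invariant rather than reproducing the bug. The names intersection_consistent and g are hypothetical and introduced only for illustration.

```julia
# Hypothetical helper: if A is a subtype of B, their intersection must not be empty.
intersection_consistent(A, B) = !(A <: B) || typeintersect(A, B) !== Union{}

# Pairs of Base types for which subtyping and intersection agree.
pairs_to_check = (
    (Vector{Int}, AbstractVector{<:Real}),
    (Int, Integer),
)
for (A, B) in pairs_to_check
    println(A, " vs ", B, ": consistent = ", intersection_consistent(A, B))
end

# `methods(f, types)` lists the methods whose signatures intersect `types`,
# so an intersection that is wrongly empty hides a method that does exist.
g(x::Int) = "int"
g(x::String) = "string"
n_any = length(methods(g, (Any,)))        # both signatures intersect Tuple{Any}
n_float = length(methods(g, (Float64,)))  # no signature intersects Tuple{Float64}
println("methods matching (Any,): ", n_any)
println("methods matching (Float64,): ", n_float)
```

Running the same helper on the DirectBasis pair from the MWE would be expected to return false on an affected Julia version.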

@vtjnash vtjnash added the types and dispatch Types, subtyping and method dispatch label Sep 23, 2022
Author

gkemlin commented Sep 24, 2022

Thanks for the smaller MWE. Is this related to how the DirectBasis type is defined, or is it something that should never happen regardless?

Member

N5N3 commented Sep 24, 2022

Looks like we just forgot to set tempe.intersection = 1

julia/src/subtype.c

Lines 2898 to 2901 in 24cb92d

jl_stenv_t tempe;
init_stenv(&tempe, env, envsz);
tempe.ignore_free = 1;
if (subtype_in_env(isuper, super_pattern, &tempe)) {

Contributor

dehann commented Nov 17, 2022

Hi, I'm still getting a similar error on Julia 1.8.3. Pseudo MWE below. The first line is the last printout to come from Julia-land before it drops out with signal (4): Illegal instruction:

WHAT IS GOING ON
Unreachable reached at 0x7f13965605ec

signal (4): Illegal instruction
in expression starting at REPL[10]:1
_writeG2oVertexes at /home/dehann/.julia/dev/RoME/src/services/g2oParser.jl:289
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
do_call at /home/dehann/software/julia/src/interpreter.c:126
eval_value at /home/dehann/software/julia/src/interpreter.c:215
eval_stmt_value at /home/dehann/software/julia/src/interpreter.c:166 [inlined]
eval_body at /home/dehann/software/julia/src/interpreter.c:612
jl_interpret_toplevel_thunk at /home/dehann/software/julia/src/interpreter.c:750
jl_toplevel_eval_flex at /home/dehann/software/julia/src/toplevel.c:906
jl_toplevel_eval_flex at /home/dehann/software/julia/src/toplevel.c:850
ijl_toplevel_eval_in at /home/dehann/software/julia/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /home/dehann/software/julia/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_63686 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
#967 at ./client.jl:419
jfptr_YY.967_57682 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
jl_f__call_latest at /home/dehann/software/julia/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_29970 at /home/dehann/software/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/dehann/software/julia/src/gf.c:2365 [inlined]
ijl_apply_generic at /home/dehann/software/julia/src/gf.c:2547
jl_apply at /home/dehann/software/julia/src/julia.h:1839 [inlined]
true_main at /home/dehann/software/julia/src/jlapi.c:575
jl_repl_entrypoint at /home/dehann/software/julia/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 166096316 (Pool: 166040379; Big: 55937); GC: 71
Illegal instruction (core dumped)

The code is a simple case of dispatch:

function _writeG2oLine(::Pose3, io, dfg::AbstractDFG, label, i, solveKey)
  println("WHAT IS GOING ON")
  return nothing
end

function _writeG2oVertexes(io,  dfg,  varIntLabel,  solveKey)
  for (label,i) in pairs(varIntLabel)
    vartype = getVariableType(dfg, label)
    _writeG2oLine(vartype, io, dfg, label, i, solveKey)
    println("NEVER SEEN")
  end
  return nothing
end

Notice how the print statement in the inner function runs, but then this big error occurs at the return nothing statement. Execution never reaches the later NEVER SEEN print line. I'm a little confused.
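
One way to narrow down a case like this might be to ask Julia whether dispatch finds a method at all before making the call. The sketch below uses applicable for that. All names in it (AbstractVariable, Point2, write_line, write_vertices) are hypothetical stand-ins rather than the actual RoME code, and it is not claimed to avoid the crash on an affected version.

```julia
# Hypothetical stand-ins for the variable types in the report.
abstract type AbstractVariable end
struct Pose3 <: AbstractVariable end
struct Point2 <: AbstractVariable end

# Only Pose3 has a method, mirroring the single method shown above.
write_line(::Pose3, io, label) = println(io, "VERTEX_SE3 ", label)

function write_vertices(io, vars)
    for (label, v) in vars
        # `applicable` reports whether dispatch finds a method for these arguments.
        if applicable(write_line, v, io, label)
            write_line(v, io, label)
        else
            println(io, "# no write_line method for label ", label)
        end
    end
    return nothing
end

buf = IOBuffer()
write_vertices(buf, [:x1 => Pose3(), :l1 => Point2()])
out = String(take!(buf))
print(out)
```

If applicable returns false for an argument type that clearly has a matching method, that points at method lookup rather than at the body of the called function.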


EDIT:

$ julia -O3
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.3 (2022-11-14)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 6 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 6

Also note that the Julia v1.8.3 I'm using here is freshly compiled from source. I was hoping 1.8.3 would fix the issue, but it doesn't seem to.


EDIT2: this was actually quite a tiring exercise, and I ended up going with the following workaround:

# dispatching to a function like this does not work in Julia 1.8 in this case:
somefnc(::MyType, args...) = ...

# using workaround
fixdfnc = getfield(MyModule, Symbol(:somefnc, typeof(mytype).name.name))
fixdfnc(args...)
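
A self-contained sketch of that workaround, assuming one plainly named function per type instead of one method per type. The module contents (Pose3, somefncPose3) are hypothetical placeholders, and nameof(typeof(x)) is used here as the public equivalent of typeof(x).name.name.

```julia
module MyModule
    struct Pose3 end
    # One function per type, named by appending the type name to a common prefix.
    somefncPose3(args...) = "handled Pose3 with $(length(args)) args"
end

mytype = MyModule.Pose3()

# Build the function name from the type name and look it up in the module.
fncname = Symbol(:somefnc, nameof(typeof(mytype)))
fixdfnc = getfield(MyModule, fncname)

result = fixdfnc(1, 2)
println(result)
```

This trades method dispatch for a lookup by name, so a type without a matching function fails at the getfield call rather than at dispatch.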

Also, not sure if this is related, but I found another dispatch issue on Julia 1.8: when trying to add a method from a downstream module, multiple dispatch breaks down in some cases (this used to work before 1.8). For example, given module First fnc(::MyType) end, the following overload fails: module Second import First.fnc; First.fnc(::AnotherType) end.
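
For reference, the usual pattern for adding a method to a function owned by another module looks like the sketch below. It assumes both modules are defined at top level in the same session and uses placeholder method bodies. It shows the intended pattern only and is not a reproduction of the failure described above.

```julia
module First
    struct MyType end
    fnc(::MyType) = "method for MyType"
end

module Second
    # Importing the name means new definitions extend First.fnc
    # instead of creating a separate Second.fnc.
    import ..First: fnc
    struct AnotherType end
    fnc(::AnotherType) = "method for AnotherType"
end

# Both methods now belong to the same generic function.
println(First.fnc(First.MyType()))
println(First.fnc(Second.AnotherType()))
println(Second.fnc === First.fnc)
```

Note that import ..First: fnc brings only fnc into scope inside Second, so the unqualified form fnc(::AnotherType) is used for the new method.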
