
Profiler segmentation fault #28648

Closed
sverek opened this issue Aug 14, 2018 · 37 comments · Fixed by #30369
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@sverek

sverek commented Aug 14, 2018

When using the profiler in Julia 1.0.0, the REPL crashes on Profile.print().

Also tried the Juno profiler; it crashes the REPL while profiling.

Julia 1.0.0 prebuilt binaries.
macOS 10.13.6

Example:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.0 (2018-08-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Profile

julia> include("qpfem.jl")
gendL (generic function with 1 method)

julia> runexperiment4()
prec=256 p=2 n=2 errK=6.908935E-77 errM=8.636169E-78
prec=256 p=3 n=2 errK=5.527148E-76 errM=2.159042E-77
prec=256 p=4 n=2 errK=2.210859E-75 errM=8.636169E-78
prec=256 p=5 n=2 errK=7.738007E-75 errM=8.636169E-78
prec=256 p=6 n=2 errK=1.768687E-74 errM=4.318084E-77

julia> @profile runexperiment4()
prec=256 p=2 n=2 errK=6.908935E-77 errM=8.636169E-78
prec=256 p=3 n=2 errK=5.527148E-76 errM=2.159042E-77
prec=256 p=4 n=2 errK=2.210859E-75 errM=8.636169E-78
prec=256 p=5 n=2 errK=7.738007E-75 errM=8.636169E-78
prec=256 p=6 n=2 errK=1.768687E-74 errM=4.318084E-77

julia> Profile.print()
Segmentation fault: 11

https://gist.github.com/sverek/107e64a21eed660b273d0fd2f5d366e3

@sverek
Author

sverek commented Aug 14, 2018

Just a comment: the profiler works fine for other functions, for example the Basic Usage example in https://docs.julialang.org/en/v0.6.2/manual/profile/

@JeffBezanson
Sponsor Member

Tried this on linux and it works.

@andreasnoack
Member

I can reproduce on Mac

@sverek
Author

sverek commented Aug 15, 2018

Same thing on linux:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)

julia> Profile.print()
signal (11): Segmentation fault
in expression starting at no file:0
sig_match_simple at /buildworker/worker/package_linux64/build/src/typemap.c:125 [inlined]
jl_typemap_entry_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:780
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:883 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:833
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:886 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:833
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:886 [inlined]
jl_lookup_generic_ at /buildworker/worker/package_linux64/build/src/gf.c:2133 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2179
lookup at ./stacktraces.jl:114
lookup at ./stacktraces.jl:119 [inlined]
#6 at ./none:0
iterate at ./generator.jl:47 [inlined]
Type at ./dict.jl:104
print at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Profile/src/Profile.jl:182
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1829
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:428
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:799
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x2b87bc69db3f)
unknown function (ip: (nil))
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:787
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:622
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:259
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1536 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 43717082 (Pool: 43708835; Big: 8247); GC: 92
Segmentation fault (core dumped)

$ uname -a
Linux xxx 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


@sverek
Author

sverek commented Aug 15, 2018

No segfault on Windows, profiler works:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

@jstrube

jstrube commented Aug 16, 2018

Not sure if this is related:
I also see a segfault with Profile.print() (https://discourse.julialang.org/t/profile-segfault/13484).
In my case I'm on Linux, but I'm calling into an external library (using LCIO.jl) as the data source.
I haven't found a minimal working example that reproduces it yet. Replacing the 700 MB data source with rand() values does not reproduce the crash, and running over a subset of the file doesn't reproduce it either.

@mkborregaard
Contributor

I have my Julia session randomly being silently killed like 50% of the times when running Profile.print() or Juno's profiler. I don't get the Segmentation fault error. I'm on Mac. Anecdotally, I have only observed this in Juno.

@maxbennedich
Contributor

I am seeing the same thing, with both Julia 0.7 and Julia 1.0, on Mac. Not using Juno. Never had any issues with Julia 0.6.4.

Specifying a smaller number of instruction pointers using Profile.init, I can usually avoid the segfault. But that often means that I'm limited to profiling for 1-2 seconds.
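A minimal sketch of that workaround, in the spirit of the comments above (the buffer size and delay here are illustrative values, and some_workload is a placeholder for the function being profiled):

```julia
using Profile

# Shrink the sample buffer so fewer instruction pointers are collected.
# n is the number of instruction pointers the buffer can hold; delay is
# the sampling interval in seconds. Both values are illustrative.
Profile.init(n = 75_000, delay = 0.001)

@profile some_workload()   # some_workload is a placeholder, not a real function
Profile.print()
```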

@simonbyrne
Contributor

simonbyrne commented Sep 7, 2018

This example manages to trigger it on both Mac and Linux on 1.0

using Profile

function foo(n)
    x = 0.0
    for i = 1:n
        x += 0.001*randn()
    end
    x
end

foo(100)
@profile foo(1000_000_000)
Profile.print()

@maxbennedich
Contributor

The above example fails for me too (Mac, 1.0). But if you put Profile.init(n=75_000) before the @profile, it works for me, and seems able to profile the entire run.

I got the same stack trace that @sverek posted above.

@JeffBezanson JeffBezanson added this to the 1.0.x milestone Sep 7, 2018
@JeffBezanson JeffBezanson added the bug Indicates an unexpected problem or unintended behavior label Sep 7, 2018
@JeffBezanson
Sponsor Member

Can somebody who can reproduce this try running it with --check-bounds=yes?

@sverek
Author

sverek commented Sep 7, 2018

Started Julia 1.0 on macOS with --check-bounds=yes and ran the code by @simonbyrne; it segfaults the REPL in the same way as without check-bounds:

julia> include("foo.jl")
foo (generic function with 1 method)

julia> foo(100)
0.014651356344728163

julia> @profile foo(1000_000_000)
-24.68816112652172

julia> Profile.print()
Segmentation fault: 11

@tk3369
Contributor

tk3369 commented Sep 15, 2018

I just encountered the same issue on Mac although there's an extra warning. Not sure if it's the same bug.

julia> using Profile

julia> @profile for i in 1:1000000 parse(Float64, "1.$i") end;

julia> Profile.print()
┌ Warning: The profile data buffer is full; profiling probably terminated
│ before your program finished. To profile for longer runs, call
│ `Profile.init()` with a larger buffer and/or larger delay.
└ @ Profile /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Profile/src/Profile.jl:312
Segmentation fault: 11

@maxbennedich
Contributor

I can reproduce this on Windows too, on Julia 1.0, with the example above. Same stack trace as posted above.

@RalphAS

RalphAS commented Sep 17, 2018

Perhaps this will help to find the cause:

Some of the pointers returned by Profile.fetch() are problematic. In a few cases, they differ by one from other pointers in the list (which behave well). For example, lookup for one pointer ip=0x00007fb5617cba51 returns

  svec(:kwfunc, Symbol("./boot.jl"), 321, MethodInstance for kwfunc(::Any), false, false, Ptr{Nothing} @0x00007fb5617cba50)

but pointer ip=0x00007fb5617cba50 is also in the list, and leads to a segfault when lookup passes ip-1 to jl_lookup_code_address.

Ignoring these "off-by-one" addresses seems to yield plausible results.
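A diagnostic sketch of that filtering idea, assuming the raw sample vector from Profile.fetch(). The off-by-one heuristic is the observation above, not an official API, so this is for investigation only:

```julia
using Profile

data = Profile.fetch()   # raw instruction pointers (UInt); 0 separates backtraces
ptrs = Set(data)

# Drop any pointer whose predecessor is also in the sample set; per the
# observation above, these "off-by-one" addresses are the ones whose lookup
# segfaults. Zero separators are kept so backtrace boundaries survive.
filtered = [ip for ip in data if ip == 0 || !((ip - 1) in ptrs)]
```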

@maxbennedich
Contributor

Hmm, I can't reproduce that using @simonbyrne's example above (on a Mac, Julia 1.0).

While some pointers are problematic and segfault when looked up, it's not the off-by-one pointers that are problematic for me, and I don't see a way to determine whether a pointer will trigger a segfault without actually looking it up.

@timholy
Sponsor Member

timholy commented Sep 17, 2018

I cannot reproduce the failures above. @RalphAS's observation makes it seem likely that this is a libunwind problem. There are some interesting observations in libunwind's README. It's also worth noting that several PRs have been committed to patch or work around libunwind failures (e.g., #28291, #24379, #4159, #24023, probably more).

For starters, we should have folks report a fair amount of architecture detail:

Julia build type: from source

julia> versioninfo()
Julia Version 1.0.1-pre.139
Commit 9ee3f881b3* (2018-09-12 15:03 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIAFUNCDIR = /home/tim/juliafunc
  JULIA_CPU_THREADS = 2

(I'm on the branch for #28764), and since I'm on linux:

tim@diva:~$ ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27

tim@diva:~$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping:            4
CPU MHz:             1009.495
CPU max MHz:         3000.0000
CPU min MHz:         500.0000
BogoMIPS:            4788.76
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            4096K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts flush_l1d

Is it worth having people run libunwind's tests and report the results? I navigated to JULIAHOME/deps/srccache/libunwind-1.1-julia2 and got this:

$ ./configure
# suppressed output

$ make check
# lots of build output, then
PASS: test-proc-info
PASS: test-static-link
PASS: test-strerror
PASS: Gtest-bt
PASS: Ltest-bt
PASS: Gtest-exc
PASS: Ltest-exc
PASS: Gtest-init
PASS: Ltest-init
PASS: Gtest-concurrent
PASS: Ltest-concurrent
../config/test-driver: line 107:  7003 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: Gtest-resume-sig
../config/test-driver: line 107:  7024 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: Ltest-resume-sig
../config/test-driver: line 107:  7044 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: Gtest-resume-sig-rt
../config/test-driver: line 107:  7064 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: Ltest-resume-sig-rt
XFAIL: Gtest-dyn1
XFAIL: Ltest-dyn1
PASS: Gtest-trace
PASS: Ltest-trace
PASS: test-async-sig
PASS: test-flush-cache
PASS: test-init-remote
PASS: test-mem
PASS: Ltest-varargs
PASS: Ltest-nomalloc
PASS: Ltest-nocalloc
PASS: Lrs-race
PASS: test-ptrace
PASS: test-setjmp
PASS: run-check-namespace
PASS: run-ptrace-mapper
PASS: run-ptrace-misc
PASS: run-coredump-unwind
============================================================================
Testsuite summary for libunwind 1.1
============================================================================
# TOTAL: 33
# PASS:  27
# SKIP:  0
# XFAIL: 2
# FAIL:  4
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to libunwind-devel@nongnu.org
============================================================================

@cstjean
Contributor

cstjean commented Sep 17, 2018

I get the segfault all the time too, on Ubuntu, Julia 1.0.0. It's triggered by @simonbyrne's code above.

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

ldd (Ubuntu GLIBC 2.23-0ubuntu10) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
cst-jean@magneto:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
Stepping:              2
CPU MHz:               3300.386
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              6599.83
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm invpcid_single ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts

@timholy
Sponsor Member

timholy commented Sep 17, 2018

Is that a source build or a downloaded binary?

@timholy
Sponsor Member

timholy commented Sep 17, 2018

Oooh, interesting: I just tried a downloaded julia binary and got the segfault. But my source-build is fine. Is everyone who is experiencing this using a binary?

versioninfo for the binary:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIAFUNCDIR = /home/tim/juliafunc
  JULIA_CPU_THREADS = 2

@cstjean
Contributor

cstjean commented Sep 17, 2018

Downloaded binary.

@sverek
Author

sverek commented Sep 17, 2018

Downloaded binary (mac, linux, windows).

@KristofferC
Sponsor Member

Would be interesting to check if nightly has the same problem.

@andreasnoack
Member

I can reproduce this on nightly

Julia Version 1.1.0-DEV.271
Commit 16516b5fbf (2018-09-17 12:51 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

@sverek
Author

sverek commented Sep 17, 2018

Crashes on nightly macOS for me too:

julia> versioninfo()
Julia Version 1.1.0-DEV.271
Commit 16516b5fbf (2018-09-17 12:51 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) M-5Y71 CPU @ 1.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

@mkborregaard
Contributor

I'm on a downloaded binary (1.0.0, commit 5d4eaca, macOS)

@andreasnoack
Member

This seems to be related to more recent architectures. I haven't been able to reproduce this when I set --cpu-target=nehalem (or when I simply try it on anubis).

@RalphAS

RalphAS commented Sep 19, 2018

My experiments described above (with consistent failures on sufficiently large profiling runs) were on a downloaded binary 1.0.0, Linux x86_64 (Haswell).

On an installation locally built from source 1.1.0-DEV.281, same system, I'm not seeing any segfaults (or off-by-one pointers) so far. I do still see a few "stragglers", which are incorrectly printed outside of the tree.

@sverek
Author

sverek commented Sep 19, 2018

Built Julia from source; then no crash with the example by @simonbyrne:

julia> versioninfo()
Julia Version 1.1.0-DEV.281
Commit 8dd33262d3 (2018-09-18 17:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.7.0)
  CPU: Intel(R) Core(TM) M-5Y71 CPU @ 1.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

@timholy
Sponsor Member

timholy commented Sep 24, 2018

Seems pretty clearly related to static binaries, but dependent on architecture. CC @staticfloat in case he hasn't seen this.

@staticfloat
Sponsor Member

staticfloat commented Sep 24, 2018

I can also trigger this reliably using the official 1.0.0 binary. Interestingly, if I put a ; after Profile.print(), it doesn't trigger, nor does it trigger if the commands are saved into a script. I have to directly print the return value from Profile.print(), which seems kind of weird to me, but whatever.

Secondly, I believe this is due to the sysimg multiversioning we do on the buildbots. I can reproduce the segfault locally if I set the makevars JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)" and MARCH=x86-64. Note that getting rid of the haswell multiversion causes the bug to no longer trigger, so Tim and Andreas's guess that this has to do with newer CPUs seems right on the money.
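For reference, those two makevars would go in a Make.user file at the top of the Julia source tree before running make. This is just the local-reproduction setup described above written out as a config fragment; the values are copied from the comment, and the build itself is untested here:

```make
# Make.user: reproduce the buildbots' multiversioned sysimg locally.
# Removing the haswell entry makes the bug stop triggering.
JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
MARCH=x86-64
```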

My from-source builds are from the latest master, commit 99e0b3b. My hardware:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz
Stepping:            9
CPU MHz:             3800.682
CPU max MHz:         3900.0000
CPU min MHz:         800.0000
BogoMIPS:            7010.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

@staticfloat
Sponsor Member

Note: I updated the above to note that MARCH=x86-64 is also needed to trigger the bug locally.

@mkborregaard
Contributor

Yes, I'm on a Broadwell processor.

@vancleve

Quoting @mkborregaard: "I have my Julia session randomly being silently killed like 50% of the times when running Profile.print() or Juno's profiler. I don't get the Segmentation fault error. I'm on Mac. Anecdotally, I have only observed this in Juno."

I see this too on Juno on a Mac with Skylake.

@BenjaminBorn
Contributor

Bump. Is there any hope of fixing this?

@vtjnash vtjnash self-assigned this Dec 12, 2018
vtjnash added a commit that referenced this issue Dec 12, 2018
Previously, with a multi-versioned system image, there might be additional entries at the end of the clone list
that do not correspond to an actual method (such as jlplt thunks).

Also some code cleanup for clarity.

fix #28648
vtjnash added a commit that referenced this issue Dec 12, 2018
vtjnash added a commit that referenced this issue Dec 17, 2018
@vtjnash
Sponsor Member

vtjnash commented Dec 17, 2018

Nice collaborative work building the reduction, folks. Thanks!

KristofferC pushed a commit that referenced this issue Dec 20, 2018
Previously, with a multi-versioned system image, there might be additional entries at the end of the clone list
that do not correspond to an actual method (such as jlplt thunks).

Also some code cleanup for clarity.

fix #28648

(cherry picked from commit e51a707)
@x-ji

x-ji commented Dec 23, 2018

I'm also seeing this on Linux with v1.0.3 binary (Skylake CPU). Putting a ; after Profile.print() doesn't seem to work either.

I'm not sure if I'm facing the same error as those posted above. I didn't have the error a while ago.

A workaround for me was to use ProfileView.jl and run ProfileView.view(), which seems to work normally.
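That workaround, as a sketch (ProfileView.jl must be installed separately, and myfunc is a placeholder for the workload being profiled):

```julia
# Requires the ProfileView.jl package: ] add ProfileView
using Profile, ProfileView

@profile myfunc()    # myfunc is a placeholder, not a real function
ProfileView.view()   # graphical flame graph; avoids calling Profile.print()
```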

Not sure if the following stacktrace helps:

julia> Profile.print()
┌ Warning: The profile data buffer is full; profiling probably terminated
│ before your program finished. To profile for longer runs, call
│ `Profile.init()` with a larger buffer and/or larger delay.
└ @ Profile /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Profile/src/Profile.jl:312

signal (11): Segmentation fault
in expression starting at no file:0
sig_match_simple at /buildworker/worker/package_linux64/build/src/typemap.c:125 [inlined]
jl_typemap_entry_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:780
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:883 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:833
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:886 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:833
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:886 [inlined]
jl_lookup_generic_ at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2181
lookup at ./stacktraces.jl:114
lookup at ./stacktraces.jl:119 [inlined]
#6 at ./none:0
iterate at ./generator.jl:47 [inlined]
Type at ./dict.jl:104
print at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Profile/src/Profile.jl:182
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:430
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:682
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:806
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f756366a0df)
unknown function (ip: (nil))
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:815
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:805
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:622
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:259
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1537 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 1117050045 (Pool: 1116983863; Big: 66182); GC: 1016
[1]    24657 segmentation fault (core dumped)  julia -O3

KristofferC pushed a commit that referenced this issue Dec 30, 2018
KristofferC pushed a commit that referenced this issue Feb 4, 2019
KristofferC pushed a commit that referenced this issue Feb 11, 2019
KristofferC pushed a commit that referenced this issue Apr 20, 2019
KristofferC pushed a commit that referenced this issue Feb 20, 2020