Valgrind warning on LLVM 3.6 and 3.7 #10806

Closed
garrison opened this issue Apr 14, 2015 · 21 comments
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@garrison
Member

I put the following in Make.user:

LLVM_VER = 3.6.1
CFLAGS = -DMEMDEBUG

Then I build and run under Valgrind:

make cleanall && make && valgrind --smc-check=all-non-file --suppressions=contrib/valgrind-julia.supp ./julia -e ""

It reports the following memory error:

==28214== Conditional jump or move depends on uninitialised value(s)
==28214==    at 0x532E973: llvm::DwarfCompileUnit::addRange(llvm::RangeSpan) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x5302D51: llvm::DwarfDebug::endFunction(llvm::MachineFunction const*) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x52EDF16: llvm::AsmPrinter::EmitFunctionBody() (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x5078CA2: llvm::X86AsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x5986A1E: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x5988E12: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x5354B6C: llvm::MCJIT::emitObject(llvm::Module*) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x53550C5: llvm::MCJIT::generateCodeForModule(llvm::Module*) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x53530B0: llvm::MCJIT::getSymbolAddress(std::string const&, bool) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x53531A8: llvm::MCJIT::getFunctionAddress(std::string const&) (in /home/garrison/julia/usr/lib/libjulia.so)
==28214==    by 0x4F7A282: jl_generate_fptr (codegen.cpp:734)
==28214==    by 0x4F4D24C: jl_trampoline_compile_function (builtins.c:998)
==28214==    by 0x4F4D24C: jl_trampoline (builtins.c:1009)

I have only been able to reproduce this with LLVM 3.6 and svn/3.7 (not 3.3 or 3.5). I have not tested with LLVM-svn.

@Keno
Member

Keno commented Apr 14, 2015

Can you try with LLVM svn? I have been doing a bunch of runs with MemorySanitizer lately, and this seems like the kind of thing it should have flagged, so I'd be interested to see whether it's present in LLVM-svn.

@garrison
Member Author

Valgrind (version 3.10.0) itself keeps crashing for me on LLVM-svn, with the following error:

valgrind: m_debuginfo/storage.c:535 (vgModuleLocal_addLineInfo): Assertion 'lineno >= 0' failed.

@garrison
Member Author

Even Nulgrind is failing. Here's a longer error report:

$ valgrind --tool=none --smc-check=all-non-file ./julia -e ""
==16466== Nulgrind, the minimal Valgrind tool
==16466== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote.
==16466== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==16466== Command: ./julia -e 
==16466== 

valgrind: m_debuginfo/storage.c:535 (vgModuleLocal_addLineInfo): Assertion 'lineno >= 0' failed.

host stacktrace:
==16466==    at 0x38100BBF: show_sched_status_wrk (m_libcassert.c:319)
==16466==    by 0x38100CB4: report_and_quit (m_libcassert.c:390)
==16466==    by 0x38100E36: vgPlain_assert_fail (m_libcassert.c:455)
==16466==    by 0x380A32BE: vgModuleLocal_addLineInfo (storage.c:535)
==16466==    by 0x3810E8EE: vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:279)
==16466==    by 0x3809E08A: vgModuleLocal_read_elf_debug_info (readelf.c:2921)
==16466==    by 0x38095910: vgPlain_di_notify_mmap (debuginfo.c:641)
==16466==    by 0x380BF6D5: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2228)
==16466==    by 0x380F2F68: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:630)
==16466==    by 0x380BBF75: vgPlain_client_syscall (syswrap-main.c:1586)
==16466==    by 0x380B87FA: handle_syscall (scheduler.c:1103)
==16466==    by 0x380B9EC6: vgPlain_scheduler (scheduler.c:1416)
==16466==    by 0x380C9800: run_a_thread_NORETURN (syswrap-linux.c:103)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==16466==    at 0x401731A: mmap (syscall-template.S:81)
==16466==    by 0x4006371: _dl_map_object_from_fd (dl-load.c:1344)
==16466==    by 0x400809E: _dl_map_object (dl-load.c:2605)
==16466==    by 0x4012A24: dl_open_worker (dl-open.c:235)
==16466==    by 0x400E8B3: _dl_catch_error (dl-error.c:187)
==16466==    by 0x401243A: _dl_open (dl-open.c:661)
==16466==    by 0x614702A: dlopen_doit (dlopen.c:66)
==16466==    by 0x400E8B3: _dl_catch_error (dl-error.c:187)
==16466==    by 0x61475DC: _dlerror_run (dlerror.c:163)
==16466==    by 0x61470C0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==16466==    by 0x4D8DBAF: jl_uv_dlopen (dlload.c:49)
==16466==    by 0x4D8DDEA: jl_load_dynamic_library_ (dlload.c:135)
==16466==    by 0x4D963C8: jl_preload_sysimg_so (dump.c:1443)
==16466==    by 0x4D8FF40: _julia_init (init.c:948)
==16466==    by 0x4D90B6C: julia_init (task.c:252)
==16466==    by 0x401372: main (repl.c:491)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

@garrison
Member Author

Same thing on the latest Valgrind, version 3.10.1. I may file this as a bug against Valgrind if others can reproduce it.

@garrison
Member Author

I modified the assertion in Valgrind (in valgrind/coregrind/m_debuginfo/storage.c) to issue a warning instead, and re-ran Julia built against LLVM-svn. Every time lineno is negative (which happens hundreds of times during a single run of julia -e ""), it is exactly -1. Could it be that Julia is reporting the line number incorrectly?

@Keno
Member

Keno commented Apr 29, 2015

Yes, that's possible, though I seem to remember LLVM crashing when passed a line number < 0. Worth checking though.

@garrison garrison changed the title Valgrind warning on LLVM 3.6 Valgrind warning on LLVM 3.6 and 3.7 Apr 29, 2015
@garrison
Member Author

In any event, I am able to reproduce the warning with LLVM-svn under a modified Valgrind, so I updated the issue title accordingly.

@garrison
Member Author

I am able to reproduce with LLVM 3.6.1.

@jakebolewski
Member

@garrison is this still a problem? Please comment and I'll reopen.

@garrison
Member Author

Still an issue on LLVM 3.6.2. (Did you have a reason to believe it had been fixed?)

@garrison garrison reopened this Aug 12, 2015
@garrison
Copy link
Member Author

Also an issue on the latest LLVM-svn (3.8). The negative lineno is still a problem there as well.

@Keno
Member

Keno commented Nov 23, 2015

Latest valgrind just prints a warning here.

@garrison
Member Author

@Keno Indeed, the line number issue should be fixed, since my patch was applied to Valgrind upstream. But the original warning I reported above is still there, last I checked.

@JeffBezanson
Member

Just ran into this.

@maleadt
Member

maleadt commented Feb 1, 2016

Recent Julia code seems to be triggering another lineno >= 0 assertion in Valgrind.
I've sent a patch upstream.

@StefanKarpinski
Member

Still relevant?

@StefanKarpinski StefanKarpinski added this to the 0.5.x milestone Sep 14, 2016
@garrison
Member Author

These days, on Julia master (I'm using commit d52a247), Valgrind crashes before it gets far enough to reproduce the bug above:

$ valgrind --smc-check=all-non-file --suppressions=contrib/valgrind-julia.supp ./julia -e ""
==13741== Memcheck, a memory error detector
==13741== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13741== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13741== Command: ./julia -e 
==13741== 
==13741== Syscall param msync(start) points to unaddressable byte(s)
==13741==    at 0x5860BFD: ??? (syscall-template.S:84)
==13741==    by 0x4F693F9: msync_validate (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6956D: validate_mem (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F696A8: access_mem (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F69F32: dwarf_get (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6A306: _ULx86_64_access_reg (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F69397: _ULx86_64_get_reg (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6D5E6: apply_reg_state (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6DF07: _ULx86_64_dwarf_find_save_locs (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6E71E: _ULx86_64_dwarf_step (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4F6A700: _ULx86_64_step (in /home/garrison/julia-master-memdebug/usr/lib/libjulia.so.0.6.0)
==13741==    by 0x4EB7698: jl_unw_step (stackwalk.c:331)
==13741==    by 0x4EB7698: jl_unw_stepn (stackwalk.c:47)
==13741==  Address 0xffeffe000 is on thread 1's stack
==13741==  1504 bytes below stack pointer
==13741== 

vex: the `impossible' happened:
   isZeroU
vex storage: T total 2003629960 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==13741==    at 0x38083F48: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x38084064: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380842A1: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380842CA: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x3809F682: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x38148008: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x3815514D: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x38159272: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x38159EA6: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x3815BD68: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x3815CDB6: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x38145DEC: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380A1C0B: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380D296B: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380D45CF: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==13741==    by 0x380E3946: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 13741)
==13741==    at 0x2FDDA4E0: ??? (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FDB96FF: EC_POINT_mul (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0xB40AD3B6B3B4CDFF: ???
==13741==    by 0xC678CCF: ???
==13741==    by 0x2FDC1E47: EC_KEY_check_key (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FDC2260: EC_KEY_set_public_key_affine_coordinates (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FE7B882: ??? (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FE7737F: ??? (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FE76A33: ??? (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FD4A70C: FIPS_mode_set (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x2FD46F89: OPENSSL_init_library (in /home/garrison/julia-master-memdebug/usr/lib/libcrypto.so.1.0.0)
==13741==    by 0x40104E9: call_init.part.0 (dl-init.c:72)
==13741==    by 0x40105FA: call_init (dl-init.c:30)
==13741==    by 0x40105FA: _dl_init (dl-init.c:120)
==13741==    by 0x4015711: dl_open_worker (dl-open.c:575)
==13741==    by 0x4010393: _dl_catch_error (dl-error.c:187)
==13741==    by 0x4014BD8: _dl_open (dl-open.c:660)
==13741==    by 0x5444F08: dlopen_doit (dlopen.c:66)
==13741==    by 0x4010393: _dl_catch_error (dl-error.c:187)
==13741==    by 0x5445570: _dlerror_run (dlerror.c:163)
==13741==    by 0x5444FA0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==13741==    by 0x4E8FE49: jl_load_dynamic_library_ (dlload.c:179)
==13741==    by 0x4EB5934: jl_get_library (runtime_ccall.cpp:158)
==13741==    by 0x4EB5A74: jl_load_and_lookup (runtime_ccall.cpp:169)
==13741==    by 0x97B3146: jlplt_git_libgit2_init_22662 (in /home/garrison/julia-master-memdebug/usr/lib/julia/sys.so)
==13741==    by 0x97B382A: julia___init___22660 (libgit2.jl:538)
==13741==    by 0x97B3968: jlcall___init___22660 (in /home/garrison/julia-master-memdebug/usr/lib/julia/sys.so)
==13741==    by 0x4E7601D: jl_call_method_internal (julia_internal.h:218)
==13741==    by 0x4E7601D: jl_apply_generic (gf.c:1852)
==13741==    by 0x4EA43A9: jl_apply (julia.h:1376)
==13741==    by 0x4EA43A9: jl_module_run_initializer (toplevel.c:83)
==13741==    by 0x4E92BB6: _julia_init (init.c:702)
==13741==    by 0x4E9328B: julia_init (task.c:289)
==13741==    by 0x4013DA: main (repl.c:247)

Thread 2: status = VgTs_WaitSys (lwpid 13742)
==13741==    at 0x5861196: do_sigwait (sigwait.c:64)
==13741==    by 0x5861196: sigwait (sigwait.c:96)
==13741==    by 0x4EC159D: signal_listener (signals-unix.c:513)
==13741==    by 0x58576F9: start_thread (pthread_create.c:333)
==13741==    by 0x5B73B5C: clone (clone.S:109)

Thread 3: status = VgTs_WaitSys (lwpid 13753)
==13741==    at 0x585D3A0: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==13741==    by 0x239E788A: blas_thread_server (in /home/garrison/julia-master-memdebug/usr/lib/libopenblas64_.so)
==13741==    by 0x58576F9: start_thread (pthread_create.c:333)
==13741==    by 0x5B73B5C: clone (clone.S:109)

Thread 4: status = VgTs_WaitSys (lwpid 13754)
==13741==    at 0x585D3A0: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==13741==    by 0x239E788A: blas_thread_server (in /home/garrison/julia-master-memdebug/usr/lib/libopenblas64_.so)
==13741==    by 0x58576F9: start_thread (pthread_create.c:333)
==13741==    by 0x5B73B5C: clone (clone.S:109)

Thread 5: status = VgTs_WaitSys (lwpid 13755)
==13741==    at 0x5B56D67: sched_yield (syscall-template.S:84)
==13741==    by 0x239E77A4: blas_thread_server (in /home/garrison/julia-master-memdebug/usr/lib/libopenblas64_.so)
==13741==    by 0x58576F9: start_thread (pthread_create.c:333)
==13741==    by 0x5B73B5C: clone (clone.S:109)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

@garrison
Member Author

I get the same valgrind crash on the release-0.5 branch.

@garrison
Member Author

The crash happens under Nulgrind too:

$ valgrind --smc-check=all-non-file --suppressions=contrib/valgrind-julia.supp --tool=none ./julia -e ""
==15358== Nulgrind, the minimal Valgrind tool
==15358== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote.
==15358== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15358== Command: ./julia -e 
==15358== 
==15358== brk segment overflow in thread #1: can't grow to 0x4a40000
==15358== brk segment overflow in thread #1: can't grow to 0x4a40000
==15358== brk segment overflow in thread #1: can't grow to 0x4a40000

And it continues to print that line repeatedly. It's unclear to me whether this represents a Julia bug, but it's probably worth bisecting to figure out where it started.

@garrison
Member Author

OK, I managed to get Valgrind and Julia to work together on a different computer (I'm not sure what was making it crash on my other machine). The original bug (the Valgrind warning in llvm::DwarfCompileUnit::addRange(llvm::RangeSpan)) does indeed still exist on the latest Julia master.

@StefanKarpinski StefanKarpinski added help wanted and removed help wanted labels Oct 27, 2016
@JeffBezanson
Member

Closing as we are now on LLVM 3.9.

Projects
None yet
Development

No branches or pull requests

6 participants