Segfault with jl_init__threading with julia 1.6 on macOS #40246

Closed
Non-Contradiction opened this issue Mar 28, 2021 · 25 comments

@Non-Contradiction

On macOS, if we load the Julia 1.6 dynamic library, a segfault happens in the jl_init__threading function. The same segfault doesn't happen with Julia 1.5.

For example, with julia 1.6 in R

dyn.load("/Applications/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib")
.C("jl_init__threading")

which causes the following segfault:

*** caught segfault ***
address 0xfffffffffffffff8, cause 'memory not mapped'

Traceback:
1: .C("jl_init__threading")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
ERROR: Selection:

And the same code runs fine with julia 1.5 in R

dyn.load("/Applications/Julia-1.5.app/Contents/Resources/julia/lib/libjulia.1.5.dylib")
.C("jl_init__threading")

With julia 1.6 in python3

MacBookdeMacBook-Pro:~ macbookpro$ python3
Python 3.8.2 (default, Dec 21 2020, 15:06:04) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> libjulia = ctypes.PyDLL("/Applications/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib", ctypes.RTLD_GLOBAL)
>>> libjulia.jl_init__threading()
ERROR: Segmentation fault: 11
MacBookdeMacBook-Pro:~ macbookpro$ 

And the same code runs fine with julia 1.5 in python3

Python 3.8.2 (default, Dec 21 2020, 15:06:04) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> libjulia = ctypes.PyDLL("/Applications/Julia-1.5.app/Contents/Resources/julia/lib/libjulia.1.5.dylib", ctypes.RTLD_GLOBAL)    
>>> libjulia.jl_init__threading()
775
>>> exit()

Discourse post at https://discourse.julialang.org/t/segfault-when-jl-init-in-r-also-python-with-julia-1-6-on-macos/58110/3.
Relevant issue at JuliaInterop/JuliaCall#164.

@TheCedarPrince
Member

I can confirm this is an issue on macOS; I hit the same error.

@JackDunnNZ
Contributor

@staticfloat Apologies for the direct ping, but do you know what might be going on here? It looks like jl_init__threading is segfaulting for 1.6 on macOS only.

@staticfloat
Member

Can you check with the latest master? I think I may have just fixed this as a part of ac7974a.

@Non-Contradiction
Author

Non-Contradiction commented Apr 24, 2021

Thanks @staticfloat. I tried the binary from "https://s3.amazonaws.com/julialangnightlies/pretesting/mac/x64/1.7/julia-ac7974acef-mac64.dmg". The old error seems to be fixed, but a new error appears, which is again a missing symbol and a segfault:

jl_flush_cstdio - dlsym(0x7fc548d88110, jl_flush_cstdio): symbol not found

@JackDunnNZ
Contributor

Thanks @staticfloat for the help!

I built from master and see the same as @Non-Contradiction with JuliaCall from R:

> JuliaCall::julia_setup()
Julia version 1.7.0-DEV.998 at location /private/tmp/julia/julia/usr/bin will be used.
Error in juliacall_initialize(.julia$dll_file, .julia$bin_dir, relative_sysimage_path) :
  jl_flush_cstdio - dlsym(0x7f809066f520, jl_flush_cstdio): symbol not found

which if it helps points to here:

https://github.com/Non-Contradiction/JuliaCall/blob/4777b771303bf7683a60f1cb75a88b5c3a53ecea/src/libjulia.cpp#L133

On Python/pyjulia, I see a similar error with a different symbol:

>>> from julia import Julia
>>> Julia(runtime='/tmp/julia/julia/julia')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/site-packages/julia/core.py", line 680, in __init__
    self.__julia = Julia(*args, **kwargs)
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/site-packages/julia/core.py", line 472, in __init__
    self.api = LibJulia.from_juliainfo(jlinfo)
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/site-packages/julia/libjulia.py", line 195, in from_juliainfo
    sysimage=juliainfo.sysimage,
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/site-packages/julia/libjulia.py", line 217, in __init__
    setup_libjulia(self.libjulia)
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/site-packages/julia/libjulia.py", line 34, in setup_libjulia
    libjulia.jl_.argtypes = [c_void_p]
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/Users/jack/.pyenv/versions/3.6.6/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7f8d8c4471c0, jl_): symbol not found

which points to here:

https://github.com/JuliaPy/pyjulia/blob/87c669e2729f9743fe2ab39320ec9b91c9300a96/src/julia/libjulia.py#L34

In both cases it looks like the error is hit before jl_init__xxx is called, so I can't say whether the original problem is fixed.

I also tried running the commands from @Non-Contradiction in the OP to load the image directly:

> dyn.load("/tmp/julia/julia/usr/lib/julia/sys.dylib")
Error in dyn.load("/tmp/julia/julia/usr/lib/julia/sys.dylib") :
  unable to load shared object '/tmp/julia/julia/usr/lib/julia/sys.dylib':
  dlopen(/tmp/julia/julia/usr/lib/julia/sys.dylib, 6): Library not loaded: @rpath/libjulia-internal.dylib
  Referenced from: /private/tmp/julia/julia/usr/lib/julia/sys.dylib
  Reason: image not found
>>> import ctypes
>>> libjulia = ctypes.PyDLL("/tmp/julia/julia/usr/lib/julia/sys.dylib", ctypes.RTLD_GLOBAL)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/tmp/julia/julia/usr/lib/julia/sys.dylib, 10): Library not loaded: @rpath/libjulia-internal.dylib
  Referenced from: /private/tmp/julia/julia/usr/lib/julia/sys.dylib
  Reason: image not found

I'm not sure if I am doing something wrong here though after the libjulia-internal change?

@staticfloat
Member

The jl_ undefined error is not because of the libjulia-internal change, it's because jl_ was un-exported in 67f29d8#diff-cf6e417d01a6e25b814df855f594119c5d96891b39091abc802a5b6b04a86a56L2

Is it possible to no longer use that symbol in pyjulia? If not, you may need to ask @vchuravy what to use instead.

You should not be loading sys.dylib directly and you shouldn't need to load libjulia-internal; you should only ever need to be loading libjulia.dylib.

@staticfloat
Member

It looks like the jl_flush_cstdio export was removed as well, which explains the other error.

@vtjnash
Member

vtjnash commented Apr 29, 2021

It seems like neither package uses the symbol, they just check for the existence of it?

@JackDunnNZ
Contributor

Thanks both for the help! Unfortunately I don't know much about the internals of either package, but hopefully this helps @Non-Contradiction and @tkf

@Non-Contradiction
Author

@staticfloat @vtjnash Thanks for the help!
Yes, the jl_flush_cstdio symbol is no longer used. After removing the symbol check for jl_flush_cstdio, JuliaCall no longer complains about a missing symbol, but it still segfaults at jl_init__threading. This seems to be the same segfault you get when loading the libjulia dynamic library and calling the function directly, as in the original question.
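
As a rough illustration of how a wrapper might tolerate symbols that are no longer exported (such as jl_ or jl_flush_cstdio in this thread), here is a minimal ctypes sketch. The helper name `set_signature_if_present` is hypothetical and is not part of pyjulia or JuliaCall. The demo uses the symbols of the running process as a stand-in for libjulia, which assumes a POSIX system.

```python
import ctypes


def set_signature_if_present(lib, name, argtypes, restype=None):
    """Configure lib.<name> only when the symbol is exported.

    Hypothetical helper for illustration. Returns True when the symbol
    was found and configured, False when dlsym could not resolve it.
    """
    # ctypes resolves symbols lazily and raises AttributeError on a
    # failed lookup, so getattr() with a default avoids the exception.
    func = getattr(lib, name, None)
    if func is None:
        return False
    func.argtypes = argtypes
    func.restype = restype
    return True


if __name__ == "__main__":
    # Stand-in for libjulia: CDLL(None) exposes the symbols already
    # loaded into this process, including the C library (POSIX only).
    proc = ctypes.CDLL(None)
    print(set_signature_if_present(proc, "strlen", [ctypes.c_char_p], ctypes.c_size_t))
    print(set_signature_if_present(proc, "jl_", [ctypes.c_void_p]))
```

With a pattern like this, a missing optional symbol becomes a skipped setup step instead of an AttributeError during import.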

@staticfloat
Member

Can you try invoking jl_init() instead of jl_init__threading()? It appears to me that this area is in flux a bit, as c556bb9 recently eliminated the jl_init__threading() export completely, so I assume that jl_init() is the "correct" entrypoint. Jameson, Jeff, feel free to contradict me if that's not the case.
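
Since the exported entry point appears to differ between Julia versions in this thread, a caller could try several names in order. The following is only a sketch: `resolve_first` is a hypothetical helper, and the candidate order shown in the comment is an assumption based on the suggestion above, not a documented contract.

```python
import ctypes


def resolve_first(lib, candidates):
    """Return (name, function) for the first candidate that `lib` exports."""
    for name in candidates:
        func = getattr(lib, name, None)  # None when dlsym fails
        if func is not None:
            return name, func
    raise AttributeError("none of %r is exported" % (list(candidates),))


if __name__ == "__main__":
    # With a real libjulia handle this might look like (assumption):
    #   name, init = resolve_first(libjulia, ["jl_init", "jl_init__threading"])
    # Here the running process stands in for the library (POSIX only).
    proc = ctypes.CDLL(None)
    name, func = resolve_first(proc, ["no_such_symbol_for_this_sketch", "strlen"])
    print(name)
```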

@jacobxk

jacobxk commented Apr 30, 2021

Same issue here. I have tried reinstalling Julia 1.6.1 and RCall, but it still happens.

@fingolfin
Member

Note that c556bb9 broke ABI compatibility: any binary compiled against Julia <= 1.6 and calling jl_init() won't work with latest Julia master / Julia 1.7 unless it is recompiled.

If that's truly intentional, I guess we'll have to keep producing multiple binaries for lots of BinaryBuilder JLLs, one for each major Julia version (and in some cases maybe more, as in the past there were ABI breaks even in patch-level updates from 1.x.y to 1.x.(y+1)).

However, this migration is tricky, because there won't be a libjulia_jll 1.7 before Julia 1.7 is released, so I can't provide such JLLs linked against 1.7/master until then. This would be less of an issue if there was a clear strategy for such breaking changes in Julia: like, leaving a compatibility stub for jl_init__threading in 1.7 so that binaries compiled against older Julia version keep working, and removing that only in 1.8.

@staticfloat
Member

> However, this migration is tricky, because there won't be a libjulia_jll 1.7 before Julia 1.7 is released, so I can't provide such JLLs linked against 1.7/master until then.

We can certainly produce a Julia v1.7-compatible libjulia_jll; it just may not be compatible with the final Julia v1.7.

> like, leaving a compatibility stub for jl_init__threading in 1.7 so that binaries compiled against older Julia version keep working, and removing that only in 1.8.

This seems reasonable to me. @JeffBezanson any thoughts?

@TheCedarPrince
Member

TheCedarPrince commented Jun 17, 2021

Just a small bump on this issue - any updates here? Thank you!

P.S. By way of update, I tried this on Julia 1.6.1 and here is the error message now:

> julia <- julia_setup(JULIA_HOME = '/Applications/Julia-1.6.app/Contents/Resources/julia/bin/')
Julia version 1.6.1 at location /Applications/Julia-1.6.app/Contents/Resources/julia/bin will be used.
ERROR: Unable to load dependent library /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib
Message:dlopen(/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib, 10): Symbol not found: _jl_vararg_typename
  Referenced from: /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib
  Expected in: /Users/jzelko3/FOSS/julia/usr/lib/libjulia.1.8.dylib
 in /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib

Not sure what this means. Thanks all!

@TheCedarPrince
Member

Additional update: I tried a potential fix for Julia 1.6.1, export R_LD_LIBRARY_PATH='/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia', and am now back to the same error as before on 1.6.1:

> library('JuliaCall')
> julia <- julia_setup('/Applications/Julia-1.6.app/Contents/Resources/julia/bin/')
Julia version 1.6.1 at location /Applications/Julia-1.6.app/Contents/Resources/julia/bin will be used.
ERROR:
 *** caught segfault ***
address 0xfffffffffffffff8, cause 'memory not mapped'

Traceback:
 1: juliacall_initialize(.julia$dll_file, .julia$bin_dir, relative_sysimage_path)
 2: julia_setup("/Applications/Julia-1.6.app/Contents/Resources/julia/bin/")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 3

@fchorney
Contributor

So I've been doing some testing trying to get macOS 11.4, Python 3.9, and Julia 1.6 working with pyjulia 0.5.6, and I think I found a way to get it to work somewhat.

Some notes:

  • I am using the Julia 1.6 macOS binary downloaded from julialang.org which exists in /Applications/Julia-1.6.app.
  • I have symlinked the julia binary to my /usr/local/bin like so ln -s /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia /usr/local/bin/julia1.6 and then whenever I change my default julia I symlink it again like so: ln -s /usr/local/bin/julia1.6 /usr/local/bin/julia
  • I am doing this in a python 3.9 venv

So my first few observations.

ERROR: could not load library "libjulia.1.dylib"
dlopen(libjulia.1.dylib, 1): image not found
  • libjulia.1.dylib exists in /Applications/Julia-1.6.app/Contents/Resources/julia/lib but it seems to be looking in /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia instead, so I symlinked libjulia.1.dylib into the lib/julia folder and it stopped segfaulting.
  • This seems weird to me since 1.5 didn't have that file in that folder either, so I'm not sure why it's looking for it there now.
  • Then I saw an error that it couldn't find libjulia.dylib, so I applied the same symlink fix as above, and after that I had no issues.
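
The symlink workaround in the bullets above could be scripted roughly as follows. This is a sketch only: `link_into_subdir` is a hypothetical helper, the demo runs on a throwaway directory rather than a real Julia install, and modifying files inside an installed app bundle is at your own risk.

```python
import os
import tempfile


def link_into_subdir(lib_dir, names, subdir="julia"):
    """Symlink lib_dir/<name> into lib_dir/<subdir>/ and return the links created."""
    created = []
    target_dir = os.path.join(lib_dir, subdir)
    for name in names:
        source = os.path.join(lib_dir, name)
        link = os.path.join(target_dir, name)
        # Skip libraries that do not exist and links that are already there.
        if os.path.exists(source) and not os.path.lexists(link):
            os.symlink(source, link)
            created.append(link)
    return created


if __name__ == "__main__":
    # Throwaway layout that mimics lib/ and lib/julia/ from the thread.
    with tempfile.TemporaryDirectory() as lib_dir:
        os.mkdir(os.path.join(lib_dir, "julia"))
        open(os.path.join(lib_dir, "libjulia.1.dylib"), "w").close()
        print(link_into_subdir(lib_dir, ["libjulia.1.dylib", "libjulia.dylib"]))
```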

So, as far as I can tell, the issue is that jl_ isn't exported anymore and pyjulia just checks for its existence, and also julia is looking for libjulia.1.dylib and libjulia.dylib in the wrong place?

Hoping this info might help in some way!

@ihnorton
Member

ihnorton commented Sep 2, 2021

The segfault happens as a side-effect of trying to print the error message (ERROR: could not load library "libjulia.1.dylib") in the second call to jl_load_dynamic_library within julia_init.

Running

import ctypes
libjl = ctypes.CDLL("/opt/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib", mode=ctypes.RTLD_GLOBAL)
libjl.jl_init__threading()

I get

(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 89292 exited with status = 9 (0x00000009)
Process 89314 launched: '/cmn/condaenvs/tiledb38/bin/python' (x86_64)
Process 89314 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x00000001030a2710 libjulia-internal.1.dylib`jl_load_dynamic_library
libjulia-internal.1.dylib`jl_load_dynamic_library:
->  0x1030a2710 <+0>: pushq  %rbp
    0x1030a2711 <+1>: movq   %rsp, %rbp
    0x1030a2714 <+4>: pushq  %r15
    0x1030a2716 <+6>: pushq  %r14
Target 0: (python) stopped.
(lldb) c
Process 89314 resuming
Process 89314 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x00000001030a2710 libjulia-internal.1.dylib`jl_load_dynamic_library
libjulia-internal.1.dylib`jl_load_dynamic_library:
->  0x1030a2710 <+0>: pushq  %rbp
    0x1030a2711 <+1>: movq   %rsp, %rbp
    0x1030a2714 <+4>: pushq  %r15
    0x1030a2716 <+6>: pushq  %r14
Target 0: (python) stopped.
(lldb) reg read rdi
     rdi = 0x00000001031b4d51  "libjulia.1.dylib"
(lldb) c
Process 89314 resuming
ERROR: Process 89314 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8)
    frame #0: 0x00000001030bdf8d libjulia-internal.1.dylib`jl_uv_puts + 749
libjulia-internal.1.dylib`jl_uv_puts:
->  0x1030bdf8d <+749>: movq   -0x8(%rax), %rax
    0x1030bdf91 <+753>: testl  %ebx, %ebx
    0x1030bdf93 <+755>: jns    0x1030bdf2c               ; <+652>
    0x1030bdf95 <+757>: movq   %r14, %rdi
Target 0: (python) stopped.
(lldb) bt 10
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8)
  * frame #0: 0x00000001030bdf8d libjulia-internal.1.dylib`jl_uv_puts + 749
    frame #1: 0x00000001030be1b4 libjulia-internal.1.dylib`jl_printf + 212
    frame #2: 0x0000000103007294 libjulia-internal.1.dylib`jl_vexceptionf + 244
    frame #3: 0x0000000103007352 libjulia-internal.1.dylib`jl_errorf + 146
    frame #4: 0x00000001030a2bac libjulia-internal.1.dylib`jl_load_dynamic_library + 1180
    frame #5: 0x00000001030a40f5 libjulia-internal.1.dylib`_julia_init + 405
    frame #6: 0x00000001030d9e9c libjulia-internal.1.dylib`jl_init__threading + 124
    frame #7: 0x00000001005fc934 libffi.6.dylib`ffi_call_unix64 + 76
    frame #8: 0x00000001005fc1ba libffi.6.dylib`ffi_call + 842
    frame #9: 0x0000000101df06b0 _ctypes.cpython-38-darwin.so`_ctypes_callproc + 480

I believe the problem boils down to this: julia is effectively calling jl_dlopen("libjulia.1.dylib") from libjulia-internal.1.dylib, which does not have any RPATH to be able to find that library:

otool -l /opt/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.dylib.bk | grep LC_ -A2
...
--
          cmd LC_RPATH
      cmdsize 32
         path @loader_path/julia/ (offset 12)
--
          cmd LC_RPATH
      cmdsize 32
         path @loader_path/ (offset 12)

This explains why creating the symlink for libjulia.1.dylib under lib/julia/ (or setting DYLD_LIBRARY_PATH=/opt/Julia-1.6.app/Contents/Resources/julia/lib/) alleviates the segfault.
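
To make the path arithmetic concrete, the sketch below expands the two LC_RPATH entries from the otool output relative to the directory of libjulia-internal. It is a deliberate simplification: it handles only the @loader_path prefix and ignores everything else the dynamic loader consults (DYLD_LIBRARY_PATH, the working directory, and so on). `expand_rpaths` is a hypothetical helper, and the loader file name is taken from the lldb session above.

```python
import posixpath


def expand_rpaths(loader, rpaths):
    """Expand @loader_path-relative rpath entries for the image at `loader`."""
    loader_dir = posixpath.dirname(loader)
    expanded = []
    for rpath in rpaths:
        if rpath.startswith("@loader_path"):
            rest = rpath[len("@loader_path"):].lstrip("/")
            rpath = posixpath.join(loader_dir, rest)
        expanded.append(posixpath.normpath(rpath))
    return expanded


LIB = "/opt/Julia-1.6.app/Contents/Resources/julia/lib"
dirs = expand_rpaths(
    LIB + "/julia/libjulia-internal.1.dylib",
    ["@loader_path/julia/", "@loader_path/"],
)
print(dirs)
print(LIB in dirs)  # is lib/, where libjulia.1.dylib lives, among them?
```

Both entries resolve to lib/julia/ or below it, so lib/ itself is never among the expanded directories, which is consistent with the symlink into lib/julia/ helping.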

@vtjnash
Member

vtjnash commented Sep 2, 2021

$ grep -RI JL_LIBJULIA_SONAME src/
src//init.c:    jl_libjulia_handle = jl_load_dynamic_library(JL_LIBJULIA_SONAME, JL_RTLD_DEFAULT, 1);
src//Makefile:SHIPFLAGS  += "-DJL_LIBJULIA_SONAME=\"libjulia.$(JL_MAJOR_SHLIB_EXT)\""       "-DJL_LIBJULIA_INTERNAL_SONAME=\"libjulia-internal.$(JL_MAJOR_SHLIB_EXT)\""
src//Makefile:DEBUGFLAGS += "-DJL_LIBJULIA_SONAME=\"libjulia-debug.$(JL_MAJOR_SHLIB_EXT)\"" "-DJL_LIBJULIA_INTERNAL_SONAME=\"libjulia-internal-debug.$(JL_MAJOR_SHLIB_EXT)\""

hm, someone does seem to be lying, since that is not the SONAME for LIBJULIA_INTERNAL:

$ otool -L usr/lib/libjulia-internal.dylib 
usr/lib/libjulia-internal.dylib:
	@rpath/libjulia-internal.dylib (compatibility version 1.0.0, current version 1.8.0)
	@rpath/libjulia.dylib (compatibility version 1.0.0, current version 1.8.0)
	@rpath/libunwind.1.dylib (compatibility version 1.0.0, current version 1.0.0)
	@rpath/libLLVM.dylib (compatibility version 1.0.0, current version 12.0.1)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1775.118.101)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)

$ otool -L usr/lib/libjulia.dylib 
usr/lib/libjulia.dylib:
	@rpath/libjulia.dylib (compatibility version 1.0.0, current version 1.8.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
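
The mismatch can also be checked mechanically. The sketch below parses text in the format of the `otool -L` output above, takes the first entry (which for a dylib is its own install name), and compares its file name with the string that the lldb session showed being passed to jl_load_dynamic_library. `install_name` is a hypothetical helper and the sample text is copied from this comment.

```python
import posixpath


def install_name(otool_output):
    """Return the first entry of `otool -L` output (the dylib's own install name)."""
    lines = otool_output.strip().splitlines()
    # lines[0] is the "<path>:" header; entries look like
    # "<tab><name> (compatibility version ..., current version ...)".
    entry = lines[1].strip()
    return entry.split(" (", 1)[0]


SAMPLE = """usr/lib/libjulia.dylib:
\t@rpath/libjulia.dylib (compatibility version 1.0.0, current version 1.8.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
"""

name = install_name(SAMPLE)
requested = "libjulia.1.dylib"  # value of rdi in the lldb session earlier in the thread
print(name)
print(posixpath.basename(name) == requested)
```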

@ihnorton
Member

ihnorton commented Sep 2, 2021

I believe the problematic function call is here:

jl_libjulia_handle = jl_load_dynamic_library(JL_LIBJULIA_SONAME, JL_RTLD_DEFAULT, 1);

@ihnorton
Member

ihnorton commented Sep 3, 2021

Here's a work-around for the issue that led me here, based on the fact that dlopen's search order includes cwd:

import ctypes, os
libjl = ctypes.CDLL("/opt/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.dylib")

libjl.jl_get_libdir.restype = ctypes.c_char_p

prev = os.getcwd()
os.chdir(libjl.jl_get_libdir())
libjl.jl_init__threading()
os.chdir(prev)

I don't get why this only affects initialization from jl_init -- it looks like julia_init via jl_init should be called with the same arguments as it is in the REPL/cli startup case, but I haven't run a debug build.
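
One possible refinement of the save-and-restore steps in the workaround is a context manager, so that the working directory is restored even if initialization raises. This is a sketch of that pattern only, with a hypothetical helper name; it makes no claim beyond what the comment above reports about the workaround itself.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    prev = os.getcwd()
    os.chdir(path)  # accepts str or bytes, e.g. a c_char_p result
    try:
        yield
    finally:
        os.chdir(prev)  # restored even if the body raises


if __name__ == "__main__":
    # With the handle from the snippet above, usage might look like:
    #   with working_directory(libjl.jl_get_libdir()):
    #       libjl.jl_init__threading()
    with tempfile.TemporaryDirectory() as tmp:
        with working_directory(tmp):
            print(os.getcwd())
        print(os.getcwd())
```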

@vtjnash
Member

vtjnash commented Sep 24, 2021

Assigning Elliot, since I think he wrote the code that assigns those SONAMEs

@ViralBShah
Member

ViralBShah commented Dec 7, 2022

@jonathan-conder-sm With #47220 merged is this ok to close?

@vtjnash vtjnash closed this as completed Dec 7, 2022
@vtjnash
Member

vtjnash commented Dec 7, 2022

Looks like that should fix the SONAME

@jonathan-conder-sm
Contributor

This is still an issue, since that PR doesn't touch the 1.6 branch. However, #47053 fixes it for me.
