Add load_libjulia in libjulialoader #37779

Closed
tkf wants to merge 1 commit

Member

@tkf tkf commented Sep 28, 2020

From the discussion in #36588 (comment), I'd imagine we need something like void * load_libjulia(const char *) for loading libjulia in a cross-platform manner.

This is a dead-simple implementation that just extracts the first lines of load_repl. @staticfloat If you have a better idea for doing this, please feel free to close this and implement it in a new PR.

Here is a simple Python session with this PR:

In [1]: import ctypes
   ...: libjulialoader = ctypes.CDLL("usr/lib/libjulialoader.so")
   ...: libjulialoader.load_libjulia.restype = ctypes.c_void_p
   ...: libjulialoader.load_libjulia(b"usr/bin")
Out[1]: 94362114802096

In [2]: libjulia = ctypes.PyDLL("usr/lib/libjulia.so", ctypes.RTLD_GLOBAL)

In [3]: libjulia._handle
Out[3]: 94362114802096

In [4]: Out[1] == Out[3]
Out[4]: True

Question: Should it be renamed to jl_load_libjulia or something?

cc @davidanthoff @GunnarFarneback

Contributor

@yuyichao yuyichao left a comment

This API is still fundamentally broken. It should not be required for the user to pass in the path to the julia executable since that's extremely error-prone and may be impossible.

Ref #36588 (comment)

Member

@staticfloat staticfloat left a comment

This is exactly how I would have done it.

@tkf
Member Author

tkf commented Sep 28, 2020

I noticed that calling load_libjulia a second time returns a different handle (the third and later handles are identical to the second one):

In [1]: import ctypes
   ...: libjulialoader = ctypes.CDLL("usr/lib/libjulialoader.so")
   ...: libjulialoader.load_libjulia.restype = ctypes.c_void_p
   ...: libjulialoader.load_libjulia(b"usr/bin")
Out[1]: 94264976507248

In [2]: libjulialoader.load_libjulia(b"usr/bin")
Out[2]: 94264976504784

In [3]: libjulialoader.load_libjulia(b"usr/bin")
Out[3]: 94264976504784

In [4]: libjulialoader.load_libjulia(b"usr/bin")
Out[4]: 94264976504784

In [5]: libjulia = ctypes.PyDLL("usr/lib/libjulia.so", ctypes.RTLD_GLOBAL)

In [6]: libjulia._handle  # same as Out[1]
Out[6]: 94264976507248

I'm testing this on Linux. I thought dlopen would return the same handle. Is this expected?
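
On Linux, opening an already-loaded shared object again is expected to hand back the same handle, so the differing values above point at something other than dlopen itself. A minimal check of that expectation, independent of Julia, is sketched below. It assumes a POSIX system, where `ctypes.CDLL(None)` opens the running process; `reload_returns_same_handle` is a hypothetical helper name, not part of any Julia API.

```python
import ctypes


def reload_returns_same_handle(name=None):
    """Open the same shared object twice and compare the raw dlopen handles.

    name=None asks dlopen for the running process itself (POSIX only), so the
    check does not need any particular library to be installed.
    """
    first = ctypes.CDLL(name)
    second = ctypes.CDLL(name)
    # _handle is the same private attribute inspected in the sessions above.
    return first._handle == second._handle


if __name__ == "__main__":
    print(reload_returns_same_handle())
```

If this reports matching handles while load_libjulia does not, the second call is presumably returning a handle to a different library.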

@tkf
Member Author

tkf commented Sep 28, 2020

Ah, I guess this is because

julia/cli/loader_lib.c

Lines 146 to 147 in da2935d

// Chop the string at the colon, load this library.
*colon = '\0';

mutates the global dep_libs in-place?
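
The suspected effect can be reproduced without the loader. The sketch below is a simplified Python model, not the code in loader_lib.c: it assumes the loop walks a NUL-terminated, colon-separated list, overwrites each colon with NUL, and treats the last entry as libjulia. The function names and the list contents are illustrative.

```python
def c_string_at(buf, start):
    """Return the NUL-terminated string that starts at index `start`."""
    end = buf.index(0, start)
    return buf[start:end].decode()


def load_dependencies(dep_libs):
    """Model of a loader loop that chops a colon-separated list in place.

    Returns the names it would pass to dlopen; the last one stands in for
    libjulia, whose handle the real function would return.
    """
    opened = []
    cur = 0
    while True:
        end = dep_libs.index(0, cur)           # end of the current C string
        colon = dep_libs.find(b":", cur, end)  # like strchr, stops at the NUL
        if colon == -1:
            break
        dep_libs[colon] = 0                    # *colon = '\0' mutates the buffer
        opened.append(c_string_at(dep_libs, cur))
        cur = colon + 1
    opened.append(c_string_at(dep_libs, cur))
    return opened


if __name__ == "__main__":
    # Hypothetical list standing in for the global dep_libs.
    shared = bytearray(b"libgcc_s.so.1:libopenlibm.so.3:libjulia.so.1.6\0")
    print(load_dependencies(shared))  # first call walks the whole list
    print(load_dependencies(shared))  # later calls only see the first entry
```

In this model, passing a fresh copy on every call (`load_dependencies(bytearray(original))`) keeps the result stable across calls, which suggests the loader would need to either copy the list or cache the handle.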


Edit: Yes, it looks like it. With this patch

diff --git a/cli/loader_lib.c b/cli/loader_lib.c
index ed989481b6..1b291bfb6c 100644
--- a/cli/loader_lib.c
+++ b/cli/loader_lib.c
@@ -40,6 +40,7 @@ static void * load_library(const char * rel_path, const char * src_dir) {
     strncat(path, src_dir, sizeof(path) - 1);
     strncat(path, PATHSEPSTRING, sizeof(path) - 1);
     strncat(path, rel_path, sizeof(path) - 1);
+    print_stderr3("`dlopen`ing: ", path, "\n");

     void * handle = NULL;
 #if defined(_OS_WINDOWS_)

I get

In [1]: import ctypes
   ...: libjulialoader = ctypes.CDLL("usr/lib/libjulialoader.so")

In [2]: libjulialoader.load_libjulia(b"usr/bin")
`dlopen`ing: usr/bin/../lib/libgcc_s.so.1
`dlopen`ing: usr/bin/../lib/libopenlibm.so.3
`dlopen`ing: usr/bin/../lib/libjulia.so.1.6
Out[2]: -1558041792

In [3]: libjulialoader.load_libjulia(b"usr/bin")
`dlopen`ing: usr/bin/../lib/libgcc_s.so.1
Out[3]: -1558044256

In [4]: libjulialoader.load_libjulia(b"usr/bin")
`dlopen`ing: usr/bin/../lib/libgcc_s.so.1
Out[4]: -1558044256

In [5]: libjulialoader.load_libjulia(b"usr/bin")
`dlopen`ing: usr/bin/../lib/libgcc_s.so.1
Out[5]: -1558044256
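
The negative values in this last session are probably an artifact rather than real handles: unlike the earlier sessions, it does not set load_libjulia.restype, and ctypes then treats the return value as a C int. The gap between the first and later values is 2464 in both sessions, which is consistent with the same pointers being cut down to their low bits. A small sketch of that truncation, assuming a platform where C int is 32 bits (`as_default_restype` is a hypothetical helper name):

```python
import ctypes


def as_default_restype(pointer_value):
    """Return what a pointer looks like after passing through a C int.

    ctypes assumes foreign functions return C int unless restype is set,
    and c_int does no overflow checking, so only the low bits survive.
    """
    return ctypes.c_int(pointer_value).value


if __name__ == "__main__":
    # Handles copied from the earlier session that did set restype.
    first, later = 94264976507248, 94264976504784
    print(first - later)
    print(as_default_restype(first) - as_default_restype(later))
```

Setting `restype = ctypes.c_void_p`, as in the first sessions, avoids the truncation.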

@yuyichao
Contributor

> This is exactly how I would have done it.

You still have not answered the question of how the user is supposed to find the binary reliably, nor explained the motivation for making embedding much harder than necessary.

@vtjnash
Member

vtjnash commented Sep 28, 2020

@yuyichao Yes, you've made your concerns clear, but realize that the work done here was arrived at after much analysis (and even a couple failed PRs). It was just getting too awkward for libLLVM to be loaded one way, while all our other dependent libraries get lazy loaded if possible.

While we realize some of this work is still incomplete, the majority of the foundational work is being finished with this PR to make it easier to link against libjulia, while also enhancing our ability to select, load, and upgrade the dependent libraries in situ. Merging that PR was necessary to help unstick some other pending work, so we opted to merge it early to give all of the pieces an opportunity to get testing and fixes now (thus further in advance of the actual release branch date) instead of being blocked on getting testing of all pieces on all platforms.

@yuyichao
Contributor

yuyichao commented Sep 28, 2020

> arrived at after much analysis (and even a couple failed PRs)

Where are they? As I already mentioned, the one failed PR about adding an executable wrapper on Windows does not seem to have any applicable objection on it. (All objections are related to the change of PATH, which is AFAICT unnecessary.)

> It was just getting too awkward for libLLVM to be loaded one way, while all our other dependent libraries get lazy loaded if possible.

Not sure how this is related. Here I'm not even talking about lazy vs. not. Everything done here is eagerly loaded (and for libjulia dependencies that's totally fine), and it was eager before as well, so nothing has changed. AFAICT the libraries that currently get lazily loaded will remain so, so nothing should have changed about the laziness either. There are and will always be two different ways to load things: one that must happen before or at the same time libjulia is loaded, and one done lazily at runtime. I don't see anything here or elsewhere changing this aspect. The current version also intentionally uses two versions of code to open libraries, and I don't see that as any less awkward in this regard. If anything, it is much more awkward, since there are now basically three mechanisms to load/link libraries: the unavoidable system dynamic linker, the dlopen in libjulialoader, and the runtime dlopen in libjulia. So if the awkwardness of multiple different ways to load things is a concern, which I kind of agree with, this (i.e. #36588) should not be done.

> While we realize some of this work is still incomplete, the majority of the foundational work is being finished with this PR to make it easier to link against libjulia, while also enhancing our ability to select, load, and upgrade the dependent libraries in situ. Merging that PR was necessary to help unstick some other pending work, so we opted to merge it early to give all of the pieces an opportunity to get testing and fixes now (thus further in advance of the actual release branch date) instead of being blocked on getting testing of all pieces on all platforms.

Exactly, the implementation has many other problems that I was not even commenting much on. Having other stuff pending on this does not mean the change is good to go. The implementation can be bad but the design has to be sound. Most of what I was focusing on, and the only point I mentioned here, are AFAICT fundamental problems tied to how this is designed, the API, and not about the implementation.

Here, I'm only asking about the public API change for embedding. Unless future progress will completely remove the exe_dir from the API, this is not at all an implementation detail issue and not something that can be fixed by adding more stuff to rely on this. And if exe_dir is going away, then this PR should not be merged since it'll only increase the API breakage.

@staticfloat
Member

> Here, I'm only asking about the public API change for embedding. Unless future progress will completely remove the exe_dir from the API, this is not at all an implementation detail issue and not something that can be fixed by adding more stuff to rely on this. And if exe_dir is going away, then this PR should not be merged since it'll only increase the API breakage.

Let's focus in on exe_dir then; the design constraints are that we need a cross-platform way to load binaries that are located at an arbitrary location that is constant relative to the installation directory of Julia. In this case, the paths will be something like ${julia_install_root}/share/julia/stdlib/v1.6/artifacts/<hash>/lib/libLLVM.dll. What API would you suggest for allowing the library (libjulia or libjulialoader, or whatever entry point you prefer) to access these libraries?

@yuyichao
Contributor

yuyichao commented Sep 28, 2020

> What API would you suggest for allowing the library (libjulia or libjulialoader, or whatever entry point you prefer) to access these libraries?

Well, anything that does not require the user to specify an additional path in the API, i.e. equivalent to specifying $ORIGIN/julia on libjulia now. This is the basic requirement of no functional regression, without even talking about breakage. As I said, I'm totally fine with adding an API that is a no-op on Linux, where things work correctly. That will break the embedding API but doesn't raise the requirements on embedding users (i.e. they need to call more functions but don't need to acquire more information than before).

Also, I don't see why libraries linked to libjulia have to have a complicated path. They can be copied/linked to a simpler directory at build time. (i.e. I don't see why

> the design constraints are that we need a cross-platform way to load binaries that are located at an arbitrary location that is constant relative to the installation directory of Julia. In this case

is a design constraint)
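
For readers unfamiliar with the $ORIGIN mechanism mentioned above: an ELF dynamic linker replaces $ORIGIN in an RPATH entry with the directory of the object that carries the entry, so dependencies are found relative to the library itself and no caller has to supply a path. The sketch below only models that string expansion; it is not the linker's implementation, `expand_origin` is a hypothetical helper, and the /opt/julia paths are made up for illustration.

```python
import posixpath


def expand_origin(rpath_entry, object_path):
    """Expand $ORIGIN / ${ORIGIN} the way an ELF dynamic linker would.

    object_path is the absolute path of the shared object carrying the RPATH.
    """
    origin = posixpath.dirname(object_path)
    expanded = rpath_entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    # Normalised only for readability; the linker does not need to do this.
    return posixpath.normpath(expanded)


if __name__ == "__main__":
    # Hypothetical install location of libjulia.
    print(expand_origin("$ORIGIN/julia", "/opt/julia/lib/libjulia.so.1.6"))
```

The point of contrast with exe_dir is that the anchor here is the loaded library's own location, which the linker already knows.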

@staticfloat
Member

> Well, anything that does not require the user to specify an additional path in the API, i.e. equivalent to specifying $ORIGIN/julia on libjulia now. This is the basic requirement of no functional regression, without even talking about breakage. As I said, I'm totally fine with adding an API that is a no-op on Linux, where things work correctly. That will break the embedding API but doesn't raise the requirements on embedding users (i.e. they need to call more functions but don't need to acquire more information than before).

One of the fundamental design patterns I want to avoid is platform differences. We build these platform abstraction libraries precisely so that users don't have to worry about what platform they're running on.

In the end, I don't see requiring passing in an anchoring path as a large burden, and since I don't see a feasible way around it that will still allow us to give the same guarantees of loading precisely the libraries that we want to, it seems like the best solution still.

> Also, I don't see why libraries linked to libjulia have to have a complicated path. They can be copied/linked to a simpler directory at build time.

This is best summarized in this comment. The benefits to this system are:

  • All libraries can be serviced by downloading JLLs, and instead of unpacking them into the lib directory of the Julia prefix, they instead get unpacked into an artifacts/<content hash> directory, just like the package manager would do them.
  • Access of libraries, even the ones that ship with Julia will be possible through JLL APIs
  • The resolver will know that certain JLLs are already shipped with Julia, allowing us to express proper version constraints upon the libraries included with Julia.

@yuyichao
Contributor

yuyichao commented Sep 28, 2020

> One of the fundamental design patterns I want to avoid is platform differences. We build these platform abstraction libraries precisely so that users don't have to worry about what platform they're running on.

Yes, that is a good goal, however,

  1. It's an abstraction for the user and it should not matter what platform dependent mechanism is used.
  2. It must not be prioritized over keeping existing use cases working. That's exactly why I mentioned "Julia 1.4 fails on startup (AMD Phenom on Linux)" #35215. The series of changes there first put "modernness" of the setup over performance and changes to the compilation environment, and the follow-up change put performance over not breaking users' code. That was a complete priority inversion.

> In the end, I don't see requiring passing in an anchoring path as a large burden, and since I don't see a feasible way around it that will still allow us to give the same guarantees of loading precisely the libraries that we want to, it seems like the best solution still.

I did give concrete arguments about this. Since that's what you asked for, please address specifically how each use case can be solved, rather than just saying you don't see something as a large burden because it isn't a problem for the julia executable. As I said, the binary may or may not exist and the user may or may not be able to find the correct one. One more thing is that code like

int main()
{
    julia_init(...);
    /* call many jl_* functions */
}

is virtually impossible to port without a very major rewrite in which the user has to match how libjulia is loaded, leaking the abstraction. The symbols are not going to be available to the compile-time linker anymore.

And since abstracting out the platform is only necessary for the user, using RPATH when it exists is a perfectly good solution.

> All libraries can be serviced by downloading JLLs, and instead of unpacking them into the lib directory of the Julia prefix, they instead get unpacked into an artifacts/<content hash> directory, just like the package manager would do them.

Whatever unpacks them should be able to unpack them into either location just fine.
A special case is already needed to generate the list in libjulialoader; that logic can simply be reused to generate the additional copying rules.

> Access of libraries, even the ones that ship with Julia will be possible through JLL APIs
> The resolver will know that certain JLLs are already shipped with Julia, allowing us to express proper version constraints upon the libraries included with Julia.

Likewise, the JLL package does not have to point to the artifact path, especially since the library path doesn't have to be constant anymore. Again, it needs the same set of special rules that already exist.

In other words, these are only "benefits" because you want to put the special logic in a particular place. They are in reality not benefits at all when that logic can move. None of these avoids the logic anyway.

Moreover, the artifact path itself can even be encoded in the RPATH, so you don't even have to move the file.

@vtjnash
Member

vtjnash commented Mar 15, 2022

Is this still needed? IIUC, we have done something similar now.

@tkf
Member Author

tkf commented Mar 16, 2022

Yes, it looks like #38160 handles this more automatically.

@tkf tkf closed this Mar 16, 2022
4 participants