Introduce libjuliarepl to break dependence on runtime libraries #36588
The names Also repl.c is so small we could consider just moving its code into libjulia. |
This looks great! I think the only drawback I can think of right now is for the scenario that @GunnarFarneback mentioned in the other PR: if a third party executable wants to load |
Even if it isn't part of |
Very important if you ask me. It's a major feature that the third party executable can load multiple versions of Julia without recompilation (it doesn't even include |
The |
Thanks for the ping. Yes, something like At least, I think this Lines 27 to 33 in fbcbac7
It'd be great if we can have something like For PyJulia, it's OK if |
So what I'm hearing is the following:
Does that satisfy all usecases? |
Sounds great to me! |
It sounds like it would be fine. Does this PR have any consequences for the |
I think the concerns within #32614 are the same with or without this PR. |
@vtjnash I've updated this to work much better on Windows, unfortunately I have to provide my own, worse, definitions for many library functions but that's fine. I'm obfuscating the names away so they don't conflict with anything that is imported in the future that actually wants to find a performant Unfortunately, when I attempt to build this on Windows, it never completes bootstrap. Looking at stack traces taken while it's running, it looks to me like everything is chugging along properly, it's compiling code and whatnot, but it just never finishes. I can't figure out what here would cause it to never finish. The only thing I can think of is that something has caused a massive slowdown here such that the initial Do you have any ideas on what might be causing this? |
I finally, FINALLY, figured it out. And it's oh so simple. I just wasn't calling Now we can move on to more typical issues; I now get halfway through bootstrap on windows, but @vtjnash what do you suggest here? I agree with your comment in that function that explicit is better than implicit, but I also don't want to open ourselves up to issues here where we don't get the libc name just right, or certain functions are put into |
@vtjnash: Attempting to statically link
Do we need to pass in an option to suppress linking of |
It passed on all platforms! Hallelujah! If this rebased version passes, let's merge. |
# * debug builds must link against libjuliadebug, not libjulia
# * install time relative paths are not equal to build time relative paths (../lib vs. ../lib/julia)
# That second point will no longer be true for most deps once they are placed within Artifacts directories.
LOADER_BUILD_DEP_LIBS = $(LIBGCC_BUILD_DEPLIB):$(LIBOPENLIBM_BUILD_DEPLIB):$(LIBJULIA_BUILD_DEPLIB)
Please make this easier on systems where the system library works!
Are you referring to USE_SYSTEM_LIBM=1?
Not sure which option causes it, but it's trying to open usr/lib/libgcc_s.so.1 or something like that, which was not created.
https://build.archlinuxcn.org/~imlonghao/log/julia-git/2020-09-25T01%3A17%3A01.html
And the openlibm path it is trying to dlopen is indeed wrong as well. The symlink to the system lib is at usr/lib/julia/libopenlibm.so whereas this is trying to open usr/lib/libopenlibm.so.3.
I suppose we'll need to change LIBOPENLIBM_NAME if USE_SYSTEM_LIBM=1.
}
#elif defined(_OS_LINUX_)
// On Linux, we read from /proc/self/exe
int num_bytes = readlink("/proc/self/exe", exe_dir, PATH_MAX);
Also, this should be fixed. I have definitely used julia in chroot environments that do not have /proc mounted.
It was usually a mistake, of course, but it was previously working. (edit: just checked that this is actually broken due to a libuv bug. Still, this is another failure that previously didn't happen.)
Do you have a good way to find the location of the current process without using /proc?
No, but the system linker behavior is to fall back to the default.
And again, this should just be linked in.
In general, for the libraries used by Since nothing was broken, AFAICT, this was only done to "in anticipation of JLL stdlibs". If that is the only argument against directly linking the libraries, especially |
The entire reason this PR exists is to allow us to run code before dynamic link time, because there is no RPATH facility on windows. This PR allows us to emulate RPATH on all systems.
Since that's the entire point of this PR, no, it's not going to be reverted. The ideal situation is that this loader depends on as few libraries as possible, and manually |
Yes, but the solution should NOT be throwing away RPATH on platforms that have it and then trying to bring back a version that's worse. I'm saying that when it is natively supported, it should be used. On platforms that don't have it, doing whatever is needed to simulate it is totally fine.
Again, this is basically saying that since some platforms are broken, let's just break it for everyone and make everyone as broken as the natively broken one. It's penalizing people who have a proper system setup that allows correct lookup. The fix should be applied ONLY to broken systems rather than the other way around.
And I also don't see why that's a problem. It's only needed for libraries that are needed by Also I don't think it needs to be on the global |
And speed up startup.
On the other hand, if we use certain mechanisms on one platform and different mechanisms on the other, we get very hard to track down bugs like inconsistent libm symbol resolution. Many months ago I already announced my intention to do this in #33973 and #35193, and through discussion with other members of the compiler team, we decided that using a consistent approach across all platforms is better than having a patchwork approach where platform-specific differences are more likely to occur.
In what concrete ways is this penalizing any users?
That's correct, but unfortunately, that list includes libLLVM, libgmp, libmpfr and libpcre. If a user wants to be able to type |
Well, you ARE already using different mechanism on different platforms. This is not something you can possibly avoid. Even
Slower startup. Breaking linker optimization. Harder embedding (can't just link to the library anymore). Breaking build. Breaking tools inspecting linkage/symbols/rpaths.
If the reason is that |
While you are technically correct, I believe my statement stands in that using the same type of mechanism (e.g.
I agree; any shortcomings in the "
Concretely, what linker optimizations are being broken, and what performance impact is it having?
That's precisely why we have
Yes, we will address the openlibm linking issue so that it's easier for users to continue to use
This is no more broken than all the other libraries we
We already discussed this and decided against that because it is not clean enough. |
Cool.
There are many other unrelated changes in this commit, which are probably the cause of the startup time difference. I never said every single change here is bad. Compared to a directly linked one, what I see is a ~1% slowdown, and this does not include the additional libraries that you are going to dlopen in the future. This is also a general comment about opening everything up front (i.e. the JLL package approach in general): it can easily cost milliseconds per library on a very fast system and much more on a slower one.
The thread local variable allocation. It's about as much an impact as the static tls optimization for the JIT code. Using the benchmark I did back then #14083 (comment) , it's ~70-80% slow down on the access of the variable itself. (i.e. almost 2x, not a minus 70-80% change in time, which would be 3x-5x).
And that's exactly what's broken about it. The application is now totally on its own to figure out where the library is, rather than simply relying on the platform linker to find the library. The
No it's much more broken than that. The way we use
Well, it's no dirtier than what's done here. Both require a hardcoded relative path in the user-facing executable to the library/real executable. It has minimal differences between platforms (the real executable would be exactly the same). And it's cleaner in the sense that it doesn't try to use any home-baked mechanism to replace the system one.
JLLs no longer open everything up front. That’s one of the points of this whole endeavor. They now load libraries on demand instead, massively speeding up loading of JLLs. At a high level, Yichao, this feedback should have come months ago. This PR has been here for months. There was another PR before that which did the script wrapper thing you’re talking about and that was open for a month or so and ultimately rejected in favor of this approach. |
While that’s not true of most JLLs today, it will happen eventually, thanks to Jeff’s work on allowing non-const expressions as the library name. |
Well, this is the least useful comment of all time, though I did expect exactly this comment from you. Nothing is set in stone yet, so it is not a bad time to change things. This is also not the first time something has been broken for real users in the name of "improving" external library handling, only to be caught later. As for why I didn't comment earlier, it was not clear at all from the title what this was doing. The title reads like it's talking about embedding, but apparently it is not. There's no time for me to test every single PR on every single revision on the off chance that the title doesn't completely and unambiguously reflect the change, so some of them have to rely on testing after merging to see the impact.
No, I believe this PR has nothing to do with that. Allowing non-const ccall library path was the ONLY thing needed.
Well, the list you give includes ones (gmp, mpfr, pcre) that are libraries used in julia code, not libjulia. In other words, none of those should ever be included in the list. And all the concerns about dlopen vs direct linking and breaking other things are still valid. #36588 (comment)
And if the other PR was #35629. AFAICT the only concern for that was modifying |
When embedding Julia, does libjuliarepl need to be loaded?
Happy to meet expectations. |
Yichao, I understand that you're upset that this got through and you don't like the approach. But frankly, unless you can give explicit, concrete proof of the downsides you are claiming, the basic thrust of this PR still seems the best path forward.
I'm totally fine with a 1% slowdown. Having the security of knowing precisely what library I'm loading (by using a full relative path instead of whatever happens to be on the path) is a benefit, not a downside.
If the library is guaranteed to be needed anyway (like LLVM, GMP, MPFR, and PCRE, which are all required by Base julia) there is no slowdown, because the work must be done no matter what. Once we have the capability to do things like strip regex parsing out of Julia, yes, then we can tackle the issue of lazily loading libpcre, but there's no point in debating it now. And besides, in this PR, we're only front loading LLVM, libgcc_s and openlibm precisely because this fixes issues in platforms that have problems because of the library loading strategy you are arguing for. If we let the system load its default These are perfect usecases for why it is useful to have a relative-path RPATH loading mechanism on all platforms. And the added maintainability of not needing to determine the differences between
As mentioned above, it does not just work on all platforms, and using our own mechanism can work in all cases. There are large upsides to being able to guarantee that exactly the version of As I said above, the
It's all interconnected; it's best for non-const
I see one concrete concern, which is that TLS can be slower. I attempted to recreate your benchmark and found no performance difference at all between this PR and the commit before it. At this point, I believe the burden of proof is upon you with regards to performance concerns; both of the concerns you have raised have turned out to be in favor of this PR, so if you have a performance concern please do a benchmark and show the impact.
Let's not bring old issues into this, I don't think that's helpful. To be clear; that change did improve things, drastically, for real-world users. It sped up GMP and MPFR by a factor of two on older compilers, however, once newer compilers came out, we were able to roll back the
@phykos It depends on what your application is doing, but yes, in most cases. For now, the libraries |
I've given you an answer every single time you asked for a concrete example, so I'm not sure what you are talking about here.
Well, the issue is that you want to do more of this.
And I'm complaining exactly because the version you have breaks maintainability for everyone downstream who expects the standard mechanism to be used. In other words, that's an extremely selfish choice.
Errr, no. With full path there's never a need for "RPATH" capability anywhere else. RPATH is only useful if you don't have full path.
So a 1% slowdown is in favor, I guess. Again, given the limited number of libraries you are loading now, I don't expect there to be much of a difference right now. Trying to load more and more is the problem.
Well, that wasn't what I was talking about. Maybe I should have changed the link, but I was talking about going back to the old compiler in the first place. There was even a PR (I think) to revert the use of binary builder that you refused to take for months, and instead took the path that breaks it for other people.
And again, that's exactly what's broken about it, and it was the main concern that keeps getting ignored, i.e. the mechanism you invented is unexpected to users and is broken (with performance being the secondary concern). You are now requiring the embedding use case to supply the Edit: And to be even more explicit.
The new workflow is,
Compared to previously on systems with working RPATH
I have little problem adding a |
And speed up startup.
This changes ui/repl.c to compile into a new library, libjuliarepl. This library is loaded by a very simple loader executable, defined in loader.c. This loader can have paths to dependent libraries embedded within it, in order to ensure that certain libraries have already been loaded by the time libjuliarepl and libjulia themselves are loaded.