Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fake JLLs for all packages #38347

Merged
merged 2 commits into from
Dec 2, 2020
Merged

Conversation

staticfloat
Copy link
Member

@staticfloat staticfloat commented Nov 8, 2020

This completes the Fake JLL work that is necessary for 1.6, although I would still like to add a CompilerSupportLibraries_jll package that more reliably sets up our libgfortran/libstdc++ story. EDIT: This has been added.

This PR revamps somewhat the manner in which we download the BB-built binaries for base Julia; in particular, it provides "fake" JLL packages that provide roughly the same API as the real ones, and declares the appropriate version of the JLL packages such that the resolver will no longer attempt to download e.g. OpenBLAS_jll on client machines anymore; the version will be locked to the version shipped by Julia. Not only does this save users a bit of downloading time, it now allows package developers to express compatibility bounds upon dependencies, and not have to mentally remember that a certain version ships with Julia 1.6 and therefore their package is implicitly locked to Julia v1.6 as well; the resolver can now more faithfully do its job.

The build system is changed a bit here to streamline downloading of dependencies/JLLs and bundling the appropriate Artifacts.toml files, but don't get too attached to this method as it's going to change yet again once we have actual honest-to-god JLLs embedded as stdlibs (this will hopefully make it into v1.7).

There is a good chance that there are edge cases where this is going to break packages. We'll definitely need a PkgEval run with this one merged just to see where we stand on that.

This work formally fixes #35215, and finishes one big chunk of work for #33973.

TODO before this PR is ready:

  • Get it working on all platforms (so far only tested on linux64)
  • Clean up checksums (would be nice to add a CI run that deletes all the checksums then regenerates them all, complaining if there are either extraneous or missing checksums)
  • Test LLVM asserts build because I always mess that one up somehow

@staticfloat staticfloat added this to the 1.6 features milestone Nov 8, 2020
@vchuravy
Copy link
Member

vchuravy commented Nov 8, 2020

Looks good! What happens if I set USE_BINARYBUILDER=0 or use system libraries?

@staticfloat
Copy link
Member Author

Looks good! What happens if I set USE_BINARYBUILDER=0 or use system libraries?

Nothing changes on that front; these "fake JLLs" provide the Julia API that packages can expect (e.g. exporting functions for calling executables, defining symbols for library SONAMEs, etc...) but they don't actually provide the binaries. They just blindly dlopen(libname) and hope that the library the system picks up is valid. Functionally, this is identical to what Julia does today.

In the future, we'll use actual JLLs by default, and the binaries will be stored in artifacts/$hash directories and all that good stuff, but if USE_BINARYBUILDER_XXX=0 or USE_SYSTEM_XXX=1 is set, a "fake" JLL like these will be generated instead, so that we still have the Julia API for accessing the binaries (and the resolver still knows about the presence of that library), but the binary itself is still customizable.

.vscode/settings.json Outdated Show resolved Hide resolved
@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from 45d02dc to 27f7b75 Compare November 10, 2020 22:23
@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from fd0d7a8 to e982a01 Compare November 15, 2020 03:50
@staticfloat staticfloat reopened this Nov 15, 2020
@staticfloat staticfloat reopened this Nov 16, 2020
@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from 2756e5c to f2424e8 Compare November 17, 2020 01:45
@staticfloat staticfloat marked this pull request as ready for review November 17, 2020 03:50
@staticfloat
Copy link
Member Author

Looks like this is finally passing tests! Huzzah!

Next up, fix the checksums.

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch 3 times, most recently from c1a8410 to d564bac Compare November 18, 2020 00:09
@staticfloat
Copy link
Member Author

staticfloat commented Nov 18, 2020

I think this is ready for review now! I have checked that it passed all tests with a USE_BINARYBUILDER=0 build on Linux x86_64, the checksums look like they were updated properly (with a few fixes given the build system pruning I did in this branch) and the LLVM asserts build looks like it worked flawlessly!

There is likely some USE_SYSTEM_XXX=1 configurations that will be broken by this, since we're now calling dlopen(SONAME) on libraries we weren't before, and some of the libraries are more versioned than they were in the past (e.g. we call dlopen("libgit2.so.1.1"), whereas before we would just rely on ccall(func, :libgit2), ...) to pick up any old libgit2 that would work. IMO this is more a feature than a bug, as we actually do need the new version, but if we get reports of configurations no longer working we may need to provide ways for the buildsystem to get a new SONAME that it should attempt to dlopen() instead of the hardcoded values here that correspond to the versions that the JLLs are serving.

When reviewing this, I suggest looking at the first commit in isolation so as to filter out the massive number of checksum files that were modified.

@yuyichao
Copy link
Contributor

instead of the hardcoded values here that correspond to the versions that the JLLs are serving.

If this is what you do then I'll report now that it's broken.

@staticfloat
Copy link
Member Author

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch 3 times, most recently from 04d1c16 to 233164f Compare November 19, 2020 07:56
@staticfloat
Copy link
Member Author

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from 233164f to fac77f4 Compare November 19, 2020 18:23
@staticfloat
Copy link
Member Author

@nanosoldier runtests(["AlphaStableDistributions", "BifurcationKit", "CBindingGen", "JLLWrappers", "JetPackWaveFD", "OhMyQSIM", "P4est", "PLCTag", "QuantumTomography", "RiskAdjustedLinearizations"], vs = ":master")

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from fac77f4 to 203fa32 Compare November 19, 2020 19:44
@staticfloat
Copy link
Member Author

From my reading of PkgEval, this is now good to go once Jameson has a chance to take a look.

@staticfloat staticfloat changed the title [WIP] Fake JLLs for all packages Fake JLLs for all packages Nov 21, 2020
@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from 203fa32 to bb81527 Compare November 24, 2020 22:07
@staticfloat
Copy link
Member Author

@vtjnash let's discuss the performance implications here.

So it appears that on my Linux machine, this PR doesn't significantly increase startup time (<1ms, which is <1%). On MacOS, on the other hand, it increases it by ~20% (40ms) which is obviously unacceptable. The challenge is that JLLs don't (yet) provide a lazily-loaded API, and there are real packages out there that depend on e.g. MbedTLS_jll.

Since Julia itself doesn't yet use the JLLs as the Julian API for accessing binaries (that's part of my un-merged work that is getting pushed back to v1.7) it's not critical for me to add in the lazily-loading API, and indeed it's not even critical for the JLL modules to be loaded at startup at all.

Is there a way I can include the JLLs in the sysimg (so importing is fast) but not have Julia automatically import them? So, e.g. the __init__() methods are not actually called until a user runs using MbedTLS_jll.

@vtjnash
Copy link
Member

vtjnash commented Nov 25, 2020

No, but I think we can just remove all text content from these packages without causing any breakage to the julia build (since nothing changes, they're just made into purely fake stub files)?

@staticfloat
Copy link
Member Author

No, but I think we can just remove all text content from these packages without causing any breakage to the julia build (since nothing changes, they're just made into purely fake stub files)?

It won't break the julia build, but it will break all packages that use these JLL packages.

# This file is a part of Julia. License is MIT: https://julialang.org/license

## dummy stub for https://github.com/JuliaBinaryWrappers/GMP_jll.jl
module GMP_jll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing this to https://github.com/JuliaBinaryWrappers/GAP_jll.jl/blob/main/src/GAP_jll.jl I notice a couple differences:
A very minor one is that the the real GAP_jll is a baremodule which supposedly helps performance a bit -- but that's irrelevant for compatibility.

More importantly, though, the JLL API implemented here seems to match the "old" one (you know, from the stone ages, i.e., 2-3 months ago). A quick check shows that it is missing at least these:

  • best_wrapper
  • dev_jll
  • find_artifact_dir
  • get_libgmp_path
  • get_libgmpxx_path

I actually just today adapted some of my code to use GMP_jll.find_artifact_dir() instead of GMP_jll.artifact_dir, as I just learned about find_artifact_dir and thought this was the way forward. Now it isn't?

Perhaps it would be a good idea to use something like JLLWrappers here, too, to avoid some of the repetition and ensure consistency. Indeed, perhaps one could adapt JLLWrapper itself?

Finally, and for me perhaps most seriously: this fake JLL will lack the GMP header files, won't it? Because those actually are quite crucial for me, as I have to compile some C code on the user machine which links against GMP_jll. Yes yes, I should put it all into JLLs, and I did it for most of the code, but there are some parts where this is rather problematic.

If there was a way to get the "real" GMP_jll (and similar for other JLLs in this PR), just for accessing the headers in it, that'd solve this final issue for me (in particular, it would not be necessary for me to load the "real" GMP_jll, just a way to get paths into it would suffice. Is something like that a possibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it isn't?

It is.

If there was a way to get the "real" GMP_jll (and similar for other JLLs in this PR), just for accessing the headers in it, that'd solve this final issue for me (in particular, it would not be necessary for me to load the "real" GMP_jll, just a way to get paths into it would suffice. Is something like that a possibility?

Yes! This is precisely why we bundle a StdlibArtifacts.toml file; so that "special" packages that are in the know can do these kinds of workarounds. In your case, you'll just need to do some artifacts shenanigans if you're on v1.6, where you find the StdlibArtifacts.toml file, select the appropriate GMP artifact within it, and then you can use your headers and whatnot as usual.

Perhaps it would be a good idea to use something like JLLWrappers here, too, to avoid some of the repetition and ensure consistency. Indeed, perhaps one could adapt JLLWrapper itself?

I would have loved to, and indeed, the first cut at this did exactly that, but there are some issues:

  • First, the JLLWrappers API has evolved a lot over the past few months. You've seen this (and helped a bit) so I knew that wholesale importing JLLWrappers into v1.6 wasn't quite a possibility yet. IMO, better to ship no API at all then to ship a broken one that we replace two months later in v1.7.

  • Second, we don't have actual artifacts getting installed on v1.6 yet. That breaks a lot of assumptions in the JLLWrapper world. I would like for JLLWrappers to be smart enough to deal with USE_SYSTEM_XXX=1 configurations where you just dlopen("libfoo.so"), alternate artifact location configurations (e.g. a LibFoo_jll_debug=true is set in your LocalPreferences.toml) or complete path overrides (e.g. the preference LibFoo_jll_libfoo_path=/opt/special_julia_stuff/lib/libfoo.so.1.2 is set). But that API isn't here yet, and that's really what we need in order for JLLs to be useful across all Julia configurations.

Ideally, it would all come together in one piece, but unfortunately that kind of thinking ends up with PRs that have thousands of lines changed at once, and are completely un-reviewable. So we have to do them in small batches, and when release time comes, I can only hold it back for so long. ;)

Unfortunately, this forces a bit of complexity onto you, but hopefully it's not too too inconvenient to have some v1.6-specific workarounds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and we can add some of those API pieces in. Adding e.g. get_libgmp_path() shouldn't be too hard, and get_artifact_dir() can just return dirname(Sys.BINDIR) or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. Great to hear about StdlibArtifacts.toml, that's a relief :-)

Copy link
Member

@StefanKarpinski StefanKarpinski Nov 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to give a little higher level context: 1.6 is going to be a somewhat awkward middle state for stdlib JLLs which will be resolved in 1.7 by making stdlib JLLs really be fully normal JLLs, but it's a big change, so not all of it made it into the 1.6 release. I'm really looking forward to consistently using real JLLs everywhere as it will make a lot of things much better and easier.

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from 32414a2 to 3b82a7e Compare December 1, 2020 01:39
@staticfloat
Copy link
Member Author

@vtjnash I think I've found a happy middle ground; by excluding JLL packages from the system image, we should be able to avoid paying the startup costs. This means that when you type using LibGit2_jll you may need to wait for precompilation and loading but that's fine; that's no worse than it is now. The stdlibs still participate in Pkg resolution logic though, which is the most important part.

This needs JuliaLang/Pkg.jl#2251 to be usable, since precompiling stdlibs (which don't generally have version fields) crashes on current Pkg master.

Once Jameson okays this, we can merge this and be one step closer to branching. I will have some backport work to do for @fingolfin's API concerns above, but those aren't blocking the branch, they're fairly easy to add and are isolated from the rest of Julia, so I am confident that Max and I can get it worked out between us without needing to bother others.

@vtjnash
Copy link
Member

vtjnash commented Dec 1, 2020

That sounds pretty reasonable (even goes along the lines of #38119, which is blocked on the same Pkg bug). I think, however, that this may indicate it is a more complex and comprehensive API than we want to include in the stdlib distribution? The goal of Jeff's latest feature update to ccall is to eliminate these __init__ methods completely.

@timholy
Copy link
Member

timholy commented Dec 1, 2020

For those of us who haven't been following but who find this very interesting...is there going to be any change in latency? Does stdlib precompiliation cache native code or just type-inferred code?

@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch 2 times, most recently from 08526e3 to 492dbac Compare December 1, 2020 18:55
JLLs provide the binary interaction interface that both packages and the
Pkg resolver require in order to properly version and interact with the
binary dependencies shipped with Julia.  This PR changes all binary
dependencies to carry along a JLL package that dlopen()'s the binary
that ships along with Julia.  These JLL packages are not the true JLL
packages that are generated by BinaryBuilder, but are instead "fake"
JLLs that simply `dlopen(filename)` and rely upon the default dlopen
path to find the appropriate libraries, which affords the opportunity
for system libraries and bundled-with-julia libraries alike to both be
represented through the JLL interface.
@staticfloat staticfloat force-pushed the sf/fake_jlls_fake_jlls_everywhere branch from a67ff03 to ca4a22a Compare December 1, 2020 23:52
@staticfloat
Copy link
Member Author

For those of us who haven't been following but who find this very interesting...is there going to be any change in latency? Does stdlib precompiliation cache native code or just type-inferred code?

There shouldn't be any change to latency after these latest changes. We're basically including these JLL packages so that they can participate in Pkg.add() decisions (e.g. lock the JLLs to the version that ships with Julia) but not including them in the system image.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2020

It looks like the added patch against pcre makes the build dependent upon automake (which is not listed as build dependency presently IIUC)? Or am I misunderstanding? :)

checking for sys/param.h... yes
checking sys/processor.h usability...  cd /build/source/deps/srccache/pcre2-10.35 && /nix/store/2jysm3dfsgby5sw5jgj43qjrb5v79ms9-bash-4.4-p23/bin/bash /build/source/deps/srccache/pcre2-10.35/missing automake-1.16 --gnu
/build/source/deps/srccache/pcre2-10.35/missing: line 81: automake-1.16: command not found
yes
WARNING: 'automake-1.16' is missing on your system.
         You should only need it if you modified 'Makefile.am' or
         'configure.ac' or m4 files included by 'configure.ac'.
         The 'automake' program is part of the GNU Automake package:
         <https://www.gnu.org/software/automake>
         It also requires GNU Autoconf, GNU m4 and Perl in order to run:
         <https://www.gnu.org/software/autoconf>
         <https://www.gnu.org/software/m4/>
         <https://www.perl.org/>
make[2]: *** [Makefile:1413: /build/source/deps/srccache/pcre2-10.35/Makefile.in] Error 1
make[1]: *** [/build/source/deps/pcre.mk:33: scratch/pcre2-10.35/build-compiled] Error 2
make[1]: *** Waiting for unfinished jobs....

@NHDaly
Copy link
Member

NHDaly commented Dec 18, 2020

^ specifically, this patch:
deps/patches/pcre2-cet-flags.patch
https://github.com/JuliaLang/julia/pull/38347/files#diff-6e05f82cf91be820214a655417436376b1d4c21a815e3ad25eb26ec16e8e2fd8

@Sacha0
Copy link
Member

Sacha0 commented Dec 19, 2020

Is it possible that the additions to stdlib/Makefile are missing USE_BINARYBUILDER guards or something to that effect? With USE_BINARYBUILDER=0 (absent network access), we hit

    PERL base/features_h.jl
    CC usr/lib/libllvmcalltest.so
    PERL base/build_h.jl.phony
Warning: git information unavailable; versioning information limited
Could not find working curl, wget, or fetch.
You need to install one of these to download dependencies.
make[1]: *** [Makefile:38: /build/source/stdlib/dSFMT_jll/StdlibArtifacts.toml] Error 1
make[1]: *** Waiting for unfinished jobs....
Could not find working curl, wget, or fetch.
You need to install one of these to download dependencies.
make[1]: *** [Makefile:38: /build/source/stdlib/GMP_jll/StdlibArtifacts.toml] Error 1
make: *** [Makefile:64: julia-stdlib] Error 2
make: *** Waiting for unfinished jobs....

after the deps build finishes.

@staticfloat
Copy link
Member Author

This will be addressed via #38962

vstr = zeros(UInt8, 32)
ccall((:mbedtls_version_get_string, libmbedcrypto), Cvoid, (Ref{UInt8},), vstr)
vn = VersionNumber(unsafe_string(pointer(vstr)))
@test vn == v"2.24.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These very strict version checks are problematic with USE_SYSTEM_MBEDTLS=1 and equivalents, as the system version is unlikely to match exactly. Would it be OK if we replaced this check with a comparison against the minimum required version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Julia 1.4 fails on startup (AMD Phenom on Linux)