Optimize artifact loading during Pkg.build stage, remove Pkg application dependency from JLL libraries #538

Closed
vtjnash opened this issue Dec 2, 2019 · 4 comments · Fixed by #880

@vtjnash
Member

vtjnash commented Dec 2, 2019

Some history: way back when, the Julia ecosystem used to handle artifacts a little bit like JLL files currently handle them (through the BinDeps ecosystem): figuring them out on the fly when loading the package. This was a performance and debugging (configuration) mess. Finally, Pkg.build was created to make sense of it all. Before that, loading packages was awkward (loading files could fail in various ways with no good debugging tooling available at load time, and it violated the concept of .ji files being immutable caches dependent only on their .jl source files), and perhaps equally importantly it was slow. Not very slow, but just enough to be a problem when it started getting used everywhere. We've improved many things since then (such as adding .ji incremental precompile files, baking Pkg into the system image, and building a larger precompiled sysimg) to sometimes bury some of that overhead. And Artifacts+BinaryBuilder are also much better now, since they are almost always handled during Pkg.build and are more declarative and more carefully managed (and manageable).

But it seems that BinaryBuilder is also regressing the ecosystem somewhat on these axes (e.g. JuliaLang/julia#33985 (comment) and JuliaGraphics/Gtk.jl#447 (comment)). When you're an important low-level package like this, you have to be attentive to these little details, things that Pkg itself gets to ignore because it's just the end application.

I'm looking a bit into how to fix this, but I'm opening the issue in advance as a place to track progress. I don't know specifically what this should look like yet, but here are some quick thoughts on possible options for a roadmap:

  • move Platform definition code to Sys
  • move Artifact parsing code to Base
  • (or perhaps make a small package that just provides those helpers)
  • make the .jll files just fully declarative (so that the helper Module / function knows how to fill them in based on the contents of an Artifact file)
  • make a pre-processed binary representation in a cache file that contains the pre-processed graph (this is also necessary for fixing other Distributed.jl and incremental precompile issues related to normal Project/Manifest usage, e.g. "Workers should inherit Pkg environment" (JuliaLang/julia#28781) and "Code loading might be better just fully parsing TOML files" (JuliaLang/julia#27414))
  • move calls that are using the Pkg application into a subprocess where they won't pollute the .ji file
  • create a database (and define a location for it, perhaps a sqlite file) that supports efficient Artifact queries (and perhaps Manifest queries too)
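
To make the "fully declarative" option in the list above a bit more concrete: a shared helper could read the artifact metadata and pick the entry for the current platform, so that no per-package logic is needed. The following is only a sketch under assumptions. `select_artifact`, the `libfoo` name, and the placeholder hashes are hypothetical, and it relies on the `TOML` standard library (Julia 1.6 and later) rather than on anything Pkg or BinaryBuilder actually shipped at the time.

```julia
using TOML

# Hypothetical, abbreviated artifact description; the hashes are placeholders.
const ARTIFACTS_TOML = """
[[libfoo]]
arch = "x86_64"
os = "linux"
git-tree-sha1 = "0000000000000000000000000000000000000000"

[[libfoo]]
arch = "aarch64"
os = "macos"
git-tree-sha1 = "1111111111111111111111111111111111111111"
"""

# Return the first entry for `name` whose os/arch match, or `nothing`.
# A real helper would compare full platform triplets, not just two keys.
function select_artifact(toml::AbstractString, name::AbstractString,
                         os::AbstractString, arch::AbstractString)
    entries = TOML.parse(toml)[name]
    idx = findfirst(e -> e["os"] == os && e["arch"] == arch, entries)
    return idx === nothing ? nothing : entries[idx]
end

entry = select_artifact(ARTIFACTS_TOML, "libfoo", "linux", "x86_64")
```

With a helper along these lines, a JLL package would only need to ship the data file; bug fixes to the selection logic would land in one place instead of in every autogenerated package.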
@staticfloat
Member

  • move Platform definition code to Sys

This is probably a good idea; the code is complex but pretty battle-tested, and gives us a lot of neat things like the ability to figure out which libgfortran we're locked into by BLAS.

  • move Artifact parsing code to Base

I'm not sure why we need to do this? Can you explain the benefits?

  • make the .jll files just fully declarative (so that the helper Module / function knows how to fill them in based on the contents of an Artifact file)

We actually discussed this in the past and decided not to do it, so that JLL packages have an "escape hatch" for truly exceptional situations where we need to run arbitrary Julia code at JLL package load time. By autogenerating JLL packages, we have a very flexible interface (e.g. all of Julia).

  • make a pre-processed binary representation in a cache file that contains the pre-processed graph

I'm assuming this bullet point is here because you want to include artifacts in the graph? (Or at least, Artifacts.toml files?)

  • move calls that are using the Pkg application into a subprocess where they won't pollute the .ji file

This would do more than the compiler barrier we recently introduced?

  • create a database (and define a location for it, perhaps a sqlite file) that supports efficient Artifact queries (and perhaps Manifest queries too)

What would this enable?

@vtjnash
Member Author

vtjnash commented Dec 2, 2019

this bullet point

The list represents some alternative possibilities, so some of the items may conflict and/or be irrelevant after others are implemented.

This would do more than the compiler barrier we recently introduced?

Yes, it would also eliminate the Pkg dependency from the result—and usually keep us from loading it at all when the artifacts are already cached.
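
A minimal sketch of the subprocess idea, assuming nothing about how Pkg would actually implement it: the parent process never runs `using Pkg` itself, so nothing from Pkg can end up referenced by the package's .ji file. `run_isolated` is a hypothetical helper name, and the child's `print("ok")` merely stands in for real work such as ensuring an artifact is installed.

```julia
# Hypothetical helper: evaluate `code` in a fresh Julia process and
# return whatever the child wrote to stdout.
function run_isolated(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return read(cmd, String)
end

# The child loads Pkg; the parent only sees the textual result.
out = run_isolated("using Pkg; print(\"ok\")")
```

The obvious cost is process startup time, which is why this would only make sense on the slow path where the artifacts are not already cached.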

By autogenerating JLL packages, we have a very flexible interface

Yeah, but that also means you can't provide as good static tooling, and all that (autogenerated) code duplication makes it harder to push bug fixes across the ecosystem. And the lack of __precompile__(false) at the top here means you're already strongly promising not to do anything too dynamic. Note too that I'm not saying there shouldn't be hooks, just that they should be encouraged to do most of their work during Pkg.build. Packages should still be allowed to add almost arbitrary code to their __init__ function.
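
Roughly, the split being described could look like the sketch below: declarative constants at the top level, which are safe to bake into the precompile cache, and only a small amount of runtime work in `__init__`. Everything here is hypothetical, including the module name `FooSketch_jll`, the `FOO_PREFIX` environment variable, and the `/usr` fallback; it is not what the generated JLL packages actually contain.

```julia
module FooSketch_jll

using Libdl

# Declarative data: fixed at precompile time and cached as-is.
const libfoo_name = "libfoo"

# Machine-specific state: left empty in the cache, filled in at load time.
const libfoo_path = Ref{String}("")

function __init__()
    # Keep load-time work small; expensive setup belongs in the build step.
    prefix = get(ENV, "FOO_PREFIX", "/usr")  # hypothetical override
    libfoo_path[] = joinpath(prefix, "lib", libfoo_name * "." * Libdl.dlext)
end

end # module
```

Because the path lives in a `Ref` that `__init__` fills in, the cached module stays valid even if the library location differs between the machine that precompiled it and the one that loads it.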

(Future compatibility note: expect that eventually packages will lose the ability to modify certain constants though, such as the project and artifact environments. This is already known to be a bit of a reliability and performance issue that just hasn't been worked on yet; cf. JuliaLang/julia#27414 (comment), for example.)

@KristofferC
Member

KristofferC commented Dec 8, 2019

Yes, it would also eliminate the Pkg dependency from the result—and usually keep us from loading it at all when the artifacts are already cached.

But Pkg is already in the sysimage so does it matter if the whole Platform definition code is moved from one module to another? Naively, it just feels like moving things from one file to another. Is the fact that the "parent module" is Pkg relevant? Or are you saying that it would be good to avoid having a using Pkg at all in the jll packages?

(Future compatibility note: expect that eventually packages will lose the ability to modify certain constants though, such as the project and artifact environments

Sure, but how is this at all relevant to the discussion here? JLL packages don't modify the project or manifest environments, right?

@staticfloat
Member

With the advent of JLL Wrappers and the new Artifacts system in 1.6, I consider this well and truly addressed. :)

@giordano linked a pull request on Sep 25, 2020 that will close this issue