WIP: refactor the code loading of packages #46690

Open
wants to merge 7 commits into
base: master

KristofferC
Member

The current code loading implementation is largely stateless (although there is a cache for parsed TOML files). Every time information about a package is looked up, the code loading system starts from scratch: it searches through the load path and reads through manifests looking for the one piece of information it needs. Touching the disk this often was a performance problem; #40890 mostly works around it, but the solution there is quite ugly and ad hoc.

This PR explores another approach, in which all the information in an environment is parsed up front and stored in an object, which makes lookups trivial. The initial cost is higher, but the idea is that environments change rarely enough that this object can be cached across multiple calls into code loading (and passed to precompile workers), which saves work in the long run.
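To make the idea concrete, here is a minimal, language-agnostic sketch in Python rather than Julia. It is not the code in this PR. It assumes each environment is a single manifest file (JSON is used as a stand-in for TOML), and the names `Environment`, `EnvironmentCache`, and `locate` are hypothetical. The whole manifest is parsed once and reused until the file's modification time changes.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Environment:
    """Everything known about one environment, parsed once up front."""
    path: str
    mtime_ns: int
    packages: dict  # package name -> metadata (hypothetical layout)


class EnvironmentCache:
    """Keeps parsed environments alive across lookups (illustrative only)."""

    def __init__(self):
        self._envs = {}
        self.parse_count = 0  # how often a manifest was actually read and parsed

    def _load(self, path):
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._envs.get(path)
        if cached is not None and cached.mtime_ns == mtime_ns:
            return cached  # unchanged on disk: reuse the parsed object
        with open(path) as f:
            data = json.load(f)  # JSON is a stand-in for the TOML manifest
        self.parse_count += 1
        env = Environment(path, mtime_ns, data["packages"])
        self._envs[path] = env
        return env

    def locate(self, load_path, name):
        """Return metadata for `name` from the first environment that has it."""
        for path in load_path:
            entry = self._load(path).packages.get(name)
            if entry is not None:
                return entry
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        manifest = os.path.join(d, "manifest.json")
        with open(manifest, "w") as f:
            json.dump({"packages": {"Example": {"uuid": "hypothetical-uuid"}}}, f)
        cache = EnvironmentCache()
        for _ in range(3):
            cache.locate([manifest], "Example")
        print("parses:", cache.parse_count)
```

The trade-off is the one described above: the first lookup pays for parsing the whole environment, and later lookups are dictionary accesses plus one `stat` call per environment.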

So far this PR implements the new lookup strategy, but it still needs the caching part, where the environment stack is persisted across several calls to `require` and passed to precompile workers. It will also need a PkgEval run, since quite a few packages may use the internals of code loading (Revise, for example).

For ease of reading, the new code lives in a separate file, `codeloading2.jl`, but it will be folded into `loading.jl` once this is done.

KristofferC added the packages (Package management and loading) label Sep 9, 2022
@vtjnash
Member

vtjnash commented Sep 9, 2022

This will replace the https://github.com/JuliaLang/julia/compare/jn/loading-plus branch (8ca3f3b), right?

@KristofferC
Member Author

Yes, that's the idea.

@KristofferC KristofferC force-pushed the kc/codeloading2.0 branch 2 times, most recently from d09bea7 to 5c17075 Compare September 9, 2022 14:16
@vtjnash
Member

vtjnash commented Sep 9, 2022

Awesome. I will delete that now, then.

IanButterworth added a commit that referenced this pull request Mar 5, 2024
Parallel precompilation is now more or less required in order to use
somewhat large packages, unless you want to wait an obscenely long time
for it to complete. Right now, we even start a parallel precompilation on
a package load if we notice that the package you are loading is not
precompiled.

This functionality has typically been implemented in Pkg, but with Pkg not
being in the sysimage it becomes a bit awkward because we then need to
load Pkg from Base. The only real reason this functionality has been
implemented in Pkg is that Pkg has some useful features for parsing
environments. Moving precompilation to Base has typically been stalled
on such an environment parser not existing in Base.

However, in #46690 I started
implementing code loading on top of a more up-front environment parser
(instead of the "incremental" one that currently exists in `loading.jl`)
and we can retrofit this to be used as the basis of parallel
precompilation. At some later point code loading could be implemented on
top of it, but that is for now considered future work.

This PR thus adds the environment parser from the codeloading PR and
implements the parallel precompilation feature from Pkg on top of it
(instead of on top of the `EnvCache` in Pkg).

Some points to bring up here:

- This copy-pastes the progress bar implementation from Pkg into here. It
is probably a bit excessive for this use, so we can simplify it
significantly.
- Parallel precompilation uses the `FileWatching` module to avoid
different processes trying to precompile the same package concurrently.
Right now, I grab this from `Base.loaded_modules`, relying on it
being in the sysimage.
- This removes the "suspended" functionality from the Pkg precompilation,
which does not try to precompile packages if they have "recently" failed;
it is unclear how useful that is in practice. It also requires the
Serialization stdlib and uses data structures defined in Pkg, so it is
hard to keep when moving this to Base.
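
As an illustration of the second point, one common way to stop several processes from precompiling the same package at once is an exclusive lock file. The sketch below is in Python and is not how `FileWatching` or this commit implements it. The names `precompile_lock` and `AlreadyLocked` are hypothetical, and stale-lock handling (a crashed process leaving its file behind) is deliberately left out.

```python
import os
import tempfile
from contextlib import contextmanager


class AlreadyLocked(Exception):
    """Raised when another process already holds the lock file."""


@contextmanager
def precompile_lock(lock_path):
    """Hold an exclusive lock file for the duration of the `with` block."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyLocked(lock_path)
    try:
        # Record the owner's pid so a human (or a stale-lock check) can see it.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        os.remove(lock_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "Example.pidfile")  # hypothetical file name
        with precompile_lock(lock):
            print("holding", lock)  # the precompile work would run here
```

A process that gets `AlreadyLocked` can wait for the other process to finish and then load the cache file it produced, instead of redoing the work.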

---------

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
KristofferC added a commit that referenced this pull request Mar 6, 2024
(cherry picked from commit 6745160)
mkitti pushed a commit to mkitti/julia that referenced this pull request Apr 13, 2024
Labels: needs pkgeval (Tests for all registered packages should be run with this change), packages (Package management and loading)