Proposal: Run precompilation workloads at top-level instead of within package #51905

Open
NHDaly opened this issue Oct 27, 2023 · 3 comments
Labels
compiler:precompilation Precompilation of modules feature Indicates new feature / enhancement requests

Comments

NHDaly commented Oct 27, 2023

Currently, as part of the PkgImages feature introduced in Julia 1.9 (and even in older versions of Julia), users are encouraged to run snoop workloads at the end of the module definition in order to capture the compilation of the generic functions invoked during that workload.

This introduces at least two problems:

  1. PrecompileTools doesn't run __init__() so some functionality may not work during package compilation? PrecompileTools.jl#32
    • Some modules that define __init__() expect that their functions will not be called until the module has been initialized, and initialization does not happen during precompilation.
    • We skip initialization for good reason: we don't want to serialize the module's runtime data; that data should only be initialized at runtime.
    • But this is a conundrum! We need to init the module to run the snoop workload, but we must not init it if we want to preserve correctness.
    • The best answer currently is to manually init the module's data, run the workload, and then uninit the data at the end. :/
      • This is obviously tricky and error-prone.
  2. Task cannot be serialized error during precompilation disappeared in 1.9 #49513
    • As I understand it, this is the fundamental issue:
      • Since we serialize the entire module at the end of precompilation, if the module points to any running Tasks, we cannot (de)serialize them safely. So starting in 1.10, we introduced a mechanism that blocks until all of those tasks have finished.
      • But if you are running a complex snoop workload that creates tasks or IO objects, it can be difficult and error-prone to track them all down and shut them down correctly before the workload finishes.
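As a rough illustration of the manual workaround described in item 1, the sketch below initializes a module's runtime state by hand, runs a small workload, and resets the state before the module would be serialized. The module `CachePkg` and the names `CACHE`, `do_stuff`, and `_precompile_workload` are hypothetical and chosen only for this example; the `jl_generating_output` check is the commonly used way to detect that precompilation output is being generated.

```julia
module CachePkg

# Runtime-only state: must be `nothing` in the serialized module.
const CACHE = Ref{Union{Nothing,Dict{String,Int}}}(nothing)

# Normal initialization, run when the module is loaded at runtime.
__init__() = (CACHE[] = Dict{String,Int}(); nothing)

function do_stuff(key::String)
    cache = CACHE[]
    cache === nothing && error("CachePkg is not initialized")
    return get!(cache, key, length(key))
end

# Workaround: init by hand, run the workload, then uninit again.
function _precompile_workload()
    CACHE[] = Dict{String,Int}()
    try
        do_stuff("hello")
    finally
        # Forgetting this reset would bake the Dict into the cache file.
        CACHE[] = nothing
    end
    return nothing
end

# Only run the workload while a cache file is being generated.
if ccall(:jl_generating_output, Cint, ()) == 1
    _precompile_workload()
end

end # module
```

Every piece of runtime state needs its own init and reset here, which is the part that gets tricky as a package grows.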

I would like to propose that we introduce a language-supported mechanism for running a snoop workload after a module is closed.

Syntactically, I think it could be as simple as moving the snoop / precompile statements to after the module, maybe by registering them in a callback that will be called when the runtime is finished closing the module. Something like:

module MyPackage
end

Base.precompilation(MyPackage) do
    # setup state
    MyPackage.setup()
    # run precompiles and/or snoop workload
    precompile(...)
    MyPackage.do_stuff()
end
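To make the callback idea a little more concrete, here is a minimal user-land sketch of such a registry. `Base.precompilation` does not exist today; `register_precompilation`, `run_precompilation`, and `PRECOMPILE_CALLBACKS` are hypothetical names, and `run_precompilation` only stands in for the step the runtime would perform once the module is closed.

```julia
# Hypothetical registry: maps a module to the workloads registered for it.
const PRECOMPILE_CALLBACKS = IdDict{Module,Vector{Any}}()

# Store a workload to run after `mod` has been closed.
# Taking `f` first allows the do-block syntax used in the proposal.
function register_precompilation(f, mod::Module)
    push!(get!(() -> Any[], PRECOMPILE_CALLBACKS, mod), f)
    return nothing
end

# Stand-in for what the runtime would do after closing the module.
function run_precompilation(mod::Module)
    for f in get(PRECOMPILE_CALLBACKS, mod, Any[])
        f()
    end
    return nothing
end

module MyPackage
const STATE = Ref(0)
setup() = (STATE[] = 1; nothing)
do_stuff() = STATE[] + 1
end

register_precompilation(MyPackage) do
    MyPackage.setup()                   # set up state
    precompile(MyPackage.do_stuff, ())  # explicit precompile statement
    MyPackage.do_stuff()                # snoop workload
end

run_precompilation(MyPackage)
```

Note that this sketch still mutates `MyPackage.STATE`, so by itself it does not address the state-leak problem.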

Semantically, I propose that this would do something like the following:

  1. After the user's file is included, the module is closed just like it currently is.
  2. If the user provided a precompilation callback:
    1. We first make a deepcopy of the module, which is what will be used for serialization.
    2. Then, we run the user-provided callback, which will mutate state in the module and also trigger the compilations we want.
    3. Finally, we extract only the method instances newly added to the module's method tables and move or copy them into the checkpointed copy.
  3. We then serialize that checkpointed copy.
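The checkpoint, run, and diff steps can be sketched with a toy model. This is not how the runtime represents modules: `ModuleModel` is a hypothetical stand-in that holds some runtime state and a set of strings naming "compiled" signatures, and `precompile_with_checkpoint` is likewise an illustrative name.

```julia
# Toy stand-in for a module: runtime state plus a record of compiled signatures.
struct ModuleModel
    state::Dict{Symbol,Any}
    compiled::Set{String}
end

function precompile_with_checkpoint(callback, m::ModuleModel)
    checkpoint = deepcopy(m)                           # step 2.1: clean copy
    callback(m)                                        # step 2.2: workload mutates `m`
    added = setdiff(m.compiled, checkpoint.compiled)   # step 2.3: only new entries
    union!(checkpoint.compiled, added)
    return checkpoint                                  # step 3: this is what gets serialized
end

live = ModuleModel(Dict{Symbol,Any}(), Set(["f(::Int)"]))

result = precompile_with_checkpoint(live) do m
    m.state[:connection] = "open socket"   # state that must not be serialized
    push!(m.compiled, "g(::String)")       # compilation we do want to keep
end
```

In this model `result` picks up the new compiled entry while its `state` stays empty, which is the separation the proposal is after.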

This allows us to separate the concerns of defining a module and running a workload to snoop compile it.

It allows us to ensure that the snoop workload doesn't accidentally introduce state into the module that is serialized, causing unexpected behaviors.

It lets us robustly ignore "dangling tasks", which preserves the behavior that pre-1.9 users have with PackageCompiler.

The implementation also doesn't seem too burdensome, and it costs nothing unless users opt into the new feature.

Thoughts?

@NHDaly NHDaly added compiler:precompilation Precompilation of modules feature Indicates new feature / enhancement requests labels Oct 27, 2023

NHDaly commented Oct 27, 2023

CC: @vchuravy
Also CC: @timholy

@vchuravy vchuravy changed the title Proposal: Close modules before precompilation for PkgImages Proposal: Run precompilation workloads at top-level instead of within package Oct 28, 2023

NHDaly commented Nov 13, 2023

Here is one wrinkle in this proposal, though:

julia> m2 = deepcopy(M)
ERROR: deepcopy of Modules not supported


NHDaly commented Nov 13, 2023

Maybe we can do the precompile step in a separate forked process, so that we can keep the original module clean, and then write out the precompiles from the other process? Or pipe them back over to the original process?
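One way to approximate the separate-process idea with existing tooling is to run the workload in a child Julia process started with `--trace-compile` and read the recorded `precompile` statements back in the parent. This is only a sketch of the "write out the precompiles from the other process" variant, not the proposed mechanism; `collect_precompile_statements` is a hypothetical helper, and the workload is passed as a string of code for simplicity.

```julia
# Run `workload` (a string of Julia code) in a fresh child process and
# return the precompile statements that the child recorded.
function collect_precompile_statements(workload::String)
    mktempdir() do dir
        trace = joinpath(dir, "trace.jl")
        # `--trace-compile=<file>` writes one `precompile(...)` line per
        # newly compiled method signature.
        cmd = `$(Base.julia_cmd()) --startup-file=no --trace-compile=$trace -e $workload`
        run(cmd)
        # Tasks or IO objects created by the workload die with the child
        # process, so the parent's module state is never touched.
        return isfile(trace) ? readlines(trace) : String[]
    end
end

statements = collect_precompile_statements("f(x) = x + 1; f(1)")
```

The parent could then replay these statements against its own clean module. Whether that recovers everything a live workload would compile is an open question in this sketch.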
