Instrumentation markers for logging inference? #47902

Open
timholy opened this issue Dec 14, 2022 · 4 comments
Labels
compiler:inference · compiler:latency · julep

Comments


timholy commented Dec 14, 2022

TL;DR

The drive to minimize TTFX (time to first execution) may benefit from the ability to send "instrumenting commands" to inference directly from Julia code.

For those who may remember a discussion (somehow I'm not finding it...) about the order in which compilation and timer instrumentation occur in @time, this has the same flavor.

Background

With native code caching (#47184), SnoopPrecompile, and the ability to fix invalidations, in principle we should often be able to drive "excess" TTFX (that beyond the workload itself) nearly to zero. So far, we're only rarely so successful: for example, #47889 seems to eliminate all invalidations that affect the demonstrated workloads, and yet the TTFX is more than a second for each. (It's milliseconds or less on the second execution of the workload.) While one can be happy about the gains we have, I was initially puzzled about why it's not at least an order of magnitude better.

Analysis of causes

From what I can tell, the answer appears to be a subtle interaction between the interpreter, scoping, and inference tracking. Here's the basic idea: currently, SnoopPrecompile attempts to take a precompilation directive like this (a lightly modified version of CSV.jl's actual precompile file):

using SnoopPrecompile

@precompile_setup begin
    # Putting some things in `setup` can reduce the size of the
    # precompile file and potentially make loading faster.
    PRECOMPILE_DATA = "int,float,date,datetime,bool,null,str,catg,int_float\n1,3.14,2019-01-01,2019-01-01T01:02:03,true,,hey,abc,2\n2,NaN,2019-01-02,2019-01-03T01:02:03,false,,there,abc,3.14\n"
    @precompile_all_calls begin
        # all calls in this block will be precompiled, regardless of whether
        # they belong to your package or not (on Julia 1.8 and higher)
        CSV.File(IOBuffer(PRECOMPILE_DATA))
        CSV.File(Vector{UInt8}(PRECOMPILE_DATA))
        CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))
    end
end

and turn it into something like this:

if ccall(:jl_generating_output, Cint, ()) == 1       # run this only if precompiling
    let
        # USER SETUP CODE START
        PRECOMPILE_DATA = "int,float,date,datetime,bool,null,str,catg,int_float\n1,3.14,2019-01-01,2019-01-01T01:02:03,true,,hey,abc,2\n2,NaN,2019-01-02,2019-01-03T01:02:03,false,,there,abc,3.14\n"
        # USER SETUP CODE STOP
        Core.Compiler.Timings.reset_timings()       # this + next line start tracking everything that gets inferred
        Core.Compiler.__set_measure_typeinf(true)
        try
            # USER WORKLOAD START
            Base.Experimental.@force_compile        # this block *must* be passed to inference, the interpreter is not allowed to run it
            CSV.File(IOBuffer(PRECOMPILE_DATA))
            CSV.File(Vector{UInt8}(PRECOMPILE_DATA))
            CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))
            # USER WORKLOAD STOP
        finally
            Core.Compiler.__set_measure_typeinf(false)     # stop the instrumentation
            Core.Compiler.Timings.close_current_timer()
        end
        SnoopPrecompile.precompile_roots(Core.Compiler.Timings._timings[1].children)   # collect the list of things we inferred
    end
end

Seems sensible, right? But here's the problem: the @force_compile seems to force compilation of the entire contents of the let block, which means some items get inferred before we turn on inference logging. That's OK for methods owned by the package, but methods owned by other modules end up omitted from the pkgimage. In this example, one of the ~16 things that don't get cached is pkgdir(::Module); of course you can argue that I should have put that in the setup code, and that its omission is actually a good thing (I wouldn't disagree). But the more important point is that @precompile_all_calls is supposed to guarantee caching of everything inside, and it's definitely not living up to that promise; in some cases that hurts TTFX in undesirable ways.
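
To make the ordering concrete, here's a minimal sketch of what the thunk looks like to the compiler (setup, start_logging, workload, and stop_logging are placeholders, not real functions):

let
    setup()                               # inferred up front: the @force_compile below
    start_logging()                       # promotes the *entire* toplevel thunk to
    try                                   # compiled mode, so everything in this `let`
        Base.Experimental.@force_compile  # is inferred before start_logging() ever runs
        workload()
    finally
        stop_logging()
    end
end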

Possible solutions

Separate toplevel expressions

If we didn't have to worry about setup code and hiding temporary variables inside the let, we could just have top-level expressions like

start_logging()
try
    @force_compile
    # user workload here
finally
    stop_logging()
end
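
With the actual calls from the expansion above, that would look something like:

Core.Compiler.Timings.reset_timings()
Core.Compiler.__set_measure_typeinf(true)
try
    Base.Experimental.@force_compile   # now applies only to this one toplevel
    # user workload here               # expression, not to the logging toggles
finally
    Core.Compiler.__set_measure_typeinf(false)
    Core.Compiler.Timings.close_current_timer()
end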

But I think the ability to avoid polluting the module namespace, and the option of not force-caching setup code, are both pretty desirable.

Use of @eval

Perhaps the next easiest option is to turn the user workload into

@eval begin
    Base.Experimental.@force_compile        # this block *must* be passed to inference; the interpreter is not allowed to run it
    CSV.File(IOBuffer($PRECOMPILE_DATA))
    CSV.File(Vector{UInt8}($PRECOMPILE_DATA))
    CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))
end

but the need to splice in PRECOMPILE_DATA with $ is annoying (presumably, package devs would have to do that manually, and it seems error-prone).
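
The reason the $ is needed: @eval evaluates its argument in module (global) scope, so locals from the enclosing let aren't visible unless their values are spliced in. A minimal illustration:

let x = 1
    @eval println(x)    # ERROR: UndefVarError: `x` is a local, invisible at module scope
    @eval println($x)   # works: splices the value 1 into the evaluated expression
end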

Instrumenting inference directly from user code

A final alternative is to have some special expressions, akin to how :meta works now, that implement this logic within inference itself: that is, we basically send an expression to inference that toggles logging. In other words, the example above might translate into something like

using SnoopPrecompile

if ccall(:jl_generating_output, Cint, ()) == 1
    let
        PRECOMPILE_DATA = "int,float,date,datetime,bool,null,str,catg,int_float\n1,3.14,2019-01-01,2019-01-01T01:02:03,true,,hey,abc,2\n2,NaN,2019-01-02,2019-01-03T01:02:03,false,,there,abc,3.14\n"
        $(Expr(:meta, :track_inference, quote # this implies @force_compile for the block inside, but *not* for the containing `let` as a whole
            CSV.File(IOBuffer(PRECOMPILE_DATA))
            CSV.File(Vector{UInt8}(PRECOMPILE_DATA))
            CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))
        end)
        SnoopPrecompile.precompile_roots(Core.Compiler.Timings._timings[1].children)
    end
end

and Expr(:meta, :track_inference, expr_workload) is just a marker that gets processed during inference and triggers it to start logging before inferring expr_workload.
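
For reference, a macro emitting such a marker could be as simple as the following sketch (:track_inference does not exist today; the name and semantics are purely illustrative):

macro track_inference(ex)
    # Emit the (hypothetical) meta marker around the user's block; inference
    # would recognize it, enable logging, infer `ex`, then disable logging.
    return Expr(:meta, :track_inference, esc(ex))
end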

timholy added the compiler:inference, compiler:latency, and julep labels on Dec 14, 2022

Keno commented Dec 15, 2022

Use of @eval

Use a closure with an invokelatest barrier? That seems to work around the primary scoping drawback here.
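
For concreteness, a sketch of that suggestion using the same logging calls as above:

let
    PRECOMPILE_DATA = "..."    # setup stays local to the `let`
    workload = () -> CSV.File(IOBuffer(PRECOMPILE_DATA))
    Core.Compiler.Timings.reset_timings()
    Core.Compiler.__set_measure_typeinf(true)
    try
        # invokelatest is a dispatch barrier: the closure body isn't inferred
        # until this call executes, i.e., after logging has been switched on
        Base.invokelatest(workload)
    finally
        Core.Compiler.__set_measure_typeinf(false)
        Core.Compiler.Timings.close_current_timer()
    end
end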


timholy commented Dec 15, 2022

Great suggestion, thanks!


timholy commented Dec 15, 2022

Hmm, turns out it had one issue: JuliaRegistries/General#74176 (can't define new methods or types inside the thunk).
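
Concretely (a sketch with hypothetical names): definitions that must be global cannot appear inside the closure body:

workload = () -> begin
    struct Point end                        # ERROR: "struct" expression not at top level
    Base.show(io::IO, ::Point) = nothing    # ERROR: global method definition needs to
end                                         # be placed at top level, or use `eval`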


timholy commented Dec 15, 2022

It occurs to me that there's one more thing that favors the @eval solution: anyone who wants to precompile code executed by macros. Of course that only affects precompilation time, not TTFX, but as long as reducing the first doesn't hinder the second, it seems like a worthy goal.
