Eliminating overhead? #10

Open
MilesCranmer opened this issue May 28, 2023 · 11 comments

Comments

@MilesCranmer

Very useful package, thanks for putting it up. I was wondering if there is a way to eliminate all overhead from lazy-loaded calls?

I am trying to conditionally load Zygote.jl in my package. Package extensions are not practical here because I don't want downstream users to have to load Zygote.jl. I want my package to install Zygote for them, but only load it when necessary: gradients are not commonly used, so it makes sense to have conditional loading.

But since I use Zygote.jl-generated gradients as kernels, the overhead of Base.invokelatest quickly accumulates. Is there any way I can eliminate this completely?

Since I don't expect the function to change after the first call of Base.invokelatest, perhaps there is a way to tell the compiler it is free to inline the latest call? I tried recording the world age manually and then using Base.invoke_in_world, as well as Core._call_in_world, but sadly neither seemed to improve things.

Perhaps this is not possible with Julia?
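
For context, one common way to amortize (not eliminate) this cost is a function barrier: call Base.invokelatest once around the whole hot loop, so that the kernel calls inside it are ordinary dispatches in the latest world. A minimal sketch, in which `kernel`, `sum_slow`, `sum_inner`, and `sum_fast` are all hypothetical names and `kernel` merely stands in for a Zygote-generated gradient:

```julia
# Hypothetical stand-in for a lazily loaded, Zygote-generated gradient kernel.
kernel(x) = 2x

# Pays the invokelatest cost on every iteration.
function sum_slow(xs)
    s = 0.0
    for x in xs
        s += Base.invokelatest(kernel, x)::Float64
    end
    return s
end

# Ordinary function: calls to `f` in here are normal, specializable dispatches.
function sum_inner(f, xs)
    s = 0.0
    for x in xs
        s += f(x)
    end
    return s
end

# Function barrier: a single invokelatest wraps the whole loop.
sum_fast(xs) = Base.invokelatest(sum_inner, kernel, xs)::Float64
```

Whether this helps depends on how much work can be moved behind the barrier; if every kernel call has to originate from old-world code, the per-call cost remains.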

@johnnychen94
Owner

I've thought about this, too, and I also asked my colleague @thautwarm for ideas. But it seems that eliminating the overhead is impossible as long as we allow Julia's dynamic compilation.

A possible alternative with Julia 1.9's extensions could be to build some known package that triggers >10 extensions (exchanging the roles of the main package and the extension package 😆). But I don't yet know whether that's even a good idea...

@MilesCranmer
Author

MilesCranmer commented May 29, 2023

I wonder if one could define an internal module that triggers extensions to load. Something like:

function load_ext()
    Base.MainInclude.eval(:(using PrimaryModule._ExtensionLoader))
end

And have the internal module _ExtensionLoader trigger an extension to load within PrimaryModule and overload the relevant functions with an eagerly loaded module.

But I’m not sure if this would work as it assumes the user has loaded PrimaryModule, rather than having it as an indirect dependency…


Edit: never mind, I guess this package would already be loaded when the user loads PrimaryModule.

@MilesCranmer
Author

cc @mkitti in case you have ideas

@MilesCranmer
Author

Wait, would something like the following work?

julia> module A
           using Requires: @init, @require
           function f()
               Base.require(@__MODULE__, :Zygote)
           end
           @init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" using Zygote
       end

julia> A.Zygote
ERROR: UndefVarError: `Zygote` not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:31
 [2] top-level scope
   @ REPL[2]:1

julia> A.f()
Zygote

julia> A.Zygote.gradient
gradient (generic function with 1 method)

@MilesCranmer
Author

MilesCranmer commented May 29, 2023

My god, I think it actually works. Here's a working example of lazily-loaded Zygote.jl with zero overhead on calls to Zygote.gradient: https://github.com/SymbolicML/DynamicExpressions.jl/blob/2e760980524e4424317bd9e194274e3e10381b3e/src/OperatorEnumConstruction.jl

Here are the relevant lines of the lazy loading part:

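using Requires: @init, @require

# Fallback method: reached only if the Zygote-specific method was never loaded.
generate_diff_operators(::Any, ::Any) = error("`Zygote` not loaded.")

# Once Zygote is loaded, include the file that defines the real method.
@init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" @eval begin
    include("zygote_interface.jl")
end

function OperatorEnum(; binary_operators, unary_operators, enable_autodiff=false)
    if enable_autodiff
        # Load Zygote on demand; this fires the @require hook above.
        Base.require(@__MODULE__, :Zygote)
        # The freshly included method is newer than this caller's world age.
        Base.invokelatest(generate_diff_operators, binary_operators, unary_operators)
    end
end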

Then, the contents of zygote_interface.jl:

import Zygote: gradient

function generate_diff_operators(
    binary_operators::Vector{Function}, unary_operators::Vector{Function}
)
    diff_bin = Function[]
    diff_una = Function[]

    for op in binary_operators
        diff_op(x, y) = gradient(op, x, y)
        push!(diff_bin, diff_op)
    end
    for op in unary_operators
        diff_op(x) = gradient(op, x)[1]
        push!(diff_una, diff_op)
    end
    return diff_bin, diff_una
end

We can see that Zygote.jl is not actually loaded at startup:

julia> @time_imports using DynamicExpressions
      1.2 ms  SuiteSparse
      3.4 ms  ArrayInterfaceCore
      1.0 ms  IfElse
     33.8 ms  Static
      3.2 ms  ArrayInterface
      5.7 ms  StaticArrayInterface
      1.3 ms  SIMDTypes
      2.5 ms  ManualMemory
      4.8 ms  LayoutPointers
      2.3 ms  CPUSummary
      1.3 ms  BitTwiddlingConvenienceFunctions
      9.1 ms  HostCPUFeatures
    184.0 ms  VectorizationBase
      3.8 ms  SLEEFPirates
      1.2 ms  UnPack
      1.0 ms  Adapt
     38.3 ms  OffsetArrays
      1.7 ms  StaticArrayInterface  StaticArrayInterfaceOffsetArraysExt
      7.7 ms  ThreadingUtilities
      7.2 ms  PolyesterWeave
      2.2 ms  DocStringExtensions
      5.7 ms  CloseOpenIntervals
    136.6 ms  LoopVectorization
     10.0 ms  MacroTools
    127.3 ms  DynamicExpressions 4.44% compilation time

generate_diff_operators still needs to be called with Base.invokelatest, but the actual expensive calls (Zygote.gradient) seem to be zero-overhead.
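
The shape of this pattern can be reproduced without Zygote.jl. In the sketch below everything is hypothetical: `make_ops` plays the role of generate_diff_operators, `load_backend` uses `@eval` as a stand-in for the `@require`-triggered `include`, and doubling the result stands in for `gradient`.

```julia
# Fallback stub, mirroring `generate_diff_operators(::Any, ::Any)`.
make_ops(::Any) = error("backend not loaded")

# Hypothetical stand-in for the `@require` hook: adds a more specific
# method (and the closures it creates) at runtime.
function load_backend()
    @eval make_ops(ops::Vector{Function}) = Function[x -> 2 * op(x) for op in ops]
    return nothing
end

function build_ops(ops::Vector{Function})
    load_backend()
    # The new method is too new for this already-running function, so the
    # world-age boundary is crossed once, here.
    return Base.invokelatest(make_ops, ops)
end

fs = build_ops(Function[x -> x + 1])
fs[1](3.0)  # called directly from top level, without invokelatest
```

The direct call on the last line works because top-level code runs in the latest world. A caller that was already running before `load_backend` executed would presumably still need Base.invokelatest to call the returned closures.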

@MilesCranmer
Author

Ah, damn. It hits world-age issues when I run it inside AirspeedVelocity.jl (which wraps the package inside a module). So I guess this doesn't fix it.
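
For reference, here is a minimal sketch of this class of failure. It is independent of AirspeedVelocity.jl and makes no claim about what actually triggers the error there; `late_kernel` and `define_then_call` are hypothetical names.

```julia
# Declared up front with no methods, so the name itself is always defined.
function late_kernel end

function define_then_call()
    # Adds a method while this function is already running.
    @eval late_kernel() = 42
    direct = try
        late_kernel()   # dispatched in the caller's older world
    catch err
        err
    end
    # invokelatest dispatches in the newest world and sees the new method.
    latest = Base.invokelatest(late_kernel)
    return direct, latest
end

direct, latest = define_then_call()
```

Here `direct` ends up holding a `MethodError` (the method is too new for the running caller), while `latest` is 42. Any code path that reaches a freshly defined method from a caller started in an older world would hit the same error.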

@johnnychen94
Owner

I tend to believe a black-box function is the right abstraction here: if you don't treat it as a black box, you'll get surprised somewhere.

@MilesCranmer
Author

The weird thing is that this seems to work in most contexts. Even if I wrap it in a module manually and execute it from the REPL, it still works and appears to have zero overhead. Only when I run it within AirspeedVelocity.jl do I get an error.

Perhaps it’s something to do with how AirspeedVelocity.jl imports the module twice: once at the top level, and once within a module for benchmarking…

@mkitti

mkitti commented May 29, 2023

Maybe a macro might be helpful here. You could use the macro to load the package just before calling the function.

@MilesCranmer
Author

An internal macro or user-facing?

@MilesCranmer
Author

I’m wondering if there is a way one could simply manually trigger an extension to load. It doesn’t seem like it would require LLVM hacking; the extensions seem to be organized by some Julia code.
