Using output from `--trace-compile` from running a workload can sometimes cache more than running the workload itself (#48048)
The example in Ferrite-FEM/Tensors.jl#190 is a little bit more complicated, and in this case specialization on the function argument comes into play:

```julia
module Ts

struct T{dim}
    data::NTuple{dim, Float64}
end

@generated function h(::Type{T{dim}}, f::F) where {dim, F}
    [:(f($i)) for i = 1:dim]
    :(T((f(0), f(0))))
end

@inline f(el::Number, ::Type{T{dim}}) where {dim} = h(T{dim}, i -> el)

using SnoopPrecompile
@precompile_all_calls f(1.0, T{2})

end
```
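To connect this example to the issue title: the precompile signatures for such a workload can be captured from a shell and pasted back into the package. The following is a sketch only; the exact statements printed depend on the Julia version, and the signatures shown are illustrative, not captured output:

```julia
# From a shell (sketch):
#   julia --project --trace-compile=stderr -e 'using Ts; Ts.f(1.0, Ts.T{2})'
#
# Pasting the emitted statements into the package body would look roughly like:
precompile(Tuple{typeof(f), Float64, Type{T{2}}})
precompile(Tuple{typeof(h), Type{T{2}}, Function})
```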
This appears to be fixed in https://github.com/timholy/SnoopCompile.jl/tree/teh/snoop_pc:

```julia
julia> using SnoopCompileCore

julia> using Tensors

julia> using SnoopPrecompile

julia> tinf = @time_precompiled @snoopi_deep begin
           @precompile_all_calls begin
               for dim in (2, 3)
                   v = ones(Tensor{1, dim, Float64,})
                   σ = one(SymmetricTensor{2, dim, Float64})
                   F = one(Tensor{2, dim, Float64,})
                   E = one(SymmetricTensor{4, dim, Float64})
                   C = one(Tensor{4, dim, Float64,})
                   v * 1.0
                   v ⋅ v
                   v ⊗ v
                   σ ⋅ v
                   F ⋅ v
                   σ ⊗ σ
                   F ⊗ F
                   σ * 1.0
                   F * 1.0
                   σ ⊡ σ
                   F ⊡ F
                   E ⊡ σ
                   C ⊡ F
                   E * 1.0
                   C * 1.0
                   # TODO: AD?
               end
           end
       end;

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.013773/0.159047 on Core.Compiler.Timings.ROOT() with 1 direct children

julia> tinf.children[1].children
14-element Vector{SnoopCompileCore.InferenceTimingNode}:
 InferenceTimingNode: 0.000637/0.033348 on *([1.0, 1.0]::Vec{2, Float64}, 1.0::Float64) with 2 direct children
 InferenceTimingNode: 0.006718/0.008715 on Tensors.otimes([1.0, 1.0]::Vec{2, Float64}, [1.0, 1.0]::Vec{2, Float64}) with 4 direct children
 InferenceTimingNode: 0.000077/0.031955 on LinearAlgebra.dot([1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}, [1.0, 1.0]::Vec{2, Float64}) with 1 direct children
 InferenceTimingNode: 0.000865/0.003543 on Tensors.otimes([1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}, [1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}) with 6 direct children
 InferenceTimingNode: 0.001037/0.005002 on Tensors.otimes([1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}, [1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}) with 7 direct children
 InferenceTimingNode: 0.000514/0.000649 on *([1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}, 1.0::Float64) with 1 direct children
 InferenceTimingNode: 0.000372/0.000497 on *([1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}, 1.0::Float64) with 1 direct children
 InferenceTimingNode: 0.000165/0.016104 on Tensors.dcontract([1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}, [1.0 0.0; 0.0 1.0]::SymmetricTensor{2, 2, Float64, 3}) with 1 direct children
 InferenceTimingNode: 0.000068/0.004072 on Tensors.dcontract([1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}, [1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}) with 1 direct children
 InferenceTimingNode: 0.000303/0.006795 on Tensors.dcontract([1.0 0.0; 0.0 0.0;;; 0.0 0.0; 1.0 0.0;;;; 0.0 1.0; 0.0 0.0;;; 0.0 0.0; 0.0 1.0]::Tensor{4, 2, Float64, 16}, [1.0 0.0; 0.0 1.0]::Tensor{2, 2, Float64, 4}) with 4 direct children
 InferenceTimingNode: 0.000459/0.004824 on *([1.0 0.0; 0.0 0.0;;; 0.0 0.5; 0.5 0.0;;;; 0.0 0.5; 0.5 0.0;;; 0.0 0.0; 0.0 1.0]::SymmetricTensor{4, 2, Float64, 9}, 1.0::Float64) with 2 direct children
 InferenceTimingNode: 0.000560/0.018592 on *([1.0 0.0; 0.0 0.0;;; 0.0 0.0; 1.0 0.0;;;; 0.0 1.0; 0.0 0.0;;; 0.0 0.0; 0.0 1.0]::Tensor{4, 2, Float64, 16}, 1.0::Float64) with 2 direct children
 InferenceTimingNode: 0.000060/0.000081 on iterate((2, 3)::Tuple{Int64, Int64}, 2::Int64) with 1 direct children
 InferenceTimingNode: 0.000170/0.000225 on iterate((2, 3)::Tuple{Int64, Int64}, ::Int64) with 1 direct children
```
I added the following to SnoopPrecompile:

```diff
tim@diva:~/.julia/dev/SnoopCompile/SnoopPrecompile$ git diff src/SnoopPrecompile.jl
diff --git a/SnoopPrecompile/src/SnoopPrecompile.jl b/SnoopPrecompile/src/SnoopPrecompile.jl
index d0dcd6b..59f4c41 100644
--- a/SnoopPrecompile/src/SnoopPrecompile.jl
+++ b/SnoopPrecompile/src/SnoopPrecompile.jl
@@ -1,5 +1,7 @@
 module SnoopPrecompile

+using MethodAnalysis
+
 export @precompile_all_calls, @precompile_setup

 const verbose = Ref(false) # if true, prints all the precompiles
@@ -11,7 +13,7 @@ function precompile_roots(roots)
     for child in roots
         mi = child.mi_info.mi
         precompile(mi.specTypes) # TODO: Julia should allow one to pass `mi` directly (would handle `invoke` properly)
-        verbose[] && println(mi)
+        verbose[] && printstyled(mi, '\n'; color=:cyan)
     end
 end
@@ -74,6 +76,11 @@ macro precompile_all_calls(ex::Expr)
     return esc(quote
         if ccall(:jl_generating_output, Cint, ()) == 1 || $SnoopPrecompile.verbose[]
             $ex
+            if $SnoopPrecompile.verbose[]
+                for mi in $methodinstances_owned_by(@__MODULE__)
+                    printstyled(mi, '\n'; color=:green)
+                end
+            end
         end
     end)
 end
```

and modified the precompile section of the package to:

```julia
using SnoopPrecompile
SnoopPrecompile.verbose[] = true
@precompile_all_calls f(T{[2][1]})
```

and got the following output (screenshot, to show color, omitted here). Furthermore:

```julia
tim@diva:/tmp/pkgimage/Ts$ ~/src/juliaw/julia --project -q

julia> using PkgCacheInspector

julia> cf = info_cachefile("Ts");

julia> using MethodAnalysis

julia> methodinstances_owned_by(only(cf.modules))
3-element Vector{Core.MethodInstance}:
 MethodInstance for Ts.f(::Type{Ts.T{2}})
 MethodInstance for Ts.h(::Type{Ts.T{2}}, ::Function)
 MethodInstance for Ts.h(::Type{Ts.T{2}}, ::Ts.var"#1#2")
```

So it's being cached. Finally:

```julia
tim@diva:/tmp/pkgimage/Ts$ ~/src/juliaw/julia --project -q

julia> using Ts

julia> using MethodAnalysis

julia> methodinstances_owned_by(Ts)
3-element Vector{Core.MethodInstance}:
 MethodInstance for Ts.f(::Type{Ts.T{2}})
 MethodInstance for Ts.h(::Type{Ts.T{2}}, ::Function)
 MethodInstance for Ts.h(::Type{Ts.T{2}}, ::Ts.var"#1#2")
```

So it's also being reloaded. The issue is that the abstract method (not the concrete one) gets re-inferred despite its presence in the cache. CC @aviatesk, xref #47960.
My follow-up question is why it seems to work when the explicit `precompile` statements are put into the package.
This is a summarized MWE version of the discussion in Ferrite-FEM/Tensors.jl#190.

The following code creates a small package:

One way to cache compiled code is by using `--trace-compile=stderr` and putting what comes out into the package file. So we run the following code (the `[2][1]` stuff is to trick constant propagation):

The `typeof(Main.test)` signature is not interesting, but if we put the two other precompile signatures into the package and rerun the workload, we see that the methods in `Ts` are cached and no longer printed.

An alternative (better) way is to just run this workload in the package itself. SnoopPrecompile has a macro that makes Julia try really hard to compile the stuff in it. So the two precompile signatures are replaced with:

where again the `[2][1]` is to try to avoid any constant propagation. Running the workload again:

we see that in this case, the method `Ts.h` has failed to cache and gets recompiled (the constructor `Ts.T` managed to cache, though). So putting the precompile signature into the package made it cache, but running the workload that emits the precompile signature made it not cache.

There are a few things we can do to fix the discrepancy:

- `@inline` on `f`.
- Specializing on `f` in `h` with `where {F}`.

But ideally, running a workload that outputs certain precompile signatures in `--trace-compile` should be equally good as putting those precompile signatures explicitly into the package.