Add macro for timing package and dependency imports #41612

IanButterworth · 2021-07-16T03:33:19Z

Adds ~~@loadtime~~ @time_imports for reporting the import time of packages and all their dependencies (direct and indirect).

Edit: Updated to show dep nest indenting

julia> @time_imports using CSV
      0.6 ms    ┌ IteratorInterfaceExtensions
    115.4 ms  ┌ TableTraits
    365.8 ms  ┌ SentinelArrays
    102.0 ms  ┌ Parsers
      0.7 ms  ┌ DataValueInterfaces
      1.7 ms    ┌ DataAPI
     34.0 ms  ┌ WeakRefStrings
     16.7 ms  ┌ Tables
     28.0 ms  ┌ PooledArrays
    698.7 ms  CSV

julia> @time_imports using Flux
     58.9 ms    ┌ MacroTools
     61.8 ms  ┌ ZygoteRules
      3.0 ms    ┌ Zlib_jll
      7.4 ms  ┌ ZipFile
    649.3 ms    ┌ StaticArrays
    655.9 ms  ┌ DiffResults
      3.0 ms  ┌ Compat
    106.8 ms  ┌ DataStructures
      0.9 ms  ┌ NaNMath
      1.1 ms  ┌ Requires
      3.4 ms    ┌ TranscodingStreams
      8.5 ms  ┌ CodecZlib
     83.6 ms  ┌ FixedPointNumbers
     65.0 ms    ┌ ChainRulesCore
     92.4 ms  ┌ ChainRules
     11.5 ms    ┌ AbstractFFTs
      1.0 ms        ┌ Preferences
      4.0 ms      ┌ JLLWrappers
    290.5 ms    ┌ LLVMExtra_jll
      0.7 ms    ┌ Reexport
     25.9 ms      ┌ RandomNumbers
     35.6 ms    ┌ Random123
      1.0 ms      ┌ Adapt
   1083.4 ms    ┌ GPUArrays
      4.1 ms    ┌ DocStringExtensions
      5.2 ms      ┌ CEnum
    148.9 ms    ┌ LLVM
      0.8 ms    ┌ ExprTools
     52.3 ms    ┌ TimerOutputs
     16.8 ms    ┌ CompilerSupportLibraries_jll
      9.1 ms    ┌ BFloat16s
      4.0 ms      ┌ LogExpFunctions
      2.6 ms      ┌ OpenSpecFun_jll
     28.7 ms    ┌ SpecialFunctions
    242.4 ms    ┌ GPUCompiler
    153.6 ms    ┌ NNlib
    828.2 ms    ┌ CUDA
   3017.7 ms  ┌ NNlibCUDA
    176.0 ms  ┌ FillArrays
      5.3 ms  ┌ Media
      0.7 ms    ┌ CommonSubexpressions
      1.0 ms    ┌ DiffRules
    117.1 ms  ┌ ForwardDiff
     15.4 ms  ┌ AbstractTrees
      0.9 ms  ┌ Functors
     34.5 ms  ┌ IRTools
   3026.2 ms  ┌ Zygote
    292.5 ms  ┌ ColorTypes
    221.1 ms  ┌ Colors
     42.4 ms  ┌ Juno
   8241.7 ms  Flux

julia> @time_imports begin
       using BenchmarkTools
       using PNGFiles
       end
      5.4 ms  ┌ JSON
     16.1 ms  BenchmarkTools
     76.9 ms      ┌ OffsetArrays
     88.6 ms    ┌ PaddedViews
      6.5 ms      ┌ StackViews
      6.0 ms      ┌ MappedArrays
     20.4 ms    ┌ MosaicViews
      3.1 ms    ┌ Graphics
      1.7 ms    ┌ TensorCore
    180.4 ms    ┌ ColorVectorSpace
    920.3 ms  ┌ ImageCore
      2.0 ms  ┌ libpng_jll
      4.2 ms  ┌ IndirectArrays
   1099.6 ms  PNGFiles

Note that it's real load time, not isolated, so if a package's deps have been loaded by an earlier package it will accurately report a faster time than if it were to be loaded in isolation.

antoine-levitt · 2021-07-16T05:46:46Z

Great idea! Maybe print in seconds, so that it's clearer where the big offenders are?

IanButterworth · 2021-07-16T11:27:22Z

I don't think 1 second is the cut-off for being highlighted. Smaller load times can be problematic when there are lots of them.

I chose ms and aligned the decimals so that it effectively forms a bar graph on the left. Hopefully that goes a fair way to make it easier to parse.

IanButterworth · 2021-07-16T13:49:42Z

Perhaps it should be @timeloading, which might make a bit more grammatical sense, and would sit beside @time, @timev, @timed

antoine-levitt · 2021-07-16T13:54:08Z

Or just @time_using pkg since it's only really supposed to be used like that?

Would it be easy to add an indicator of the recursion level? Something like

X ms >> Pkg3
X ms > Pkg2
X ms  Pkg3

?

base/loading.jl

PallHaraldsson · 2021-07-16T14:27:07Z

Great to have, I've wanted this for a long time, and I see the code to make it happen was surprisingly short.

I assume the timing is correct (just better on 1.8?), I load Zygote (same/latest version) in 4.5 sec. 67% longer than you do. While it could be explained by load (2,73, Firefox running... while not a bad machine 32 GB 16 virtual core), that might not be the full story, as loading Flux is (consistently) only 38% slower, PNGFiles 41% slower, and CSV 60% slower.

I'm on 1.7-beta3, different versions could also explain.

IanButterworth · 2021-07-16T14:28:57Z

I think what you're observing is isolated load time vs. interleaved load time

Note that it's real load time, not isolated, so if a package's deps have been loaded by an earlier package it will accurately report a faster time than if it were to be loaded in isolation.

That's a key point. I'll add it to the docstring

IanButterworth · 2021-07-16T14:36:54Z

Or just @time_using pkg since it's only really supposed to be used like that?

Given people can using/import, and that this works with code blocks, I wouldn't want to force users too narrowly.

Would it be easy to add an indicator of the recursion level?

Good idea, I've added this

julia> @loadtime using Flux
     50.0 ms      MacroTools
    340.0 ms    ZygoteRules
      2.6 ms      Zlib_jll
    120.1 ms    ZipFile
    598.6 ms      StaticArrays
    604.8 ms    DiffResults
      2.5 ms    Compat
     80.5 ms    DataStructures
      0.8 ms    NaNMath
      0.9 ms    Requires
      2.9 ms      TranscodingStreams
      7.4 ms    CodecZlib
     59.0 ms    FixedPointNumbers
     58.7 ms      ChainRulesCore
     76.4 ms    ChainRules
      9.4 ms      AbstractFFTs
      0.8 ms          Preferences
      3.5 ms        JLLWrappers
     14.5 ms      LLVMExtra_jll
      0.6 ms      Reexport
     25.3 ms        RandomNumbers
     34.5 ms      Random123
      0.8 ms        Adapt
   1061.5 ms      GPUArrays
      4.2 ms      DocStringExtensions
      5.0 ms        CEnum
    139.0 ms      LLVM
      0.8 ms      ExprTools
     53.8 ms      TimerOutputs
     16.7 ms      CompilerSupportLibraries_jll
      6.1 ms      BFloat16s
      3.8 ms        LogExpFunctions
      2.3 ms        OpenSpecFun_jll
     26.5 ms      SpecialFunctions
    206.1 ms      GPUCompiler
     29.5 ms      NNlib
    897.0 ms      CUDA
   2597.4 ms    NNlibCUDA
    164.2 ms    FillArrays
      5.1 ms    Media
      0.8 ms      CommonSubexpressions
      1.3 ms      DiffRules
    115.2 ms    ForwardDiff
     14.8 ms    AbstractTrees
      1.0 ms    Functors
     33.1 ms    IRTools
   2669.6 ms    Zygote
    256.1 ms    ColorTypes
    177.5 ms    Colors
     64.1 ms    Juno
   7807.4 ms  Flux

Drvi · 2021-07-16T14:52:36Z

Great idea!:+1: Could the output be sorted though (ideally within each nesting level)?

IanButterworth · 2021-07-16T14:54:43Z

Sorting would require a lot more invasive code, and I'm not sure it would even make sense as the order here is key for understanding the timings, given the !!! note

PallHaraldsson · 2021-07-16T16:46:21Z

It's maybe not well known, but I've experienced total loading time to differ by ordering of using of dependencies, so that might be an argument to not order, rather keep that order.

IanButterworth · 2021-07-16T17:38:26Z

I've experienced total loading time to differ by ordering of using of dependencies

That's likely to be invalidation-related

IanButterworth · 2021-07-18T05:40:41Z

Now with tree branch indicators to highlight the direction of the dependencies

timholy · 2021-07-18T08:46:28Z

base/exports.jl

@@ -993,6 +993,7 @@ export
    @timev,
    @elapsed,
    @allocated,
+    @loadtime,


What about @time_imports?

This only measures timing for serialized code, right? Anything marked __precompile__(false) will not be timed.

Yeah that's good! thanks

vchuravy · 2021-07-18T09:32:25Z

The nesting seems a bit off:

     76.4 ms    ChainRules
      9.4 ms      AbstractFFTs
      0.8 ms          Preferences
      3.5 ms        JLLWrappers
     14.5 ms      LLVMExtra_jll
      0.6 ms      Reexport
     25.3 ms        RandomNumbers
     34.5 ms      Random123
      0.8 ms        Adapt
   1061.5 ms      GPUArrays
      4.2 ms      DocStringExtensions
      5.0 ms        CEnum
    139.0 ms      LLVM

LLVMExtra_jll should be attributed to LLVM and not ChainRules.

IanButterworth · 2021-07-18T11:32:02Z

@vchuravy The nesting goes the other way. It's a little counterintuitive. That's why I added the tree branch indicators.

But you're right that something is off.. Judging by this, LLVMExtra_jll should be a direct dep of NNLibCUDA along side LLVM, which they aren't

I think it's to do with when the depth value is incremented

btw, making the tree branch characters a bit more aesthetic would be possible by threading a bit more information through the functions, but I thought it might be better to keep this as minimal as possible so that it has less overhead when switched off

vchuravy · 2021-07-18T11:37:44Z

base/timing.jl

+    package will accurately report a faster load time than if it were to be loaded in isolation.
+
+"""
+macro loadtime(ex)


what about stupid people like me who write:

@loadtime begin # lots of indirection @loadtime begin using ... end using A end

Currently A won't be counted

True. I guess instead of a bool it could be an Int that's incremented/decremented and checked for being > 0

IanButterworth · 2021-07-18T13:42:54Z

Renamed to @time_imports.

I'm realizing that the nesting of _tryrequire_from_serialized isn't a reliable representation of the dependency tree because each package tries to load all deps, not just direct, and the load order isn't in top-down order, it just tries them in some sequence. The nesting only happens when a dep hits a dep that's not been seen before.

There may not be a way to reliably represent the dependency tree here without parsing the Manifest. Maybe that could be done in the presence of this macro though

PallHaraldsson · 2021-07-18T13:45:04Z

Can this be backported (to 1.7)? Since this will break no code, only used in the REPL.

oscardssmith · 2021-07-18T14:35:21Z

No. It will go in 1.8

IanButterworth · 2021-07-18T20:58:56Z

I think it's going to be very hard to get the dependency tree nesting accurate, for the reason I mentioned above and given:

I tried reordering the dependencies at each level to put project deps first, which would make the current indenting approach work, but it broke precompilation, and even if it worked would deviate the package load ordering making the timing in the test non-representative (i.e. package load order affecting invalidations?).
I thought about parsing the env manifest and attributing each a depth, but deps are used in multiple places in the dep tree, and AFAICT it's not possible to link the order that deps are loaded with the manifest deps map.

The indentation that this PR currently shows is most likely going to cause confusion, so I'm thinking about reverting to a flat list, unless someone sees an easy way to make it accurate?

vchuravy · 2021-07-18T21:02:07Z

So right now the nesting represents the wavefront of parallelism? I would say that's still useful, but instead of a nested tree, maybe simply a list ordered by depth?

IanButterworth · 2021-07-18T21:19:33Z

The thing I was surprised by was that staledeps below (which is a confusingly named variable as it's either a list of deps, or a bool stating there are stale deps) isn't just the deps that are using/import-ed by the parent, but a list of all direct and indirect deps.

julia/base/loading.jl

Lines 839 to 850 in 9044222

    
           for i in 1:length(staledeps) 
        
               dep = staledeps[i] 
        
               dep isa Module && continue 
        
               modpath, modkey, build_id = dep::Tuple{String, PkgId, UInt64} 
        
               dep = _tryrequire_from_serialized(modkey, build_id, modpath) 
        
               if dep === nothing 
        
                   @debug "Required dependency $modkey failed to load from cache file for $modpath." 
        
                   staledeps = true 
        
                   break 
        
               end 
        
               staledeps[i] = dep::Module 
        
           end

I was expecting it to only loop through imported direct dependencies, and the nesting would handle things from there downwards..

maybe simply a list ordered by depth

With the depth not being accurate adding sorting might even further obfuscate, and I think I like that this prints as soon as a dep is imported. It feels like peeking into the process.

codecov · 2021-07-19T09:47:28Z

Codecov Report

Merging #41612 (9044222) into master (004cb25) will decrease coverage by 4.47%.
The diff coverage is 80.18%.

❗ Current head 9044222 differs from pull request most recent head 6cd8d97. Consider uploading reports for the commit 6cd8d97 to get more accurate results

@@            Coverage Diff             @@
##           master   #41612      +/-   ##
==========================================
- Coverage   86.15%   81.67%   -4.48%     
==========================================
  Files         350      350              
  Lines       65991    73977    +7986     
==========================================
+ Hits        56854    60424    +3570     
- Misses       9137    13553    +4416

Impacted Files	Coverage Δ
base/atomics.jl	`84.00% <ø> (+0.66%)`	⬆️
base/baseext.jl	`94.73% <ø> (ø)`
base/compiler/typeinfer.jl	`76.61% <ø> (-14.25%)`	⬇️
base/compiler/typelattice.jl	`85.16% <ø> (-5.92%)`	⬇️
base/compiler/typelimits.jl	`81.86% <ø> (-11.07%)`	⬇️
base/compiler/types.jl	`65.51% <ø> (-4.49%)`	⬇️
base/compiler/typeutils.jl	`65.66% <ø> (-16.70%)`	⬇️
base/compiler/utilities.jl	`86.48% <ø> (-0.19%)`	⬇️
base/compiler/validation.jl	`88.99% <ø> (+0.64%)`	⬆️
base/complex.jl	`99.43% <ø> (-0.16%)`	⬇️
... and 514 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 586acfb...6cd8d97. Read the comment docs.

IanButterworth · 2021-07-19T14:01:59Z

Given import time accumulates when nested imports happen it could also be confusing to not show the nesting..
I added a note to the docstring that hopefully means the nesting can stay.

JeffBezanson · 2021-07-19T15:55:04Z

Nice. Might seem like a small thing but is it possible to move the identifier somewhere else, e.g. InteractiveUtils? I'm pretty concerned about bell-and-whistle creep in Base.

IanButterworth · 2021-07-20T02:02:45Z

Sounds good to me. I've moved it to InteractiveUtils and added a test

IanButterworth · 2021-07-20T22:52:23Z

This is RTM for me

vchuravy · 2021-07-21T07:01:23Z

stdlib/InteractiveUtils/src/macros.jl

+macro time_imports(ex)
+    quote
+        try
+            Base.TIMING_IMPORTS[Threads.threadid()] += 1


With Julia having thread migration now, this kind of code is a bit fishy.
You can no longer guarantee that you receive the same value from threadid on the same task.

I would simplify this to one Atomic in the system. (Which would also help with esoteric code doing using on a different task).

Ok, that sounds good. Thanks. Updated

stdlib/InteractiveUtils/src/macros.jl

antoine-levitt mentioned this pull request Jul 16, 2021

Latency JuliaMolSim/DFTK.jl#402

Closed

giordano reviewed Jul 16, 2021

View reviewed changes

base/loading.jl Outdated Show resolved Hide resolved

IanButterworth force-pushed the ib/loadtime branch from 42e916c to ed5b4dd Compare July 18, 2021 05:39

timholy reviewed Jul 18, 2021

View reviewed changes

vchuravy reviewed Jul 18, 2021

View reviewed changes

IanButterworth changed the title ~~Add loadtime macro for timing package and dependency loading~~ Add macro for timing package and dependency imports Jul 19, 2021

add macro for timing package and dep imports

95feb76

add to NEWS

1ef418f

IanButterworth force-pushed the ib/loadtime branch from b6f3ec0 to 1ef418f Compare July 19, 2021 19:32

add a test

4aa9b53

rename var for clarity

a24fb7e

vchuravy reviewed Jul 21, 2021

View reviewed changes

use a Threads.Atomic counter instead

6cd8d97

JeffBezanson merged commit 69f8bce into JuliaLang:master Jul 21, 2021

IanButterworth deleted the ib/loadtime branch July 21, 2021 16:54

vchuravy reviewed Jul 21, 2021

View reviewed changes

stdlib/InteractiveUtils/src/macros.jl Show resolved Hide resolved

vchuravy mentioned this pull request Jul 30, 2021

Make jl_cumulative_compile_time_ns global (and reentrant). #41733

Merged

NHDaly mentioned this pull request Jul 30, 2021

Compilation Time reporting is broken on Julia 1.7, due to Task Migration #41739

Closed

ranocha mentioned this pull request Feb 28, 2022

Possibly interesting features of Julia v1.8 trixi-framework/Trixi.jl#1075

Open

hyrodium mentioned this pull request May 7, 2022

Add exact rules for irrational numbers JuliaMath/IrrationalConstants.jl#14

Open

oschulz mentioned this pull request May 21, 2022

ArrayInterfaceCore JuliaArrays/ArrayInterface.jl#211

Closed

tfiers mentioned this pull request Jan 12, 2023

Measure load times; display on graph tfiers/PkgGraph.jl#64

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add macro for timing package and dependency imports #41612

Add macro for timing package and dependency imports #41612

IanButterworth commented Jul 16, 2021 •

edited

Loading

antoine-levitt commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

antoine-levitt commented Jul 16, 2021

PallHaraldsson commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

Drvi commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

PallHaraldsson commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 18, 2021

timholy Jul 18, 2021

IanButterworth Jul 18, 2021

vchuravy commented Jul 18, 2021 •

edited

Loading

IanButterworth commented Jul 18, 2021 •

edited

Loading

vchuravy Jul 18, 2021

IanButterworth Jul 18, 2021

IanButterworth commented Jul 18, 2021

PallHaraldsson commented Jul 18, 2021

oscardssmith commented Jul 18, 2021

IanButterworth commented Jul 18, 2021

vchuravy commented Jul 18, 2021

IanButterworth commented Jul 18, 2021

codecov bot commented Jul 19, 2021 •

edited

Loading

IanButterworth commented Jul 19, 2021

JeffBezanson commented Jul 19, 2021

IanButterworth commented Jul 20, 2021

IanButterworth commented Jul 20, 2021

vchuravy Jul 21, 2021

IanButterworth Jul 21, 2021

Add macro for timing package and dependency imports #41612

Add macro for timing package and dependency imports #41612

Conversation

IanButterworth commented Jul 16, 2021 • edited Loading

antoine-levitt commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

antoine-levitt commented Jul 16, 2021

PallHaraldsson commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

Drvi commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

PallHaraldsson commented Jul 16, 2021

IanButterworth commented Jul 16, 2021

IanButterworth commented Jul 18, 2021

timholy Jul 18, 2021

Choose a reason for hiding this comment

IanButterworth Jul 18, 2021

Choose a reason for hiding this comment

vchuravy commented Jul 18, 2021 • edited Loading

IanButterworth commented Jul 18, 2021 • edited Loading

vchuravy Jul 18, 2021

Choose a reason for hiding this comment

IanButterworth Jul 18, 2021

Choose a reason for hiding this comment

IanButterworth commented Jul 18, 2021

PallHaraldsson commented Jul 18, 2021

oscardssmith commented Jul 18, 2021

IanButterworth commented Jul 18, 2021

vchuravy commented Jul 18, 2021

IanButterworth commented Jul 18, 2021

codecov bot commented Jul 19, 2021 • edited Loading

Codecov Report

IanButterworth commented Jul 19, 2021

JeffBezanson commented Jul 19, 2021

IanButterworth commented Jul 20, 2021

IanButterworth commented Jul 20, 2021

vchuravy Jul 21, 2021

Choose a reason for hiding this comment

IanButterworth Jul 21, 2021

Choose a reason for hiding this comment

IanButterworth commented Jul 16, 2021 •

edited

Loading

vchuravy commented Jul 18, 2021 •

edited

Loading

IanButterworth commented Jul 18, 2021 •

edited

Loading

codecov bot commented Jul 19, 2021 •

edited

Loading