-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add macro for timing package and dependency imports #41612
Conversation
Great idea! Maybe print in seconds, so that it's clearer where the big offenders are? |
I don't think 1 second is the cut-off for being highlighted. Smaller load times can be problematic when there are lots of them. I chose ms and aligned the decimals so that it effectively forms a bar graph on the left. Hopefully that goes a fair way to make it easier to parse. |
Perhaps it should be |
Or just Would it be easy to add an indicator of the recursion level? Something like
? |
Great to have, I've wanted this for a long time, and I see the code to make it happen was surprisingly short. I assume the timing is correct (just better on 1.8?), I load Zygote (same/latest version) in 4.5 sec. 67% longer than you do. While it could be explained by load (2,73, Firefox running... while not a bad machine 32 GB 16 virtual core), that might not be the full story, as loading Flux is (consistently) only 38% slower, PNGFiles 41% slower, and CSV 60% slower. I'm on 1.7-beta3, different versions could also explain. |
I think what you're observing is isolated load time vs. interleaved load time
That's a key point. I'll add it to the docstring |
Given people can using/import, and that this works with code blocks, I wouldn't want to force users too narrowly.
Good idea, I've added this
|
Great idea!:+1: Could the output be sorted though (ideally within each nesting level)? |
Sorting would require a lot more invasive code, and I'm not sure it would even make sense as the order here is key for understanding the timings, given the |
It's maybe not well known, but I've experienced total loading time to differ by ordering of |
That's likely to be invalidation-related |
42e916c
to
ed5b4dd
Compare
base/exports.jl
Outdated
@@ -993,6 +993,7 @@ export | |||
@timev, | |||
@elapsed, | |||
@allocated, | |||
@loadtime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about @time_imports
?
This only measures timing for serialized code, right? Anything marked __precompile__(false)
will not be timed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's good! thanks
The nesting seems a bit off:
LLVMExtra_jll should be attributed to LLVM and not ChainRules. |
@vchuravy The nesting goes the other way. It's a little counterintuitive. That's why I added the tree branch indicators. But you're right that something is off.. Judging by this, LLVMExtra_jll should be a direct dep of NNLibCUDA along side LLVM, which they aren't I think it's to do with when the btw, making the tree branch characters a bit more aesthetic would be possible by threading a bit more information through the functions, but I thought it might be better to keep this as minimal as possible so that it has less overhead when switched off |
base/timing.jl
Outdated
package will accurately report a faster load time than if it were to be loaded in isolation. | ||
|
||
""" | ||
macro loadtime(ex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about stupid people like me who write:
@loadtime begin
# lots of indirection
@loadtime begin
using ...
end
using A
end
Currently A
won't be counted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I guess instead of a bool it could be an Int that's incremented/decremented and checked for being > 0
Renamed to I'm realizing that the nesting of There may not be a way to reliably represent the dependency tree here without parsing the Manifest. Maybe that could be done in the presence of this macro though |
Can this be backported (to 1.7)? Since this will break no code, only used in the REPL. |
No. It will go in 1.8 |
I think it's going to be very hard to get the dependency tree nesting accurate, for the reason I mentioned above and given:
The indentation that this PR currently shows is most likely going to cause confusion, so I'm thinking about reverting to a flat list, unless someone sees an easy way to make it accurate? |
So right now the nesting represents the wavefront of parallelism? I would say that's still useful, but instead of a nested tree, maybe simply a list ordered by depth? |
The thing I was surprised by was that Lines 839 to 850 in 9044222
I was expecting it to only loop through imported direct dependencies, and the nesting would handle things from there downwards..
With the depth not being accurate adding sorting might even further obfuscate, and I think I like that this prints as soon as a dep is imported. It feels like peeking into the process. |
Codecov Report
@@ Coverage Diff @@
## master #41612 +/- ##
==========================================
- Coverage 86.15% 81.67% -4.48%
==========================================
Files 350 350
Lines 65991 73977 +7986
==========================================
+ Hits 56854 60424 +3570
- Misses 9137 13553 +4416
Continue to review full report at Codecov.
|
Given import time accumulates when nested imports happen it could also be confusing to not show the nesting.. |
Nice. Might seem like a small thing but is it possible to move the identifier somewhere else, e.g. |
b6f3ec0
to
1ef418f
Compare
Sounds good to me. I've moved it to InteractiveUtils and added a test |
This is RTM for me |
macro time_imports(ex) | ||
quote | ||
try | ||
Base.TIMING_IMPORTS[Threads.threadid()] += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With Julia having thread migration now, this kind of code is a bit fishy.
You can no longer guarantee that you receive the same value from threadid on the same task.
I would simplify this to one Atomic
in the system. (Which would also help with esoteric code doing using
on a different task).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that sounds good. Thanks. Updated
Adds
@loadtime
@time_imports
for reporting the import time of packages and all their dependencies (direct and indirect).Edit: Updated to show dep nest indenting
Note that it's real load time, not isolated, so if a package's deps have been loaded by an earlier package it will accurately report a faster time than if it were to be loaded in isolation.