`jl_cumulative_compile_time_ns_before` tracks compilation time per thread rather than per task #41271
Comments
Here's another, maybe a bit simpler example, which shows the same effect as above using `@time`. (Note that this example doesn't work if you copy/paste it into the REPL; I think because it's compiling the REPL)

```julia
# async-time-example.jl
const running = Threads.Atomic{Bool}(true)
loop() = while running[] ; yield(); GC.safepoint(); end

# Run once as warmup
running[] = true
t = @async loop()
sleep(0.1) ; running[] = false
wait(t)

# ----------------------------
running[] = true
# Start timing a task, which doesn't have any compilation time.
t = @async @time loop()
# Finish timing the task. Note no compilation time.
sleep(0.1) ; running[] = false
wait(t)

# ----------------------------
running[] = true
# Now start timing the *same* task, which shouldn't have any compilation time.
t = @async @time while running[] ; yield(); GC.safepoint(); end
# But start another task *on the same thread,* which DOES spend some compilation time.
t2 = @async (@eval f() = 2+2; @eval f())
# Finish timing the original task, and observe that the compilation time from `t2`
# was recorded by the `@time` in `t`.
sleep(0.1) ; running[] = false
wait(t) ; wait(t2)
```

And here's the output:

```
$ julia ~/Downloads/async-time-example.jl
0.079854 seconds (72 allocations: 4.594 KiB)
0.093527 seconds (2.27 k allocations: 158.235 KiB, 4.59% compilation time)
```

The request here is to record compilation time via task-local storage, instead of through thread-local storage.
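To make the difference between the two accounting schemes concrete, here is a minimal toy model. It does not touch Julia's real compile-time counters: `charge!`, `measure`, `idle_work`, `SHARED_NS`, `STOP` and `STARTED` are all hypothetical names, and `charge!` merely pretends that some nanoseconds of compilation happened on behalf of the current task. It records that cost both in a counter shared by all tasks and in `task_local_storage()`, so the two deltas can be compared.

```julia
# Toy model only: none of these names exist in Julia's runtime.
const SHARED_NS = Threads.Atomic{UInt64}(0)     # one counter shared by every task
const STOP      = Threads.Atomic{Bool}(false)
const STARTED   = Channel{Nothing}(1)

# Stand-in for "the runtime spent `ns` compiling on behalf of the current task".
function charge!(ns::Integer)
    Threads.atomic_add!(SHARED_NS, UInt64(ns))
    tls = task_local_storage()
    tls[:compile_ns] = get(tls, :compile_ns, UInt64(0)) + UInt64(ns)
    return nothing
end

# Run `f` and return (shared_delta, task_local_delta).
function measure(f)
    shared_before = SHARED_NS[]
    local_before  = get(task_local_storage(), :compile_ns, UInt64(0))
    f()
    shared_delta = SHARED_NS[] - shared_before
    local_delta  = get(task_local_storage(), :compile_ns, UInt64(0)) - local_before
    return (shared_delta, local_delta)
end

# The measured work never "compiles"; it only yields until told to stop.
function idle_work()
    put!(STARTED, nothing)
    while !STOP[]
        yield()
    end
end

measured = @async measure(idle_work)
take!(STARTED)                    # the measurement window is now open
wait(@async charge!(1_000))       # a different task "compiles" meanwhile
STOP[] = true
shared_delta, local_delta = fetch(measured)
println("shared: ", shared_delta, " ns, task-local: ", local_delta, " ns")
```

In this model the shared delta picks up the 1000 ns charged by the other task, while the task-local delta stays at zero, which is the behavior the request above is asking for.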
Worth noting that this is actually not unique to compilation time. The number of allocations is tracked globally as well, as shown in this simple example:

```julia
julia> function f(x::Array{Any}) x[1] + x[1] end
f (generic function with 1 method)

julia> t = @async @time sleep(10)
Task (runnable) @0x000000010b059000

julia> @time for _ in 1:3000000 f(Any[2]) end
0.204893 seconds (3.00 M allocations: 274.658 MiB, 16.42% gc time)

julia> 10.005931 seconds (3.04 M allocations: 277.261 MiB, 0.34% gc time, 0.19% compilation time)
```

This would be great to change for all the metrics in
IIRC, the code is currently taking extra effort to make this counter global (instead of per-thread). The challenge may be deciding if we should ever "charge" thread costs (time, memory, etc.) to the parent Tasks (all that called 'wait' on it), and how to handle task thread-migration.
Tracking this discussion.
Thanks @vtjnash - that makes sense. Interesting! Yeah, after thinking more about it, I can see why the global accounting is actually really valuable for some situations. It seems like maybe we'd ideally want both types of metrics? In addition to the global stuff we currently report, it would be great to also be able to track:
Does that make sense to you? That we could (have the option to) record both types of metrics: task-local (and maybe including any child tasks that we wait on, as you say), and global?
@vtjnash -
After looking at this a bit more, I don't think the And actually, @janrous-rai just pointed out to me that I think there's actually a race condition here, because if multiple Tasks scheduled on the same thread are both measuring Line 89 in 5dbf45a
To fix the race condition, do you think we could at least change Alternatively, is it really so expensive to measure the compilation time that we need to enable/disable it? Could we just keep this as a cumulative count of all compilation time, and remove the
Now, multiple tasks (on the same or different Threads) can start and stop compilation time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
- It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
- It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
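The counter-instead-of-boolean idea from the list above can be illustrated with a small toy model. This is only a sketch in plain Julia: the real change lives in Julia's C runtime, and `TOTAL_NS`, `ACTIVE`, `measure_begin!`, `measure_end!` and `record_compile!` are hypothetical names invented for the illustration.

```julia
# Toy model: a global atomic total, plus an atomic count of active measurements
# in place of a Bool "enabled" flag. All names are hypothetical.
const TOTAL_NS = Threads.Atomic{UInt64}(0)
const ACTIVE   = Threads.Atomic{Int}(0)

# Start/stop a measurement; both return the current cumulative total.
measure_begin!() = (Threads.atomic_add!(ACTIVE, 1); TOTAL_NS[])
measure_end!()   = (Threads.atomic_sub!(ACTIVE, 1); TOTAL_NS[])

# Stand-in for the place where compilation happens: the cost is only
# accumulated while at least one measurement is active.
function record_compile!(ns::Integer)
    ACTIVE[] > 0 && Threads.atomic_add!(TOTAL_NS, UInt64(ns))
    return nothing
end

# Nested (reentrant) measurement: ending the inner one must not disable the outer one.
outer_start = measure_begin!()
record_compile!(10)
inner_start = measure_begin!()
record_compile!(20)
inner_ns = measure_end!() - inner_start
record_compile!(30)     # still counted, because the outer measurement is active
outer_ns = measure_end!() - outer_start
record_compile!(40)     # not counted, because nobody is measuring any more
```

With a boolean flag, the inner `measure_end!` would have switched accounting off and the outer measurement would have missed the third call. With a counter, the outer measurement keeps running until its own `measure_end!`.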
The same change also appears as a cherry-picked commit, referencing #41739, #38885 and #41271 (cherry picked from commit b4ca196).
For my work at RelationalAI I'm trying to gather metrics on how much time is spent in run-time compilation while evaluating queries. I was hoping to use `jl_cumulative_compile_time_ns_[before/after]` for this, a little bit like this:

This works fine if there are no background tasks and queries run sequentially. It also works fine in a multi-threaded scenario, as long as only a single task runs at a time on any given thread, because `jl_cumulative_compile_time_ns_[before/after]` seem to use separate counters per thread. However, it breaks when multiple tasks are multiplexed on the same thread. For instance, if one call of `evaluate_query` has a very long-running `query_fn`, and other tasks that incur compilation time are running at the same time on that thread, that first `evaluate_query` call will also record all the compilation time triggered by those other tasks. This means that we're overestimating how much time we spent in compilation for that given query, potentially by a very large margin.

To see this behavior in action, please consider the following MRE:

This outputs:

As you can see, the last call to `do_something_and_measure_compilation_time` recorded the compilation time of the background task. Unfortunately that behavior makes it unusable for the use case described above. We'd therefore like to request that `jl_cumulative_compile_time_ns_[before/after]` is changed such that it keeps a counter per task, rather than per thread. Thanks!
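The `evaluate_query` measurement pattern described above could be sketched roughly as follows. This is an illustration, not the original snippet. `compile_ns_begin` and `compile_ns_end` are hypothetical helpers, and the `Base` functions they call are internal, unexported, and named differently across Julia versions (assumed here: `cumulative_compile_time_ns_before`/`_after` around 1.6, `cumulative_compile_timing` and `cumulative_compile_time_ns` in later releases), so each branch is guarded with `isdefined`.

```julia
# Hypothetical helpers wrapping internal Base functions; assumes Julia 1.6 or later.
function compile_ns_begin()
    if isdefined(Base, :cumulative_compile_time_ns_before)   # 1.6-era internal name
        return UInt64(Base.cumulative_compile_time_ns_before())
    else                                                     # later releases (assumed)
        Base.cumulative_compile_timing(true)
        return UInt64(first(Base.cumulative_compile_time_ns()))
    end
end

function compile_ns_end()
    if isdefined(Base, :cumulative_compile_time_ns_after)
        return UInt64(Base.cumulative_compile_time_ns_after())
    else
        Base.cumulative_compile_timing(false)
        return UInt64(first(Base.cumulative_compile_time_ns()))
    end
end

# Run `query_fn` and return its result together with the compilation time
# (in nanoseconds) that the counter accumulated while it was running.
function evaluate_query(query_fn)
    before = compile_ns_begin()
    result = query_fn()
    compile_ns = compile_ns_end() - before
    return (result = result, compile_ns = compile_ns)
end

r = evaluate_query(() -> sum(abs2, rand(10)))
println("compile time observed: ", r.compile_ns, " ns")
```

Because the underlying counter is not scoped to the calling task, `compile_ns` in this sketch includes compilation triggered by any other task that runs while `query_fn` is in progress, which is exactly the overestimation described in this issue.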