Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation Time reporting is broken on Julia 1.7, due to Task Migration #41739

Closed
NHDaly opened this issue Jul 30, 2021 · 8 comments · Fixed by #41733
Closed

Compilation Time reporting is broken on Julia 1.7, due to Task Migration #41739

NHDaly opened this issue Jul 30, 2021 · 8 comments · Fixed by #41733
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior performance Must go faster
Milestone

Comments

@NHDaly
Copy link
Member

NHDaly commented Jul 30, 2021

Consider this small example, which @times within a Task, allows the Task to migrate threads, and then shows that the results are garbage when it switches threads:

# compilation-task-migration-17-example.jl
@sync begin
    # Start a long task that yields a lot, and is likely to switch tasks
    # Run a `@time` measurement in there, to report the compilation time.
    Threads.@spawn begin
        @time begin
            Core.println("start thread: ",Threads.threadid())
            for _ in 1:5
                sleep(0.2)  # might switch threads
            end
            @eval module M ; f(x,y) = x+y; end
            @eval M.f(1,2)
            Core.println("end thread: ",Threads.threadid())
        end
    end
    # Start some heavy work to occupy the thread that the above task started on.
    @sync for _ in 1:Threads.nthreads()
        Threads.@spawn for _ in 1:10 peakflops() end
    end
end

One time it reported 0% compilation time, the next time it reports 4327426979.58% compilation time 😅

julia> versioninfo()
Julia Version 1.7.0-beta3
Commit e76c9dad42 (2021-07-07 08:12 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  
julia> include("compilation-task-migration-17-example.jl")
start thread: 1
WARNING: replacing module M.
end thread: 3
  4.352289 seconds (18.97 k allocations: 2.391 GiB, 6.06% gc time)

julia> include("compilation-task-migration-17-example.jl")
start thread: 2
WARNING: replacing module M.
end thread: 3
  4.262751 seconds (18.97 k allocations: 2.391 GiB, 5.35% gc time, 4327426979.58% compilation time)

(For completeness, here's the same test on julia 1.6):

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

julia> include("compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.200354 seconds (3.24 M allocations: 2.563 GiB, 6.75% gc time, 0.33% compilation time)

julia> include("compilation-task-migration-17-example.jl")
start thread: 2
WARNING: replacing module M.
end thread: 2
  5.840806 seconds (18.27 k allocations: 2.391 GiB, 8.02% gc time, 0.05% compilation time)
@NHDaly NHDaly added bug Indicates an unexpected problem or unintended behavior performance Must go faster labels Jul 30, 2021
@NHDaly
Copy link
Member Author

NHDaly commented Jul 30, 2021

So, it seems to me that:

  1. Per-thread measurements of compilation time are just not very useful, at least certainly aren't what a user wants when they call @time.
  2. It's hard to get useful per-task measurements, since you have the problem that anything that happens in child tasks is not attributed to you (as we discussed yesterday).
  3. Since all of the other metrics that we currently report (allocations, bytes, time, gc time) are all global measurements, that we should just make this a global metric as well.

Does that seem like a reasonable approach to everyone? Remove the per-thread aspect of this measurement, make it global, and protect it with atomics, similar to what was done in #41612?

(I'll probably make that change by adapting #41733 to this purpose, and then use atomic increment/decrement for the feature, to make it reentrant.)

Sound good?
CC: @IanButterworth, @vtjnash, @vchuravy
CC: @rgankema, @ElSeidy, since this is related to #41271.

@vchuravy
Copy link
Member

vchuravy commented Jul 31, 2021 via email

@rgankema
Copy link

rgankema commented Jul 31, 2021

  1. Since all of the other metrics that we currently report (allocations, bytes, time, gc time) are all global measurements, that we should just make this a global metric as well.

I guess this makes sense, but for the specific performance measurements I'd like to do, global allocation and compilation time measurements are not very useful. Can I hope for additional task-level measurements at some point in time, or would that be unrealistic?

@ElSeidy
Copy link

ElSeidy commented Jul 31, 2021

  1. Since all of the other metrics that we currently report (allocations, bytes, time, gc time) are all global measurements, that we should just make this a global metric as well.

I guess this makes sense, but for the specific performance measurements I'd like to do, global allocation and compilation time measurements are not very useful. Can I hope for additional task-level measurements at some point in time, or would that be unrealistic?

I would second @rgankema in that too. I am also interested in per-task metrics (alongside with global metrics). I can imagine lots of profiling applications (other than mine) that would prefer to rely on this level of abstraction.

I hope thats feasible too :)

@NHDaly
Copy link
Member Author

NHDaly commented Jul 31, 2021

The trouble is that, like I said in 2., it's not super clear how well-defined per-task measurements are..

Like, do you really only want the compilation time in that single task? Or do you want compilation time in that task and all tasks that it spawned? Or that task + all the tasks that it waited on? Or only the tasks that it waited on, that were spawned by it?

I think the main thing blocking support for this feature is just deciding as a group what we want the semantics to mean. I think it should be okay to implement it, we just need to decide what we want to measure.

That said, ++ i think we all want to have some task-accounting for things like this, yeah. 👍 👍

NHDaly added a commit to NHDaly/julia that referenced this issue Jul 31, 2021
Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("/Users/nathandaly/Downloads/compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("/Users/nathandaly/Downloads/compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```
NHDaly added a commit to NHDaly/julia that referenced this issue Jul 31, 2021
Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("/Users/nathandaly/Downloads/compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("/Users/nathandaly/Downloads/compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```
@ElSeidy
Copy link

ElSeidy commented Jul 31, 2021

The trouble is that, like I said in 2., it's not super clear how well-defined per-task measurements are..

Like, do you really only want the compilation time in that single task? Or do you want compilation time in that task and all tasks that it spawned? Or that task + all the tasks that it waited on? Or only the tasks that it waited on, that were spawned by it?

I think the main thing blocking support for this feature is just deciding as a group what we want the semantics to mean. I think it should be okay to implement it, we just need to decide what we want to measure.

That said, ++ i think we all want to have some task-accounting for things like this, yeah. 👍 👍

@NHDaly , thats a good point.

In my opinion, the way I think about it is similar to execution time (total/breakdowns). I would define it as the total time for compilation taken for a task to finish (including any subtasks spawned by it and it waited on). I think, its up to the developer/instrumenter to be able to extract the any breakdowns (just like we do in the tracers' execution time).

@ElSeidy
Copy link

ElSeidy commented Jul 31, 2021

@NHDaly, to be accurate, I am also assuming here that I can compute the compilation time for a portion within a single task too like @time (to be able to extract the breakdowns):
something like this (excuse my pseudocode)

# task 1
@compilation_time begin. # block1
     #some computations for block1 (in task1)
     ....
     @compilation_time begin # block2
             # some other computation for block2 (in task1)
             .....
     end
     
     # task 2
     @sync @spawn begin 
          @compilation_time begin # block3
             # some other computation for block3 (in task2)
             ....
          end
     end
end

here the compilation time for block1 would include both block2 and block3. But I can still retrieve the compilation times for block2 and block3 independently too.

NHDaly pushed a commit to NHDaly/julia that referenced this issue Jul 31, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

Add `cumulative_compile_time_ns()` to return the cumulative compile time for a thread without stopping measurement.

```julia
julia> Base.cumulative_compile_time_ns_before()  # enable constantly measuring compilation time
0x0000000000000000

julia> @time @eval module M ; f(x) = 2+x; end
  0.006730 seconds (919 allocations: 55.212 KiB, 57.20% compilation time)
Main.M

julia> Base.cumulative_compile_time_ns()
0x00000000075246b3

julia> @time 2+2
  0.000000 seconds
4

julia> Base.cumulative_compile_time_ns()
0x0000000007fe4a46

julia> @time @eval M.f(2)
  0.003538 seconds (750 allocations: 46.247 KiB, 94.64% compilation time)
4

julia> Base.cumulative_compile_time_ns()
0x000000000831619e
```

Make jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```
@NHDaly NHDaly self-assigned this Aug 1, 2021
@rgankema
Copy link

rgankema commented Aug 1, 2021

@NHDaly That's a great point, I should have been more precise :) I'd be interested in compilation time for that task and all tasks it spawned. Of course, if that's feasible at all!

To express that more high-level: when I run a piece of code, I want to know all the compilation costs that were incurred as a result of running that code. So that means that if that code spawns new tasks, those tasks are interesting as well. But other tasks that just happen to be running at the same time (which I guess is what we'd measure with global measurements), are tasks I don't want to measure.

NHDaly added a commit to NHDaly/julia that referenced this issue Aug 1, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

Add `cumulative_compile_time_ns()` to return the cumulative compile time for a thread without stopping measurement.

```julia
julia> Base.cumulative_compile_time_ns_before()  # enable constantly measuring compilation time
0x0000000000000000

julia> @time @eval module M ; f(x) = 2+x; end
  0.006730 seconds (919 allocations: 55.212 KiB, 57.20% compilation time)
Main.M

julia> Base.cumulative_compile_time_ns()
0x00000000075246b3

julia> @time 2+2
  0.000000 seconds
4

julia> Base.cumulative_compile_time_ns()
0x0000000007fe4a46

julia> @time @eval M.f(2)
  0.003538 seconds (750 allocations: 46.247 KiB, 94.64% compilation time)
4

julia> Base.cumulative_compile_time_ns()
0x000000000831619e
```

Make jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```
@vchuravy vchuravy added this to the 1.7 milestone Aug 13, 2021
NHDaly added a commit to NHDaly/julia that referenced this issue Aug 18, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

Add `cumulative_compile_time_ns()` to return the cumulative compile time for a thread without stopping measurement.

```julia
julia> Base.cumulative_compile_time_ns_before()  # enable constantly measuring compilation time
0x0000000000000000

julia> @time @eval module M ; f(x) = 2+x; end
  0.006730 seconds (919 allocations: 55.212 KiB, 57.20% compilation time)
Main.M

julia> Base.cumulative_compile_time_ns()
0x00000000075246b3

julia> @time 2+2
  0.000000 seconds
4

julia> Base.cumulative_compile_time_ns()
0x0000000007fe4a46

julia> @time @eval M.f(2)
  0.003538 seconds (750 allocations: 46.247 KiB, 94.64% compilation time)
4

julia> Base.cumulative_compile_time_ns()
0x000000000831619e
```

Make jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```
NHDaly added a commit to NHDaly/julia that referenced this issue Aug 18, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
NHDaly added a commit to NHDaly/julia that referenced this issue Aug 23, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
KristofferC pushed a commit that referenced this issue Aug 27, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes #41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: #38885

Problems addressed:
- This fixes #41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).

(cherry picked from commit b4ca196)
janrous-rai pushed a commit to janrous-rai/julia that referenced this issue Oct 28, 2021
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Feb 22, 2022
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022
Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants