Refetch PTLS via Task struct (enable task migration) #40715
Conversation
Force-pushed from cb5a359 to ab08a29
This is exciting!
julia> function f()
           a = Threads.threadid()
           yield()                 # a yield point, at which the task may migrate
           b = Threads.threadid()
           (a, b)
       end
f (generic function with 1 method)

julia> function waste(d)
           t0 = time_ns()
           while time_ns() - t0 < d
           end
       end
waste (generic function with 1 method)

julia> begin
           tasks = [Threads.@spawn(f()) for _ in 1:10000]
           Threads.@threads for _ in 1:Threads.nthreads()
               waste(100_000_000)  # keep every thread busy so the spawned tasks get moved around
           end
       end

julia> ids = fetch.(tasks);

julia> d = map(Base.splat(==), ids);  # true iff a task saw the same thread id before and after yield

julia> countmap(d)                    # countmap is from StatsBase
Dict{Bool, Int64} with 2 entries:
  0 => 8757
  1 => 1243
I looked at the patch and it made sense to me. I'm not familiar with a large part of it, though.
@@ -298,12 +298,12 @@ JL_DLLEXPORT jl_array_t *jl_reshape_array(jl_value_t *atype, jl_array_t *data,

 JL_DLLEXPORT jl_array_t *jl_string_to_array(jl_value_t *str)
 {
-    jl_ptls_t ptls = jl_get_ptls_states();
+    jl_task_t *ct = jl_current_task;
Could we avoid some of these changes, just to reduce code churn?
No, these are pretty much all necessary. I think it would be bad to analyze them on a case-by-case basis as they cost performance and correctness.
Yeah I realized later we also don't want anybody reading the code to get the wrong idea about how to do it.
This looks great to me so far. Let's try benchmarks for funsies. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG
That seems to be just noise.
Of note, this is possibly breaking linuxaarch64, as I think it is known that unwind info registration is bad there, and this somehow seems to be making it worse.
Here is master:
PR:
So something seems to be a bit slower in pure task spawning/switching.
I can also observe a longer tail in spawn overheads in a different benchmark. OTOH, I'm somewhat surprised that this PR increases the overhead on spawn. I thought that the overhead might happen (also?) at re-scheduling.

Spawn and sync (wait) overheads

Benchmark code: https://github.com/tkf/ThreadsAPIBenchmarks.jl/blob/4e6b8795334a45fd3ffa687a70bb68efec0c4839/benchmark/task_overheads.jl#L4-L18

It uses

This branch ab08a29:

1.7.0-DEV.1089 (d6c092d):

Overhead of notify (schedule)

Benchmark code: https://github.com/tkf/ThreadsAPIBenchmarks.jl/blob/4e6b8795334a45fd3ffa687a70bb68efec0c4839/benchmark/notify_overheads.jl#L4-L37

This branch ab08a29:

1.7.0-DEV.1089 (d6c092d):

You can find the notebooks containing a bit more explanation of the benchmarks here:

The code running the benchmarks and generating the notebooks is in https://github.com/tkf/ThreadsAPIBenchmarks.jl
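As a rough sketch of the kind of measurement in the second benchmark above (my own, assuming BenchmarkTools; it is not the code in ThreadsAPIBenchmarks.jl), the notify/schedule cost can be isolated by timing how long it takes to enqueue an already-constructed task and wait for it to finish:

using BenchmarkTools

# Sketch: overhead of waking (scheduling) a task, separate from constructing it.
function schedule_and_wait()
    t = Task(() -> nothing)   # created but not yet runnable
    t.sticky = false          # let the scheduler place (and now migrate) it freely
    schedule(t)               # the notify/schedule step being measured
    wait(t)                   # block until the task has run
end

@btime schedule_and_wait()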
@JeffBezanson I don't know how to reproduce that without
It's just this:
I agree it's a lot of scheduler overhead, which we'll probably have to address separately. Just wanted to get a sense of what effects we might see from this. I don't think it blocks merging.
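As a rough illustration only of the spawn/switch path being compared here (my sketch, assuming BenchmarkTools; not the snippet referred to above), such a microbenchmark can look like:

using BenchmarkTools

# Sketch: cost of spawning a trivial task and waiting for it (task creation,
# scheduling, and the switch back), compared with a bare yield.
spawn_and_wait() = wait(Threads.@spawn nothing)

@btime spawn_and_wait()   # spawn + schedule + switch back
@btime yield()            # a trip through the scheduler (a full switch only if another task is runnable)

The difference between the two numbers gives a rough split between spawn overhead and scheduler overhead.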
Force-pushed from f16b6e8 to 3d08b65
Co-authored-by: Takafumi Arakaki <aka.tkf@gmail.com>
Force-pushed from 3d08b65 to 5f34035
…Lang#40715) Enables task-thread migration! Co-authored-by: Takafumi Arakaki <aka.tkf@gmail.com>
Should the Caveat in the manual be removed: https://docs.julialang.org/en/v1/manual/multi-threading/#Caveats?
I wasn't sure where to open a discussion on this, but my post on Discourse (https://discourse.julialang.org/t/task-scheduling-semantics/62950) didn't garner any attention, so perhaps here is the better place. @vtjnash, @JeffBezanson

As much as I'm excited for task migration, I think there are some open design questions to answer before pushing this functionality into a release. My core concern is that there are two kinds of scheduler stickiness you might want: sticking to a particular system thread, or sticking to whatever thread your parent task is (at any moment) scheduled on. At present these are the same; with task migration they will no longer be.

The way I see it, you want to pin your task to a system thread when you need access to thread-local state or you're interacting with libraries that need to run on particular threads. For the more general Julia async use cases, however, what you want is to guarantee non-simultaneous execution of tasks in some group (e.g., parent and sibling async tasks). It would be nice to consider this requirement prior to making the scheduler more flexible, and potentially also to clean up the task scheduling API. I may be wrong, but I believe the current
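For reference, the knob that exists today is a single Boolean per task (a sketch of current behavior, not a proposal): @async produces sticky tasks that stay on the thread that started them, while Threads.@spawn produces non-sticky tasks, which this PR allows to migrate.

# Sketch of the existing API surface: one sticky flag per task.
t_async = @async Threads.threadid()          # sticky == true: stays on the current thread
t_spawn = Threads.@spawn Threads.threadid()  # sticky == false: may migrate at yield points
@show t_async.sticky t_spawn.sticky

# The flag can also be set by hand on a task that hasn't been scheduled yet:
t = Task(() -> Threads.threadid())
t.sticky = false
schedule(t)
@show fetch(t)

Neither setting expresses "run only where my parent task currently is", which is the second kind of stickiness described above.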
This PR seems to have broken precompilation for GPUCompiler.jl. Before, doing this:
using LinearAlgebra
m = 10
using PackageCompiler

default_compile_dir() = joinpath(homedir(), ".julia", "sysimages")
default_compile_filename() = "sys_gputest.so"
default_compile_path() = joinpath(default_compile_dir(), default_compile_filename())

function compile(;
    dir::AbstractString=default_compile_dir(),
    filename::AbstractString=default_compile_filename(),
)
    if !isdir(dir)
        println("""The directory "$dir" doesn't exist yet, creating it now.""")
        println()
        mkdir(dir)
    end
    path = joinpath(dir, filename)
    println(
        """Creating the system image "$path" containing the compiled version of GPUCompiler. This may take a few minutes.""",
    )
    create_sysimage(
        :GPUCompiler;
        sysimage_path=path,
        precompile_execution_file=joinpath(@__DIR__, "precomp_script.jl"),
    )
    return path
end

compile()

and running with
(thanks @maleadt for this log)
Tim suggested that:
@vtjnash any thoughts?
Implemented in #43496
Replaces #39220 @tkf @vchuravy @yuyichao