add a GC safepoint in Task.wait #41441
This gives GC a chance to run during a task switch, in addition to being triggered by allocations. Without this, non-allocating tasks may fail to stop for a long time, even if they're doing IO or calling `yield` explicitly, preventing us from stopping the world for GC.
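The following is a minimal sketch of the situation described above. The function name `spin_and_yield` and the loop sizes are hypothetical and do not come from the PR. The task's loop body only does integer arithmetic and calls `yield()` every 1,000 iterations. In Julia, `yield()` reschedules the current task and then calls `wait()`. On builds that include this change, each `yield()` therefore also passes through `GC.safepoint()`, which gives a pending stop-the-world collection a chance to proceed. Without a safepoint on that path, a `GC.gc()` requested from another thread may have to wait much longer for this task to stop.

```julia
# Hypothetical example: a task that does no heap allocation in its loop
# body but yields to the scheduler periodically.
function spin_and_yield(n::Int)
    acc = 0
    for i in 1:n
        acc += i              # integer arithmetic only; no allocation here
        if i % 1_000 == 0
            yield()           # task switch: yield() calls wait(); with this
                              # change, wait() starts with GC.safepoint()
        end
    end
    return acc
end

# Run the loop on another task (possibly another thread), then request a
# full collection from this task. A stop-the-world GC needs every thread
# to reach a safepoint, including the one running the spinning task.
t = Threads.@spawn spin_and_yield(1_000_000)
GC.gc()
result = fetch(t)
println(result)
```

With a single thread (`julia -t 1`), the spawned task simply runs once the main task blocks in `fetch`. The example is mainly interesting when Julia is started with several threads.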
```diff
@@ -818,6 +818,7 @@ end
 end

 function wait()
+    GC.safepoint()
```
Not sure if this is the right place to put this, but it prevents the shell from hanging when we paste in the simple program from the linked issue.
Let me know if a different spot in this function, or a different function altogether, would be better!
There is an unconditional safepoint in process_events() after the yield, but I suppose if the code is not returning to an existing Task, we would not reach that?
Oh, interesting, good to know! I see it now, at line 210 of jl_uv.c (as of 5f18285): `jl_gc_safepoint_(ct->ptls);`
Was that added after Julia 1.6? We were experiencing some long pauses in production on 1.6, which is what got us looking into this. But it seems like it's already been addressed on master?
In any case, as you say, pausing for GC before the task yields would give GC slightly more opportunities to run, but I agree that it doesn't seem much different.
The example program from #40972 hangs on Julia 1.6. We tried running it on master, and it doesn't seem to hang at first, but after a few seconds it locks my terminal, to the point where it can't even be interrupted with ctrl-z. With this PR applied, it works correctly again. For more context, my understanding is that since a task yield is relatively expensive compared to a gc_safepoint, this seems like a safe place to inject one. @vtjnash, does that seem reasonable to you as well? :) Thanks!
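The same cost argument applies to code that never yields at all. Such code reaches neither the safepoint added here in `wait()` nor the one in `process_events()`, but because `GC.safepoint()` is cheap, it can be called directly. `GC.safepoint()` is an existing Base function documented for non-allocating code that should let garbage collection run. The helper `busy_sum` and its `every` interval below are hypothetical. Treat this as a sketch, not a recommended setting.

```julia
# Hypothetical helper: a hot loop that neither allocates nor yields.
# Calling GC.safepoint() every `every` iterations lets a collection
# requested by another thread proceed without waiting for the loop to end.
function busy_sum(n::Int; every::Int = 10_000)
    acc = 0
    for i in 1:n
        acc += i
        if i % every == 0
            GC.safepoint()   # cheap check; this thread stops here only if a GC is pending
        end
    end
    return acc
end

println(busy_sum(100_000))
```

A smaller `every` lets a pending collection start sooner, at the price of more checks in the hot loop.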
Interesting; I can reproduce the problem on 1.6 but not this behavior.
This is very cheap though so might as well merge it.
> This is very cheap though so might as well merge it.

Oh wow, thanks @JeffBezanson! Sorry for the slow reply; I was out for the long holiday weekend. I'm leaving some comments inline.

> We tried running it on master, and it seems like it doesn't hang at first, but then it locks my terminal after a few seconds, even to the point where it can't be interrupted with ctrl-z.

> Interesting; I can reproduce the problem on 1.6 but not this behavior.

Yeah, weird, right? The program @vilterp posted in a comment on #40972 consistently locks my REPL after a few more commands, every time. I had Julia built at commit ed4c44f.
I just rebuilt at the commit right before this PR, 480ff81, and it doesn't hang there. So I don't think it was related to the changes in this PR.
But anyway, thanks for the merge!
Also, I'm not sure whether this qualifies as a "bug fix", or whatever the criteria for backports are, but we've backported this to 1.6 in our production builds and it seemed to apply cleanly, so we could consider
Git blame suggests that the jl_uv.c:210 safepoint was added in the pull request that enabled task migration, i.e. after 1.6 branched but, happily, before 1.7 branched. Testing on 1.7-beta3 might be worthwhile? :)
Good suggestion, @Sacha0. Cool, so: I just checked in … So whatever it was that fixed it on master has also fixed it on …

EDIT: And of course, as you pointed out, that gc_safepoint line is already in 1.7, so I did not experience the original hang reported in #40972 in either beta2 or beta3. 👍
(cherry picked from commit 97f817a)
Fixes #40972 (I think?)
cc @NHDaly @dewilson @vchuravy