-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some refactoring of thread suspension. #34666
Conversation
Tagging @ViktorHofer as an area owner |
549433a
to
8ea755e
Compare
d4bb0dc
to
6b81346
Compare
@VSadov are you planning to do anything with this? |
I did this partially for my own education - why suspension is so complex for what it is doing. There was indeed some room to make things simpler and I made some changes. I think it could be simplified a bit more. Let me dust it off and measure again if there are gains. |
2b537d5
to
ce94005
Compare
The test program: using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;
class Program
{
static object[] arr1 = new Exception[10000];
enum E1
{
a
}
static object box = E1.a;
static void DoSomeWork()
{
var a1 = arr1;
ArgumentNullException val = new ArgumentNullException();
for (; ; )
{
Assign(a1, 1, val);
Assign(a1, 2, val);
Assign(a1, 3, val);
Assign(a1, 4, val);
Assign(a1, 5, val);
Assign(a1, 6, val);
Assign(a1, 7, val);
}
}
static bool doPoll;
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Assign(object[] a, int i, ArgumentNullException val)
{
if (val != null)
{
a[i] = val;
}
if (doPoll)
{
// enum->int unbox will perform a GC-polling FCALL (easy case for suspension)
int unboxed = (int)box;
}
a[i] = null;
}
static void Main()
{
for (int i = 0; i < Environment.ProcessorCount-1; i++)
{
new Thread(() =>
{
DoSomeWork();
}).Start();
}
Thread.Sleep(10);
for (; ; )
{
long longest = 0;
int iterations = 0;
doPoll ^= true;
// do 5sec of suspension/resume
var sw = Stopwatch.StartNew();
while (sw.ElapsedMilliseconds < 5000)
{
long oneIter = sw.ElapsedMilliseconds;
// Suspend/Resume EE
GC.GetTotalAllocatedBytes(precise:true);
oneIter = sw.ElapsedMilliseconds - oneIter;
longest = Math.Max(longest, oneIter);
iterations++;
// Let other threads actually resume
Thread.Yield();
}
sw.Stop();
{
Console.WriteLine($"frequent GC-polling : {doPoll}");
Console.WriteLine($"#suspensions in 5 sec. : {iterations}");
Console.WriteLine($"Average suspension took : {(double)sw.ElapsedMilliseconds / iterations}");
Console.WriteLine($"longest suspend was (msec): {longest}");
Console.WriteLine();
}
}
}
} |
Regular desktop (i7-8700, 12 logical cores, Windows 10) Numbers have some variations, especially "longest suspend", but these are typical numbers: === baseline (5.0) frequent GC-polling : False frequent GC-polling : True frequent GC-polling : False frequent GC-polling : True === after the change frequent GC-polling : False frequent GC-polling : True frequent GC-polling : False frequent GC-polling : True |
Azure Standard_F16s_v2 VM (16 virtual cores, Ubuntu Server 16.04 LTS) === baseline (5.0) frequent GC-polling : False frequent GC-polling : True === after the change frequent GC-polling : False frequent GC-polling : True |
Server (AMD Rome, 256 logical cores, 64 per NUMA node, Windows Server) === baseline (5.0) frequent GC-polling : False frequent GC-polling : True === after the change frequent GC-polling : False frequent GC-polling : True |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change, but it needs very careful scrutiny. I fully agree with David's comment.
@@ -4016,9 +4016,6 @@ BasicBlock* Compiler::fgCreateGCPoll(GCPollType pollType, BasicBlock* block) | |||
value = gtNewIndOfIconHandleNode(TYP_INT, (size_t)addrTrap, GTF_ICON_GLOBAL_PTR, false); | |||
} | |||
|
|||
// Treat the reading of g_TrapReturningThreads as volatile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be marked as non-CSEable at least? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-CSEable
would be good to have. I thought this is injected by JIT after optimizations.
In c++ we use compiler-only fences to make sure the read gets emitted in program order (the only gurantee that we need on the read side).
Some equivalent would be sufficient in JIT code as well.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/coreclr/src/vm/threadsuspend.cpp
Outdated
@@ -6107,14 +5869,10 @@ void ThreadSuspend::SuspendEE(SUSPEND_REASON reason) | |||
// I wonder how much confusion this has caused....) | |||
// It seems like much of the above is redundant. We should investigate reducing the number | |||
// of mechanisms we use to indicate that a suspension is in progress. | |||
// | |||
// +1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintentional change? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was agreeing with the comment, but decided not to deal with that in this change and forgot to remove.
#Resolved
src/coreclr/src/vm/threadsuspend.cpp
Outdated
{ | ||
if (thread->HasThreadState(Thread::TS_GCSuspendPending)) | ||
for (Thread *thread = ThreadStore::GetThreadList(NULL); thread != NULL; thread = ThreadStore::GetThreadList(thread)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the while
loop pattern in all places that iterate over threadstore today. I do not see value in changing to for loop in some of the places. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted back this and 2 other places to be:
Thread* thread = NULL;
while ((thread = ThreadStore::GetThreadList(thread)) != NULL)
{
}
In reply to: 503476859 [](ancestors = 503476859)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for
syntax could be a bit more robust for loops like this, since it scopes the iteration variable (thread
here) to the loop body. It makes it a bit easier to track who changes it and who uses it. It is a mild preference though.
In reply to: 505819489 [](ancestors = 505819489,503476859)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made it a while loop #Resolved
// | ||
// also unblock any thread waiting around for this thread to suspend. This prevents us from completely | ||
// starving other suspension clients, such as the debugger, which we otherwise would do because of | ||
// the priority we just established. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are non-obvious interactions between thread suspend and debugger. It am not sure whether this is safe to delete.
Either way, it may be a good idea to run debugger tests on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a single place where we wait on this event and it has a very short timeout. It timeouts often and waiter should be able to handle timeouts. Signaling the event, on the other hand, is an indication that there was some progress (a thread has suspended) and could be more confusing, since here there was no progress made.
Signaling probably made more sense in the past when suspension ping value was much longer.
I completely agree with the "debugger test" part. Should run some stress tests too.
Yes, I will schedule an in-person meeting. Some volatile removes (like locals that are not shared with anybody), may be easy to review. |
src/coreclr/src/vm/threadsuspend.cpp
Outdated
// return the current thread here, because we NULL it out after obtaining the thread store lock. </REVISIT> | ||
// If we have just updated hijacks/redirects, then do a pass while only observing. | ||
// Repeat observing only as long as we see progress. Most threads react to hijack/redirect very fast and | ||
// typically we can avoid waiting on an event. (except on uniporocessor where we do not spin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read it like 10 times - where? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, unipOrocessor
:-) #Resolved
@@ -404,26 +389,18 @@ Thread::SuspendThreadResult Thread::SuspendThread(BOOL fOneTryOnly, DWORD *pdwSu | |||
} | |||
#endif // _DEBUG | |||
|
|||
if (doSwitchToThread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is simpler here primarily because fibers stuff was removed. For example without fibers doSwitchToThread
would be always TRUE
so it is no longer needed, and so on...
src/coreclr/src/vm/threadsuspend.cpp
Outdated
@@ -2054,6 +2004,7 @@ void ThreadSuspend::LockThreadStore(ThreadSuspend::SUSPEND_REASON reason) | |||
// runtime threads for a GC that depends on the fact that we | |||
// do an EnablePreemptiveGC and a DisablePreemptiveGC around | |||
// taking this lock. | |||
// besides, if the suspension is in progress, the lock is taken, we better be in premptive mode while blocked on this lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
premptive [](start = 89, length = 9)
premptive
typo #Resolved
…int to update hijacks again right away.
…and added comment explaining why.
Regarding trying this change with debugger tests - That was done and revealed a few regressions in stress scenarios, which were related to incomplete removal of After that was fixed, debugger tests do not indicate any regressions. |
// Must let the profiler know that this thread is aborting its attempt at suspending | ||
{ | ||
BEGIN_PIN_PROFILER(CORProfilerTrackSuspends()); | ||
g_profControlBlock.pProfInterface->RuntimeSuspendAborted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this new location of the suspend abort, it doesn't appear that it protects against a deadlock on the threadstore lock. Could you explain the reasoning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think suspension-yielding behavior was done to guarantee that GC suspension will preempt other suspension, because it did not do that.
The thread that requests GC suspension is always in preemptive mode and would be ignored by the thread doing suspension. GC suspender can signal about its intentions by setting a global flag, but that could easily be too late for the current suspending thread to notice.
I believe the purpose is a mitigation for situations where we have a lot of non-GC suspensions. I am not sure how common/real such scenarios are, but it could potentially starve GC so we have this suspension-yielding behavior.
The other part of this mechanism is an event (s_hAbortEvt
) that non-GC suspenders could be forced to wait on while GC suspension is trying to get TSL. That is also racey - when we reset the event we may already have non-GC suspenders waiting on the TSL and not affected by the event.
Basically - this mechanism clearly does not guarantee that any non-GC suspension will immediately yield to GC suspension. At best it can guarantee that if we have 100 non-GC suspenders, then an incoming GC suspension will gain higher priority and get ahead most of them.
For such purpose checking once before starting the suspension loop is just as good as checking at each iteration.
I.E. we may allow one non-GC suspension that is already half-way or almost done, but will deprioritize others and this should still serve the starvation-prevention purpose.
Doing the check at this point has a number of benefits though:
- suspending EE is a relatively heavy operation. It makes sense to not abort suspension that is nearly done. (and we can't abort one that is completely done anyways).
- it reduces the number of states we have to deal with when restarting EE after an aborted suspension - we would not have, for example threads that are already redirected oh hijacked and asynchronously moving towards suspension as we decide to resume EE, or suspend again later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removed comment from the previous implementation discusses a deadlock, where the GC thread attempts to get the threadstore lock, but a separate suspended thread already holds the lock. Effectively, I think I'm asking, can that deadlock occur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this code had a complicated history and some comments no longer match what code does.
I assume the question is about:
// When the debugger is synchronizing, trying to perform a GC could deadlock. The GC has the
// threadstore lock and synchronization cannot complete until the debugger can get the
// threadstore lock. However the GC can not complete until it sends the BeforeGarbageCollection
// event, and the event can not be sent until the debugger is synchronized. In order to break
// this deadlock cycle the GC must give up the threadstore lock, allow the debugger to synchronize,
// then try again.
I do not think the scenario described in the comment exists.
GC does not send BeforeGarbageCollection
event. At least I could not find where it does. Perhaps it did in the past.
Regardless, debugger should not be synchronizing when GC holds the lock.
I have added an assert, just in case, but I think it is not possible for a GC suspension to be happening while debugger is IsSynchronizing
.
Right now switching into IsSynchronizing
mode happens when debugger already owns the TSL. here:
runtime/src/coreclr/src/debug/ee/debugger.cpp
Line 5503 in 4486164
m_trappingRuntimeThreads = TRUE; |
It is done via an ordinary write but it is done while holding TSL so that is ok. All places that handle this flag appear to hold the TSL as well, except when it is reset - not sure if we hold the TSL at that point, but contracts do not require that.
There is only one case when GC could yield the suspension - when debugger has threads in inconsistent state - like stopped on a breakpoint. Then the thread may appear stopped on a safe point, but actually it is not.
This is the only case where everyone, including GC must retry suspending - since suspension did not really succeed.
We still handle this scenario.
The state is indicated by g_pDebugInterface->ThreadsAtUnsafePlaces()
, which is an interlock-updated counter. We read the value after EE is suspended and thus we read a stable value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, debugger should not be synchronizing when GC holds the lock.
If above is actually not true, we should check for that before doing suspension loop - same place where we check for GC vs. non-GC suspension collisions. At that point we hold TSL, so the value will not change, but we have not done much work yet.
I do not think it is a case that we need to handle though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine a context switch delaying the GC thread by XX time right before setting m_pThreadAttemptingSuspendForGC
.
Since GC thread is preemptive, the other thread could start, continue, finish the suspension, or even finish that what it was going to do and resume EE, if XX is long enough.
That is the case before or after the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the thread that sets m_pThreadAttemptingSuspendForGC is in cooperative mode not preemptive mode. When a GC is triggering a suspend, it is in cooperative mode (as its in the process of manipulating GC data structures), and the LockThreadStore routine does not transition to preemptive mode when attempting to suspend for a GC. (See the toggleGC boolean in ThreadSuspend::LockThreadStore.) Thus there is a potential for deadlock.
The old logic had this abort routine, which would kick in at any point in time during suspend, and if it ever triggered, it would stop attempting to suspend. Effectively, this make the whole threadstore a deadlock detecting lock, with a policy favoring the GC thread suspension logic. There was a race here, but it is protected by the abort process being inside the loop, and being able to abort suspension whenever the thread attempting suspend status became true. Effectively, the check could race, but only for one extra iteration of the suspension loop.
Now, the previous logic would simply abort the suspend immediately, and leave the various threads that it had managed in a state where they were suspended for some other purpose, and then ThreadSuspend::SuspendEE
will call RestartEE
and then use a goto to exit the threadlock, and then hopefully the the GC thread would come along and finish the suspend from its end. That logic all seems a bit sketchy to me, but it looks like it would have worked in the old approach. With your new code that moved the abort logic out of the loop, I don't see how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think LockThreadStore does not transition to preemptive for GC threads because they are already supposed to be preemptive. Otherwise we could block on TSL while in coop mode, which would deadlock.
Let me check. It may be worth adding asserts/contracts to be explicit about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I have added a contract to ThreadStore::Enter()
for exactly this purpose already, I just forgot.
Probably worth adding a comment to LockThreadStore on why we do not bother about GC threads.
Or better not specialcase them in the toggleGC check at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline, and I'm convinced that the gc thread is preemptive when suspending.
// event, and the event can not be sent until the debugger is synchronized. In order to break | ||
// this deadlock cycle the GC must give up the threadstore lock, allow the debugger to synchronize, | ||
// then try again. | ||
(g_pDebugInterface->ThreadsAtUnsafePlaces() || g_pDebugInterface->IsSynchronizing())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since GC does call BeforeGarbageCollection
I am less sure that IsSynchronizing
while suspending is an impossible state. I think I will just bring the IsSynchronizing
check back.
It may be worth examining closely if it is needed. However we would do this check rarely (once per suspension) and only if debugger is attached, so even if it is an impossible state, the check won't be a big deal.
Need to fix an ASSERT to account for the possibility of NULL.
Thanks!!! |
Hello @VSadov! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Original goal was to understand better suspension code and add comments to tricky points.
There were few places that could benefit from some refactoring. In fact there were already comments in few places recommending changes.
The major part here is merging 2 nearly identical suspend/hijack/redirect loops into one loop and changing it a bit to avoid unnecessary work. In particular, when suspension of a thread is already in progress we were too eager to retry without giving the thread enough chances to park itself.
There are some minor cleanups as well - like removing code related to fibers.
And added a few comments in places where I felt like the purpose or invariants of the code were not obvious.
The end effects on the suspension latencies is
The real-world cases will be some mix of the above, so it will vary, but should improve.