-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ICorProfiler] Add new API to enumerate GC Heap objects #103735
[ICorProfiler] Add new API to enumerate GC Heap objects #103735
Conversation
After making changes to corprof.idl and building the clr subset the generated corprof.h needs to be manually copied over from the artifacts obj directory into the prebuilt directory.
Tagging subscribers to this area: @tommcdon |
|
||
// Walk the GC Heap | ||
unsigned max_generation = hp->GetMaxGeneration(); | ||
hp->DiagWalkHeap((walk_fn)callback, callbackState, max_generation, TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wiktork does the full stop the world gen2 GC meet your performance criteria? I thought you were looking for something lighter weight, although I haven't been super involved in the design process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this compares to dump collection, which is basically what we are trying to be faster than. My previous discussions with Noah suggested that this would be significantly faster. I was also under the impression that we would be able to heap walk after a suspend, without the need to GC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this API we solely suspend EE + fix allocation contexts + walk the gc heap + repair allocation contexts + resume EE, all without performing a GC.
Add parameter check Add SuspendEE akin to SuspendRuntime Add DiagWalkHeap Add RestartEE akin to ResumeRuntime
d849af2
to
c40ecbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good so far, I think the main thing remaining is to add a test case for the new API.
5cc754a
to
69ddad7
Compare
Still having a bit of trouble reproducing the CI failures, tried to follow the build steps through the different legs via
and I think when I attempted to run the test command script directly And building manually/running the test command, I haven't been able to repro the error on CI, which seems to be unable to find the custom GCHeap object for all checked no_tiered_compilation test legs. Any other ideas to repro @noahfalk ? Where might the dumps be for the windows/linux runtime tests legs? |
src/tests/profiler/native/gcheapenumerationprofiler/gcheapenumerationprofiler.cpp
Outdated
Show resolved
Hide resolved
Thread enumerateThread = new Thread(() => | ||
{ | ||
startEvent.WaitOne(); | ||
Thread.Sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is testing what you intend it to. The GC.Collect() keeps managed code suspended which means this sleep isn't waking up until the GC.Collect() is finished. By the time EnumerateGCHeapObjects() runs there is no concurrent suspension happening.
To test concurrent suspends you will need to coordinate it from native code. For example you might do this:
GCHeapEnumerationProfiler::GarbageCollectionStarted()
{
_gcStartSleeping = TRUE;
// if this new call to EnumerateGCHeapObjects doesn't wait for the GC then it is going to
// observe _gcStartSleeping == TRUE
thread_list.emplace_back(std::thread([instance]()
{
instance->ValidateEnumerateGCHeapObjects(S_OK);
}));
std::this_thread::sleep_for(std::chrono::seconds(1));
_gcStartSleeping = FALSE;
}
static BOOL STDMETHODCALLTYPE heap_walk_fn(ObjectID object, void* callbackState)
{
CallbackState* pState = (CallbackState*)callbackState;
if(pState->instance._gcStartSleeping)
{
//Error, no callbacks expected while the GC collection was going on
}
...
}
HRESULT GCHeapEnumerationProfiler::Shutdown()
{
// make sure async operations have finished
for (auto& t : threadList) {
t.join();
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarification, this would be testing invoking ProfToEEInterfaceImpl::EnumerateGCHeapObjects
during a non-profiler requested runtime suspension? As in we are expecting it to hit ProfToEEInterfaceImpl::EnumerateGCHeapObjects
's else if
in
if (!ThreadSuspend::SysIsSuspendInProgress() && (ThreadSuspend::GetSuspensionThread() == 0))
{
g_profControlBlock.fProfilerRequestedRuntimeSuspend = TRUE;
ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON::SUSPEND_FOR_PROFILER);
ownEESuspension = TRUE;
}
else if (!g_profControlBlock.fProfilerRequestedRuntimeSuspend)
{
return CORPROF_E_SUSPENSION_IN_PROGRESS;
}
else
{
// The profiler requested a runtime suspension before invoking this API,
// so the responsibility of resuming the runtime is outside the scope of this API.
// Given that the profiler owns the suspension, walking the GC Heap is safe.
}
and throw the error CORPROF_E_SUSPENSION_IN_PROGRESS
Originally, I was intending the 3rd test
to be the "background native thread invoking EnumerateGCHeapObjects and rejoining a thread that has an ongoing non-profiler requested runtime suspension" test, and for this (4th test) to be multiple managed threads racing for ThreadSuspend::SuspendEE
, in case the above if block
if (!ThreadSuspend::SysIsSuspendInProgress() && (ThreadSuspend::GetSuspensionThread() == 0))
{
g_profControlBlock.fProfilerRequestedRuntimeSuspend = TRUE;
ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON::SUSPEND_FOR_PROFILER);
ownEESuspension = TRUE;
}
is entered before a different native thread invokes ThreadSuspend::SuspendEE
(which would have toggled the condition to be false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarification, this would be testing invoking ProfToEEInterfaceImpl::EnumerateGCHeapObjects during a non-profiler requested runtime suspension?
There are two different variations of this:
- Thread A suspends in the runtime code, Thread B calls EnumerateGCHeapObjects, Thread A resumes
- Thread A suspends in the runtime code, Thread A calls EnumerateGCHeapObjects, Thread A resumes.
The first case should work but thread B will block until Thread A resumes the runtime. The 2nd case would usually return CORPROF_E_UNSUPPORTED_CALL_SEQUENCE, but there are corner cases where it might make it past that first check and return CORPROF_E_SUSPENSION_IN_PROGRESS instead.
As in we are expecting it to hit ProfToEEInterfaceImpl::EnumerateGCHeapObjects's else if in
I proposed some alternate logic for how you would check for suspensions. The test case I suggested in my comment above does the EnumerateGCHeapObjects on a different thread than the one which suspended the runtime so I'd expect it to enumerate successfully after waiting for the GC to complete. It should run through this part of the code I recommended above:
else
{
_ASSERTE(!suspendedByThisThread);
// Its possible some background threads are suspending and resuming the runtime
// concurrently. We need to suspend the runtime on this thread to be certain the heap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I missed the above recommendations for checking thread suspension within ProfToEEInterfaceImpl::EnumerateGCHeapObjects
, after implementing a test as you suggested in the first comment in this thread, I attached a debugger and the background thread that invokes EnumerateGCHeapObjects does indeed get blocked on ThreadSuspend::SuspendEE
until GC completes
00 0000001b`0657f218 00007ffb`06d640e3 ntdll!ZwWaitForAlertByThreadId+0x14
01 0000001b`0657f220 00007ffb`06d51ce4 ntdll!TpWorkOnBehalfClearTicket+0x2c3
02 0000001b`0657f330 00007ffb`06d51ad2 ntdll!RtlEnterCriticalSection+0x254
03 0000001b`0657f3b0 00007ffa`5b26652e ntdll!RtlEnterCriticalSection+0x42
04 0000001b`0657f3e0 00007ffa`5b51372b coreclr!CrstBase::Enter+0x27e [C:\Users\mihw\source\repos\mdh1418\runtime\src\coreclr\vm\crst.cpp @ 325]
05 0000001b`0657f440 00007ffa`5b8abd8b coreclr!ThreadStore::Enter+0x1bb [C:\Users\mihw\source\repos\mdh1418\runtime\src\coreclr\vm\threads.cpp @ 5092]
06 0000001b`0657f560 00007ffa`5b8b18ad coreclr!ThreadSuspend::LockThreadStore+0x25b [C:\Users\mihw\source\repos\mdh1418\runtime\src\coreclr\vm\threadsuspend.cpp @ 1882]
07 0000001b`0657f690 00007ffa`5b808f13 coreclr!ThreadSuspend::SuspendEE+0x12d [C:\Users\mihw\source\repos\mdh1418\runtime\src\coreclr\vm\threadsuspend.cpp @ 5559]
08 0000001b`0657f770 00007ffa`672cd542 coreclr!ProfToEEInterfaceImpl::EnumerateGCHeapObjects+0x473 [C:\Users\mihw\source\repos\mdh1418\runtime\src\coreclr\vm\proftoeeinterfaceimpl.cpp @ 7707]
09 0000001b`0657f8f0 00007ffa`672cdda8 Profiler!GCHeapEnumerationProfiler::EnumerateGCHeapObjects+0xa2 [C:\Users\mihw\source\repos\mdh1418\runtime\src\tests\profiler\native\gcheapenumerationprofiler\gcheapenumerationprofiler.cpp @ 186]
0a 0000001b`0657f960 00007ffa`672cd335 Profiler!GCHeapEnumerationProfiler::ValidateEnumerateGCHeapObjects+0x18 [C:\Users\mihw\source\repos\mdh1418\runtime\src\tests\profiler\native\gcheapenumerationprofiler\gcheapenumerationprofiler.cpp @ 190]
0b 0000001b`0657f9a0 00007ffa`672ccab4 Profiler!<lambda_e2419102f21665b3332b03f113586c25>::operator()+0x25 [C:\Users\mihw\source\repos\mdh1418\runtime\src\tests\profiler\native\gcheapenumerationprofiler\gcheapenumerationprofiler.cpp @ 74]
0c 0000001b`0657f9d0 00007ffa`672cbe19 Profiler!std::invoke<<lambda_e2419102f21665b3332b03f113586c25> >+0x14 [C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\type_traits @ 1705]
0d 0000001b`0657fa00 00007ffa`673b63a0 Profiler!std::thread::_Invoke<std::tuple<<lambda_e2419102f21665b3332b03f113586c25> >,0>+0x59 [C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\thread @ 61]
0e 0000001b`0657fa60 00007ffb`055f257d Profiler!thread_start<unsigned int (__cdecl*)(void *),1>+0xb0 [minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 97]
0f 0000001b`0657fac0 00007ffb`06d8af28 KERNEL32!BaseThreadInitThunk+0x1d
10 0000001b`0657faf0 00000000`00000000 ntdll!RtlUserThreadStart+0x28
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the test case to simulate scenario 1, where the thread suspending runtime through GC spins off another native thread to invoke EnumerateGCHeapObjects.
Will add the test case to simulate scenario 2 if I can find the test attribute that bypasses contract violation assert and hits the expected errors.
SHUTDOWNGUARD(); | ||
printf("GCHeapEnumerationProfiler::GarbageCollectionStarted\nSleeping for 10 seconds\n"); | ||
std::this_thread::sleep_for(std::chrono::seconds(10)); | ||
return S_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets validate that if you try to call EnumerateGCHeapObjects from inside CollectionStart/CollectionFinished that it doesn't work. I believe these callbacks are both no-trigger callbacks so I'd expect the calls to fail with CORPROF_E_UNSUPPORTED_CALL_SEQUENCE if you've added the additional kP2EETriggers check above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling EnumerateGCHeapObjects from inside GarbageCollectionStarted throws a Contract violation assert from the CONTRACTL/CONTRACTL_END
scope before hitting the PROFILER_TO_CLR_ENTRYPOINT_ASYNC_EX
macro and additional !IsCalledAsynchronously() && !AreCallbackStateFlagsSet(COR_PRF_CALLBACKSTATE_IN_TRIGGERS_SCOPE)
check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I expect that test case would only work in a retail build. I thought we had some test attribute that did that but don't worry about including it if it isn't straigtforward.
src/tests/profiler/native/gcheapenumerationprofiler/gcheapenumerationprofiler.cpp
Show resolved
Hide resolved
src/tests/profiler/native/gcheapenumerationprofiler/gcheapenumerationprofiler.cpp
Outdated
Show resolved
Hide resolved
3ae9827
to
39716d3
Compare
src/tests/profiler/native/gcheapenumerationprofiler/gcheapenumerationprofiler.cpp
Show resolved
Hide resolved
src/tests/profiler/native/gcheapenumerationprofiler/gcheapenumerationprofiler.cpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor suggestions inline, overall looks good! I'll also respond back to some of the existing comments in just a moment.
The dotnet-linker-tests failure should be fixed by #104606 |
This PR aims to add a new ICorProfilerInfo API that extends the ability to enumerate all GC Heap objects (GC's
GCHeap::DiagWalkHeap
) to a profiler. Within a GC there are preparation steps that are performed before+after walking the GC Heap viaDiagWalkHeap
, so this PR introduces a newIGCHeap
API that facilitates aDiagWalkHeap
outside of a GC.Additionally, this PR modifies
ProfToEEInterfaceImpl::SuspendRuntime
+ProfToEEInterfaceImpl::ResumeRuntime
to both be Asynchronous APIs, invokable directly through the profiler rather than only through callbacks.Testing
Verification with !dumpheap
Invoking the new EnumerateGCHeapObjects with a walk function like
observed 9999 objects from a sample app.
Attaching windbg to that same sample app and calling
!dumpheap
at the same spot where EnumerateGCHeapObjects was invoked led to 10082 objects, 83 of which were free.Furthermore, the Class types counts from both the heap walk function and
!dumpheap
matched up.Profiler tests
Added unit tests for the following behaviors: