Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avalonia calling into the runtime after shut down #90079

Open
filipnavara opened this issue Aug 6, 2023 · 17 comments
Open

Avalonia calling into the runtime after shut down #90079

filipnavara opened this issue Aug 6, 2023 · 17 comments
Assignees
Labels
area-Diagnostics-coreclr in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@filipnavara
Copy link
Member

Looks like a bad interaction of diagnostsic bits -- we are trying to notify the debugger of an exception and of a class oad (triggered by a jit request for PGO dat).

Stack traces:
Sample of Avalonia Application.txt

Repro on macOS:

Happens with .NET 6 and .NET 7, didn't test it on newer runtime.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 6, 2023
@ghost
Copy link

ghost commented Aug 6, 2023

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Looks like a bad interaction of diagnostsic bits -- we are trying to notify the debugger of an exception and of a class oad (triggered by a jit request for PGO dat).

Stack traces:
Sample of Avalonia Application.txt

Repro on macOS:

Happens with .NET 6 and .NET 7, didn't test it on newer runtime.

Author: filipnavara
Assignees: -
Labels:

area-Diagnostics-coreclr, untriaged

Milestone: -

@filipnavara
Copy link
Member Author

cc @AndyAyersMS

@AndyAyersMS
Copy link
Member

cc @AndyAyersMS

Thanks for opening an issue.

@tommcdon
Copy link
Member

tommcdon commented Aug 6, 2023

@filipnavara Thanks for reporting the issue. Does this reproduce on ARM64 (M1/M2) or X64 macOS (or both)?

@filipnavara
Copy link
Member Author

Tested and reproduced on M1. The underlying mechanism of the deadlock doesn't seem platform specific though.

@tommcdon
Copy link
Member

tommcdon commented Aug 7, 2023

@filipnavara thanks!

@tommcdon tommcdon added this to the 9.0.0 milestone Aug 7, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 7, 2023
@mikelle-rogers
Copy link
Member

I tried reproducing it with .NET 6 on Windows and wasn't able to, will continue once I have access to M1.

@mikelle-rogers
Copy link
Member

I also tried reproducing it with .NET 7 on Mac x64, but wasn't able to because there was a NullReferenceException in the Avalonia code.

@mikelle-rogers
Copy link
Member

mikelle-rogers commented Jul 5, 2024

Here is the stack of the avalonia app (was able to reproduce on M1)

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000100d8fbe8 libcoreclr.dylib`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1760
    frame #1: 0x0000000100e74c04 libcoreclr.dylib`Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode) + 1040
    frame #2: 0x0000000100e6ffa4 libcoreclr.dylib`Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 68
    frame #3: 0x0000000101157320 libcoreclr.dylib`WaitForEndOfShutdown() + 72
    frame #4: 0x0000000100dd7c90 libcoreclr.dylib`CrstBase::ReleaseAndBlockForShutdownIfNotSpecialThread() + 120
    frame #5: 0x0000000101070ec8 libcoreclr.dylib`Debugger::LockForEventSending(Holder<Debugger*, &(Debugger::AcquireDebuggerLock(Debugger*)), &(Debugger::ReleaseDebuggerLock(Debugger*)), 0ul, &(int CompareDefault<Debugger*>(Debugger*, Debugger*)), true>*) + 48
    frame #6: 0x000000010106881c libcoreclr.dylib`DebuggerController::DispatchPatchOrSingleStep(Thread*, _CONTEXT*, unsigned char const*, SCAN_TRIGGER) + 512
    frame #7: 0x0000000101069b28 libcoreclr.dylib`DebuggerController::DispatchNativeException(_EXCEPTION_RECORD*, _CONTEXT*, unsigned int, Thread*) + 596
    frame #8: 0x000000010107621c libcoreclr.dylib`Debugger::FirstChanceNativeException(_EXCEPTION_RECORD*, _CONTEXT*, unsigned int, Thread*, int) + 124
    frame #9: 0x0000000100fa4044 libcoreclr.dylib`HandleHardwareException(PAL_SEHException*) + 396
    frame #10: 0x0000000100d65860 libcoreclr.dylib`SEHProcessException(PAL_SEHException*) + 348
    frame #11: 0x0000000100d98888 libcoreclr.dylib`PAL_DispatchException + 136
    frame #12: 0x0000000100d98518 libcoreclr.dylib`PAL_DispatchExceptionWrapper + 16
    frame #13: 0x0000000104cbea68
    frame #14: 0x0000000102cf6240 libAvaloniaNative.dylib`ComPtr<IAvnDispatcher>::~ComPtr() + 36
    frame #15: 0x000000019f9482e8 libsystem_c.dylib`__cxa_finalize_ranges + 476
    frame #16: 0x000000019f948070 libsystem_c.dylib`exit + 44
    frame #17: 0x000000019faa38d0 libdyld.dylib`dyld4::LibSystemHelpers::exit(int) const + 20
    frame #18: 0x000000019f70212c dyld`start + 2436

@mikelle-rogers
Copy link
Member

mikelle-rogers commented Jul 18, 2024

After further investigation, we found that the Runtime was shut down and Avalonia is calling into the Runtime after it is shut down. This is unsupported behavior, so we have opened an issue in Avalonia here. We will close this issue.

We know the Runtime is shut down because

g_fEEShutDown = 0x00000027 = ShutDown_Start|ShutDown_Finalize1|ShutDown_Finalize2|ShutDown_SyncBlock`. 

The call stack of the app when it is hanging is: 
        Child SP               IP Call Site
000000016BD73D00 0000000195CA334C libsystem_platform.dylib!_platform_memmove + 428
000000016BD73D00 0000000106AC1BD8 libcoreclr.dylib!Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode, void*) + 924
000000016BD73F60 0000000106AC11BC libcoreclr.dylib!Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::$_1::operator()(Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::__EEParam*) const + 92
000000016BD73F90 0000000106AB98D0 libcoreclr.dylib!Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 352
000000016BD740A0 0000000106CD3F80 libcoreclr.dylib!CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 328
000000016BD740F0 0000000106CD3E20 libcoreclr.dylib!CLREventBase::Wait(unsigned int, int, PendingSync*) + 84
000000016BD74130 00000001070D1468 libcoreclr.dylib!WaitForEndOfShutdown() + 132
000000016BD74150 0000000106947A94 libcoreclr.dylib!CrstBase::ReleaseAndBlockForShutdownIfNotSpecialThread() + 68
000000016BD74190 0000000106F672C8 libcoreclr.dylib!Debugger::DoNotCallDirectlyPrivateLock() + 400
000000016BD741E0 0000000106F54D34 libcoreclr.dylib!Debugger::AcquireDebuggerLock(Debugger*) + 24
000000016BD74200 0000000106F65E48 libcoreclr.dylib!FunctionBase<Debugger*, &(Debugger::AcquireDebuggerLock(Debugger*)), &(Debugger::ReleaseDebuggerLock(Debugger*))>::DoAcquire() + 28
000000016BD74220 0000000106F65DE8 libcoreclr.dylib!BaseHolder<Debugger*, FunctionBase<Debugger*, &(Debugger::AcquireDebuggerLock(Debugger*)), &(Debugger::ReleaseDebuggerLock(Debugger*))>, 0ul, &(int CompareDefault<Debugger*>(Debugger*, Debugger*))>::Acquire() + 100
000000016BD74240 0000000106F6C4F8 libcoreclr.dylib!Debugger::LockForEventSending(Holder<Debugger*, &(Debugger::AcquireDebuggerLock(Debugger*)), &(Debugger::ReleaseDebuggerLock(Debugger*)), 0ul, &(int CompareDefault<Debugger*>(Debugger*, Debugger*)), true>*) + 100
000000016BD74270 0000000106F54708 libcoreclr.dylib!DebuggerController::DispatchPatchOrSingleStep(Thread*, _CONTEXT*, unsigned char const*, SCAN_TRIGGER) + 1204
000000016BD74460 0000000106F57A1C libcoreclr.dylib!DebuggerController::DispatchNativeException(_EXCEPTION_RECORD*, _CONTEXT*, unsigned int, Thread*) + 1380
000000016BD74510 0000000106F781D0 libcoreclr.dylib!Debugger::FirstChanceNativeException(_EXCEPTION_RECORD*, _CONTEXT*, unsigned int, Thread*, int) + 168
000000016BD74590 0000000106D2D98C libcoreclr.dylib!HandleHardwareException(PAL_SEHException*) + 1940
000000016BD75EB0 0000000106809C58 libcoreclr.dylib!SEHProcessException(PAL_SEHException*) + 256
000000016BD75EE0 000000010689C744 libcoreclr.dylib!PAL_DispatchExceptionInner(_CONTEXT*, _EXCEPTION_RECORD*) + 308
000000016BD75F90 000000010689C5DC libcoreclr.dylib!PAL_DispatchException + 52
000000016BD765D0 000000010689BB30 libcoreclr.dylib!PAL_DispatchExceptionWrapper + 16
000000016BD76FB0 0000000301A34B90 
FFFFFFFFFFFFFFFF 0000000107616240 libAvaloniaNative.dylib!ComPtr<IAvnDispatcher>::~ComPtr() + 36
000000016BD77000 0000000195B302E8 libsystem_c.dylib!__cxa_finalize_ranges + 476
000000016BD77060 0000000195B30070 libsystem_c.dylib!exit + 44
000000016BD77080 0000000195C8B8D0 libdyld.dylib!dyld4::LibSystemHelpers::getenv(char const*) const
000000016BD77090 00000001958EA12C dyld!start + 2436

The frame without a symbol is MicroCom.Runtime.MicroComVtblBase.Release(MicroCom.Runtime.Ccw*).

@mikelle-rogers mikelle-rogers changed the title Deadlock between debugger and class loading in PGO Avalonia calling into the runtime after shut down Jul 18, 2024
@jkotas
Copy link
Member

jkotas commented Jul 18, 2024

Avalonia is calling into the Runtime after it is shut down. This is unsupported behavior.

That's not correct. This is expected to work and not hang. We have fixed number of similar bugs. For example #100293 was fixing a very similar hang/crash where some networking code was calling into the runtime after the shutdown.

Runtime shutdown is not really a thing. We are really only notifying the diagnostic stack about the shutdown. However, the runtime is expected to work until the OS physically kills the process and the diagnostic stack is expected to stay out of the way once it was notified about shutdown.

It is fine for the debugger experience to be degraded after the diagnostic stack was notified about the shutdown. However, it is not expected to hang.

@jkotas jkotas reopened this Jul 18, 2024
@jkotas
Copy link
Member

jkotas commented Jul 18, 2024

we have opened an issue in Avalonia AvaloniaUI/Avalonia#16368

I agree that Avalonia may want to avoid calling into the runtime during shutdown. Minimizing work done during shutdown is general goodness. But it is not changing the fact that we have a bug in the runtime shutdown logic.

@mikelle-rogers
Copy link
Member

This issue no longer repros when using Avalonia 11.0.11.

@filipnavara
Copy link
Member Author

This issue no longer repros when using Avalonia 11.0.11.

Well, yeah, because Avalonia fixed the issue that triggered it... which still doesn't fix the debugger/runtime behavior even if the Avalonia users are unblocked.

@tommcdon
Copy link
Member

@jkotas Since Avalonia has already fixed the issue, shall we close this issue?

@filipnavara
Copy link
Member Author

@jkotas Since Avalonia has already fixed the issue, shall we close this issue?

I am fine with deprioritizing any work on this, but it's NOT fixed. The deadlock can still be triggered with the latest runtime. It just happened that Avalonia 11.0.0 was a high profile library that triggered it, and that particular version continues to trigger it today. The same problem can be triggered by other code, and I have seen it happening. This was just convenient way to reproduce it.

@jkotas
Copy link
Member

jkotas commented Jul 19, 2024

I agree with @filipnavara. This bug is not fixed.

We were lucky that the Avalonia repro was reliable enough to get to the bottom of it. The most likely manifestation of this bug is intermittent hang at the end of debugging session. My guess is that it is hit intermittently by quite a few users out there in number of workloads that we just do not know about.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jul 22, 2024
@tommcdon tommcdon removed this from the 9.0.0 milestone Jul 23, 2024
@tommcdon tommcdon added this to the 10.0.0 milestone Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Diagnostics-coreclr in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants