-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
cc @jkotas @danmosemsft |
What does this look like for unhandled exceptions? I suspect that DoReportForUnhandledException is going to need some cleanup to make the unhandled exception reporting work. |
Unhandled exception doesn't show anything other than something like this in the Event Log:
I can look into DoReportForUnhandledException and see how we can make it work for unhandled exceptions. |
It would be interesting to check that ERT_UnmanagedFailFast and ERT_StackOverflow are also working. |
Someday maybe we should do something simlar on Unix, maybe write something to |
@danmosemsft - By working I assume you mean that they haven't been regressed by this change? |
@noahfalk yes, although presumably we would like them to log also...in a follow up PR presumably |
Yes please! /cc @jkotalik |
@danmosemsft - I'd hesitate to make any changes to the StackOverflow and UnmanagedFailFast paths at this point in the release cycle. (The stuff we've already done makes me nervous, but I felt up to this point the risk/reward was worth it so I didn't raise an issue... I hope it winds up being the right choice) As a frame of reference, despite all the years we've had to fix issues and refine the desktop CLR error handlers we've 3 live site issues escalated to us in the last ~9 months because our error handler wound up hanging rather than exiting the process. Two of the cases were traced to a leaked lock being used in a localization helper for an error message formatting routine. The other one we deadlocked because the process had leaked one of the OS heap allocation locks. Although the error handlers are trying to be helpful, if we aren't really careful its easy to make things worse instead of better. |
@noahfalk fai enough. Let's just get DoReportForUnhandledException working at least. |
We should look at fixing the version number to flow from the host. That way people aren’t asking what .net Core 4.6 is. |
The version number is being read out of coreclr.dll, which is getting stamped with 4.6.
@eerhardt do you know if it's intentional that we stamp it 4.6? It looks like this started in 1.1. Commit hash would be nice acutally. |
src/vm/eventreporter.cpp
Outdated
@@ -721,53 +721,28 @@ void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo) | |||
// We can also be here when in non-DefaultDomain an exception goes unhandled on a pure managed thread. In such a case, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, presumably this comment about non-default-domain no longer applies, since there is only one domain now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch! I'll update the comments accordingly
src/vm/eventreporter.cpp
Outdated
|
||
_ASSERTE(m_eventType == ERT_ManagedFailFast); | ||
|
||
m_Description.Append(W("Exception stack:\n")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be trying to load the resource (.LoadResource(
) and falling back to hard coded if that fails, as the rest of hte file does?
It is because of this: dotnet/corefx#10306 . I agree that it is super confusing. Note that we use this confusing 4.5/4.6 version stamps in many places, not just for file versions. E.g. some nuget package versions have them too: https://github.com/dotnet/coreclr/blob/master/dependencies.props#L34 |
We shouldn't hold this PR up on that versioning I think. |
Why does the screen shot show two exceptions? |
@jkotas
|
I think it should print the outer exception first, and there should be some indication that the inner exception is inner exception. E.g. Change your main method to:
and check what it prints. I am not asking to match the output exactly, but it would be nice if it is somewhat similar. |
@jkotas, Does this look better? This one is basically what |
Yes, this looks good. Thanks. |
Is there any way we can flow the version information from core-setup? Maybe AppDomain.GetData? Seeing CoreCLR 4.6 is hella confusing /cc @eerhardt |
src/vm/eventreporter.cpp
Outdated
@@ -695,33 +726,24 @@ void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo) | |||
// look that up and if it is not empty, add those details to the EventReporter so that they get written to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole comment starting at In CoreCLR, managed code execution happens in non-default AppDomains
should be deleted.
src/vm/eventreporter.cpp
Outdated
BinderMethodID sigID = METHOD__EXCEPTION__INTERNAL_TO_STRING; | ||
|
||
// Get the MethodDesc on which we'll call. | ||
MethodDescCallSite toString(sigID, &(gc.throwable)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may cause trouble. If you look at InternalUnhandledExceptionFilter_Worker, after it calls this method it then calls DefaultCatchHandler. DefaultCatchHandler also invokes Exception.ToString() in some cases, but first it does a bunch of checks to identify cases where it isn't safe to make the managed call.
https://github.com/dotnet/coreclr/blob/master/src/vm/excep.cpp#L5674
Some of those issues appear irrelevant for CoreCLR such as multiple AppDomains or ThreadAbort, but others such as OutOfMemory, StackOverflow, and CanCallManagedCode() are potentially all relevant. We should also investigate what error reporting tests exist which hopefully probe some of these cases.
…irst before calling managed code
src/vm/eventreporter.cpp
Outdated
{ | ||
StackSString result; | ||
// Assume we're calling Exception.InternalToString() ... | ||
BinderMethodID sigID = METHOD__EXCEPTION__INTERNAL_TO_STRING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trying to compute the same string as DefaultCatchHandler
. Can we move the logging into the DefaultCatchHandler
and eliminate this duplication?
…orUnhandledException
src/vm/excep.cpp
Outdated
EventReporter reporter(EventReporter::ERT_UnhandledException); | ||
if (!message.IsEmpty()) | ||
{ | ||
reporter.AddDescription(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I think it is conceptually a bit cleaner to have the error reporting functionality within DefaultCatchHandler, there were several other cases and fallback paths in the old event logging code that are lost now. There is also an issue that the code InternalUnhandledExceptionFilter_Worker called DoReportForUnhandledException in a broader set of cases than it calls DefaultCatchHandler.
@noahfalk Could you please take another look? Thanks! :) If I'm not mistaken the Tizen CI is still flakey..? Correct me if I am wrong. |
src/vm/excep.cpp
Outdated
goto lDone; | ||
} | ||
|
||
LOG((LF_EH, LL_INFO100, "InternalUnhandledExceptionFilter_Worker: Calling DefaultCatchHandler\n")); | ||
|
||
|
||
// Call our default catch handler to do the managed unhandled exception work. | ||
DefaultCatchHandler(pParam->pExceptionInfo, NULL, useLastThrownObject, | ||
TRUE /*isTerminating*/, FALSE /*isThreadBaseFIlter*/, FALSE /*sendAppDomainEvents*/); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultCatchHandler() now has sendWindowsEventLog=FALSE by default so it would need to be overridden here. I assume this isn't working as-is.
@@ -5532,6 +5534,26 @@ DefaultCatchHandlerExceptionMessageWorker(Thread* pThread, | |||
} | |||
|
|||
PrintToStdErrA("\n"); | |||
|
|||
#if defined(FEATURE_EVENT_TRACE) && !defined(FEATURE_PAL) | |||
// Send the log to Windows Event Log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check for ShouldLogInEventLog() in the original is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code gc protected throwable and then checked:
if (IsException(gc.throwable->GetMethodTable()))
The code to handle the case where that wasn't true is missing. You can hit that case by throwing an object that doesn't derive from System.Exception. Its uncommon and I think not authorable in C#, but CLI does allow it and the code was handling it before so lets preserve it.
src/vm/eventreporter.cpp
Outdated
// | ||
// Arguments: | ||
// pExceptionInfo - Exception information | ||
// | ||
// Return Value: | ||
// None | ||
// | ||
void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo) | ||
void DoReportForIgnoredUnhandledException(PEXCEPTION_POINTERS pExceptionInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: How about we name this DoReportForUnhandledNativeException()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the last of it : ) Just make sure the error handling test cases are passing.
@noahfalk Should I add the tests for these in this PR, or do that in a separate PR? |
Either way should be fine. If they are ready to go I'd add them here, if you need a little more time to work on them lets get this in and put the tests in a follow up PR. |
I'll just merge this in first and follow up with the tests in another PR. Thanks! |
@sywhang it would be great to try to get those tests in for ZBB today if practical. |
Address issue #14037 and #16735.
Tested on Windows 10 / Windows Server 2012. Attached is a screenshot of an example crash log in Event Viewer: