Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Failfast windows event log #16875

Merged
merged 12 commits into from
Mar 27, 2018
Merged

Conversation

sywhang
Copy link

@sywhang sywhang commented Mar 10, 2018

Address issue #14037 and #16735.

Tested on Windows 10 / Windows Server 2012. Attached is a screenshot of an example crash log in Event Viewer:
eventlog

@sywhang
Copy link
Author

sywhang commented Mar 10, 2018

cc @jkotas @danmosemsft

@jkotas
Copy link
Member

jkotas commented Mar 10, 2018

What does this look like for unhandled exceptions?

I suspect that DoReportForUnhandledException is going to need some cleanup to make the unhandled exception reporting work.

@jkotas jkotas requested a review from noahfalk March 10, 2018 01:35
@sywhang
Copy link
Author

sywhang commented Mar 10, 2018

Unhandled exception doesn't show anything other than something like this in the Event Log:

Application: CoreRun.exe
CoreCLR Version: 4.6.26222.0
Description: The process was terminated due to an unhandled exception.

I can look into DoReportForUnhandledException and see how we can make it work for unhandled exceptions.

@danmoseley
Copy link
Member

It would be interesting to check that ERT_UnmanagedFailFast and ERT_StackOverflow are also working.

@danmoseley
Copy link
Member

Someday maybe we should do something simlar on Unix, maybe write something to /var/log/.net

@noahfalk
Copy link
Member

It would be interesting to check that ERT_UnmanagedFailFast and ERT_StackOverflow are also working.

@danmosemsft - By working I assume you mean that they haven't been regressed by this change?

@danmoseley
Copy link
Member

@noahfalk yes, although presumably we would like them to log also...in a follow up PR presumably

@davidfowl
Copy link
Member

I can look into DoReportForUnhandledException and see how we can make it work for unhandled exceptions.

Yes please!

/cc @jkotalik

@noahfalk
Copy link
Member

@danmosemsft - I'd hesitate to make any changes to the StackOverflow and UnmanagedFailFast paths at this point in the release cycle. (The stuff we've already done makes me nervous, but I felt up to this point the risk/reward was worth it so I didn't raise an issue... I hope it winds up being the right choice)

As a frame of reference, despite all the years we've had to fix issues and refine the desktop CLR error handlers we've 3 live site issues escalated to us in the last ~9 months because our error handler wound up hanging rather than exiting the process. Two of the cases were traced to a leaked lock being used in a localization helper for an error message formatting routine. The other one we deadlocked because the process had leaked one of the OS heap allocation locks. Although the error handlers are trying to be helpful, if we aren't really careful its easy to make things worse instead of better.

@danmoseley
Copy link
Member

@noahfalk fai enough. Let's just get DoReportForUnhandledException working at least.

@davidfowl
Copy link
Member

We should look at fixing the version number to flow from the host. That way people aren’t asking what .net Core 4.6 is.

@danmoseley danmoseley added this to the 2.1.0 milestone Mar 13, 2018
@danmoseley
Copy link
Member

The version number is being read out of coreclr.dll, which is getting stamped with 4.6.

C:\git\21COREFX>\t\filever -v coreclr.dll | \t\grep "4.6"
--a-- W32i     - ENU     4.6.26130.5 shp  4,318,352 02-13-2018 coreclr.dll
        ProductVersion  4.6.26130.05 @BuiltBy: dlab-DDVSOWINAGE032 @SrcCode: https://github.com/dotnet/coreclr/tree/2f14b350960af0354608c041d7df501c44dc5640
        FileVersion     4.6.26130.05 @BuiltBy: dlab-DDVSOWINAGE032 @SrcCode: https://github.com/dotnet/coreclr/tree/2f14b350960af0354608c041d7df501c44dc5640

@eerhardt do you know if it's intentional that we stamp it 4.6? It looks like this started in 1.1.

Commit hash would be nice acutally.

@sywhang
Copy link
Author

sywhang commented Mar 19, 2018

Here is a screenshot of an unhandled exception log:
unhandled_exception

@@ -721,53 +721,28 @@ void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo)
// We can also be here when in non-DefaultDomain an exception goes unhandled on a pure managed thread. In such a case,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, presumably this comment about non-default-domain no longer applies, since there is only one domain now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch! I'll update the comments accordingly

@eerhardt
Copy link
Member

@eerhardt do you know if it's intentional that we stamp it 4.6?

No, I don't. Maybe @jkotas or @janvorli ?


_ASSERTE(m_eventType == ERT_ManagedFailFast);

m_Description.Append(W("Exception stack:\n"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be trying to load the resource (.LoadResource() and falling back to hard coded if that fails, as the rest of hte file does?

@jkotas
Copy link
Member

jkotas commented Mar 19, 2018

The version number is being read out of coreclr.dll, which is getting stamped with 4.6.

It is because of this: dotnet/corefx#10306 . I agree that it is super confusing.

Note that we use this confusing 4.5/4.6 version stamps in many places, not just for file versions. E.g. some nuget package versions have them too: https://github.com/dotnet/coreclr/blob/master/dependencies.props#L34

@danmoseley
Copy link
Member

We shouldn't hold this PR up on that versioning I think.

@jkotas
Copy link
Member

jkotas commented Mar 19, 2018

Here is a screenshot of an unhandled exception log:

Why does the screen shot show two exceptions?

@sywhang
Copy link
Author

sywhang commented Mar 19, 2018

@jkotas
The latter is an inner exception:

class Program
{
    static void Main()
    {
        A();
    }

    static void A()
    {
        B();
    }

    static void B()
    {
        try
        {
            C();
        }
        catch (Exception ex)
        {
            throw new ArgumentException("my ae", ex);
        }
    }

    static void C()
    {
        D();
    }

    static void D()
    {
        throw new NullReferenceException("my nre");
    }
}

@jkotas
Copy link
Member

jkotas commented Mar 19, 2018

I think it should print the outer exception first, and there should be some indication that the inner exception is inner exception.

E.g. Change your main method to:

try {
    A();
} catch (Exception e) {
    Console.WriteLine(e);
}

and check what it prints. I am not asking to match the output exactly, but it would be nice if it is somewhat similar.

@sywhang
Copy link
Author

sywhang commented Mar 20, 2018

@jkotas, Does this look better?

unhandled_exception_2

This one is basically what e.ToString() would print.

@jkotas
Copy link
Member

jkotas commented Mar 20, 2018

Yes, this looks good. Thanks.

@davidfowl
Copy link
Member

Is there any way we can flow the version information from core-setup? Maybe AppDomain.GetData? Seeing CoreCLR 4.6 is hella confusing

/cc @eerhardt

@@ -695,33 +726,24 @@ void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo)
// look that up and if it is not empty, add those details to the EventReporter so that they get written to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole comment starting at In CoreCLR, managed code execution happens in non-default AppDomains should be deleted.

BinderMethodID sigID = METHOD__EXCEPTION__INTERNAL_TO_STRING;

// Get the MethodDesc on which we'll call.
MethodDescCallSite toString(sigID, &(gc.throwable));
Copy link
Member

@noahfalk noahfalk Mar 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may cause trouble. If you look at InternalUnhandledExceptionFilter_Worker, after it calls this method it then calls DefaultCatchHandler. DefaultCatchHandler also invokes Exception.ToString() in some cases, but first it does a bunch of checks to identify cases where it isn't safe to make the managed call.
https://github.com/dotnet/coreclr/blob/master/src/vm/excep.cpp#L5674

Some of those issues appear irrelevant for CoreCLR such as multiple AppDomains or ThreadAbort, but others such as OutOfMemory, StackOverflow, and CanCallManagedCode() are potentially all relevant. We should also investigate what error reporting tests exist which hopefully probe some of these cases.

{
StackSString result;
// Assume we're calling Exception.InternalToString() ...
BinderMethodID sigID = METHOD__EXCEPTION__INTERNAL_TO_STRING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is trying to compute the same string as DefaultCatchHandler. Can we move the logging into the DefaultCatchHandler and eliminate this duplication?

src/vm/excep.cpp Outdated
EventReporter reporter(EventReporter::ERT_UnhandledException);
if (!message.IsEmpty())
{
reporter.AddDescription(message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think it is conceptually a bit cleaner to have the error reporting functionality within DefaultCatchHandler, there were several other cases and fallback paths in the old event logging code that are lost now. There is also an issue that the code InternalUnhandledExceptionFilter_Worker called DoReportForUnhandledException in a broader set of cases than it calls DefaultCatchHandler.

@sywhang
Copy link
Author

sywhang commented Mar 27, 2018

@noahfalk Could you please take another look? Thanks! :) If I'm not mistaken the Tizen CI is still flakey..? Correct me if I am wrong.

src/vm/excep.cpp Outdated
goto lDone;
}

LOG((LF_EH, LL_INFO100, "InternalUnhandledExceptionFilter_Worker: Calling DefaultCatchHandler\n"));


// Call our default catch handler to do the managed unhandled exception work.
DefaultCatchHandler(pParam->pExceptionInfo, NULL, useLastThrownObject,
TRUE /*isTerminating*/, FALSE /*isThreadBaseFIlter*/, FALSE /*sendAppDomainEvents*/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultCatchHandler() now has sendWindowsEventLog=FALSE by default so it would need to be overridden here. I assume this isn't working as-is.

@@ -5532,6 +5534,26 @@ DefaultCatchHandlerExceptionMessageWorker(Thread* pThread,
}

PrintToStdErrA("\n");

#if defined(FEATURE_EVENT_TRACE) && !defined(FEATURE_PAL)
// Send the log to Windows Event Log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for ShouldLogInEventLog() in the original is missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code gc protected throwable and then checked:

 if (IsException(gc.throwable->GetMethodTable()))

The code to handle the case where that wasn't true is missing. You can hit that case by throwing an object that doesn't derive from System.Exception. Its uncommon and I think not authorable in C#, but CLI does allow it and the code was handling it before so lets preserve it.

//
// Arguments:
// pExceptionInfo - Exception information
//
// Return Value:
// None
//
void DoReportForUnhandledException(PEXCEPTION_POINTERS pExceptionInfo)
void DoReportForIgnoredUnhandledException(PEXCEPTION_POINTERS pExceptionInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: How about we name this DoReportForUnhandledNativeException()?

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the last of it : ) Just make sure the error handling test cases are passing.

@sywhang
Copy link
Author

sywhang commented Mar 27, 2018

@noahfalk Should I add the tests for these in this PR, or do that in a separate PR?

@noahfalk
Copy link
Member

Either way should be fine. If they are ready to go I'd add them here, if you need a little more time to work on them lets get this in and put the tests in a follow up PR.

@sywhang
Copy link
Author

sywhang commented Mar 27, 2018

I'll just merge this in first and follow up with the tests in another PR. Thanks!

@sywhang sywhang merged commit 550f127 into dotnet:master Mar 27, 2018
@danmoseley
Copy link
Member

@sywhang it would be great to try to get those tests in for ZBB today if practical.

@sywhang sywhang deleted the failfast-windows-event-log branch March 8, 2019 02:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants