-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Overriding the default behavior in case of unhandled exceptions and fatal errors. #101560
Comments
Defines |
I wonder if having the messages etc. as 16 bit strings is the right way. Our coreclr_xxx hosting APIs use 8 bit (UTF-8) strings so that users on Unix don't have to convert them using some external library. 16 bit strings are very unusual on Unix and on Windows, it is trivial to do the conversion if needed thanks to the Windows APIs. |
We had to choose between The key observations were -
|
In case the native handler on Unix would want to do anything sensible with these strings, even just writing them to a text log file, I think it would most likely need to convert them. In the runtime, we can easily convert them into stack allocated buffers without disturbing the process state. I still think it would be better if whatever string enters or leaves the runtime from the native side was 8 bit. Similar to the hosting APIs, it would add a little overhead for the handler on Windows, but much less than the overhead that would otherwise be needed on Unix. |
The approach in question was to have a |
How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations? |
There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern. |
It can be ~100 - ~1000 lines depending on how optimized implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere. As a (usability) test for this API, you can write a sample handler that produces .json that captures the failure details (similar to .json that is produced by native AOT crash handler today). You will hit the UTF16 conversion problem pretty much immediately.
Calling managed implementation is not an option for the crash handlers. |
If a string comes from native assert (like from GC), running managed code might not be an option. |
Right, they will likely need to use ICU or some other library.
Excellent
Fair. This implies UTF-8 should be the preference then. |
That is not too bad. |
ICU is big, complicated library. It is not something you want to depend on in a crash handler. I think that the most appropriate solution would be to include source for the simplest possible conversion from somewhere. |
Is it possible to get content of I suggest to do it on demand because it will create issues for |
If conversion to UTF-8 is going to be needed in majority of scenarios (especially on Unix), I think we should do UTF-8 rather than expect that the end user will convert. I'll update the proposal to have |
On the other hand, a fatal error can happen only once in the life of an app, so perhaps performance of this API will not be as critical as in scenarios that can repeat. |
@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or doing other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception. I had similar concern in the original discussion. |
Runtime's implementation (which is pretty much standalone) could also be an option for users. e.g. this test code#include <stdio.h>
#include <errno.h>
#include <minipal/utf8.h>
size_t wstrlen(CHAR16_T* str)
{
size_t len = 0;
while (str[len++]);
return len;
}
void handleErrors()
{
if (errno == MINIPAL_ERROR_INSUFFICIENT_BUFFER)
{
printf ("Allocation failed (%d)", errno);
abort();
}
else if (errno == MINIPAL_ERROR_NO_UNICODE_TRANSLATION)
{
printf ("Illegal byte sequence encountered in the input. (%d)", errno);
abort();
}
}
int main(void)
{
CHAR16_T wstr[] = u"ハローワールド! 👋 🌏";
int wlen = wstrlen(wstr);
size_t mblen = minipal_get_length_utf16_to_utf8(wstr, wlen, 0);
handleErrors();
char* mbstr = (char *)malloc((mblen + 1) * sizeof(char));
size_t written = minipal_convert_utf16_to_utf8 (wstr, wlen, mbstr, mblen, 0);
handleErrors();
printf("Conversion completed. mblen: %zu, mbstr: %s\n", written, mbstr);
return 0;
} can be built with sources acquired from tag link: $ curl --create-dirs --output-dir external/minipal -sSLO \
"https://raw.githubusercontent.com/dotnet/runtime/v8.0.4/src/native/minipal/{utf8.c,utf8.h,utils.h}"
$ clang -Iexternal test.c external/minipal/utf8.c -o testutf8
$ ./testutf8
Conversion completed. mblen: 33, mbstr: ハローワールド! 👋 🌏 |
At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not. We could add a bool parameter to The benefits of that would be slightly faster handling of the crash and more robust behavior in case Cases like OOM or stack overflow are always tricky. Also Basically - we can add a knob to Is it really that often that |
As it was stated in #98788, ex.ToString will make COM calls slow if it was in the call stack and it is why I'm concerned. |
Perhaps I do not understand the scenario... When a managed exception is unhandled(*), the runtime will call
I understand the reliability concern about (*) Unhandled here means - "really unhandled". If there was |
I am not worried about performance. I am worried about providing the right information to the crash handler, without introducing unnecessary failure points between the failure point and the call of the crash handler. On one end, you can have crash handlers that just want to capture crashdump for offline processing and do nothing else. Computing the textual stacktrace is source of unnecessary failure points for those. On the other end, you can have crash handlers that want to capture many details like stacktrace with file and line numbers if possible. The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.) |
@VSadov
COM calls slows down after ex.ToString called if some module is on the call stack. Some code that walks stack converts module to read/write mode from read-only mode and as result search became linear instead of binary. |
I am ready to have a broader conversation. @jkotas? |
Same here. |
The naming here seems confusing to me, this is named very similarly to the AppDomain.UnhandledException event, and in particular it doesn't seem clear that this won't be called for all unhandled exceptions, only unhandled exceptions that can actually be ignored. It's not a big difference, but what about try { UserCode(); } catch (Exception ex) when handler(ex){}; in that IMHO it's also somewhat more obvious that this won't be called for all unhandled exceptions, only those that can actually be filtered/ignored. A few other ideas, I don't like any of them, but maybe somebody else are inspired to come up with better names:
Additionally, it would be nice to be able to invoke the callback. Example use case for .NET for iOS: we have an API called InvokeOnMainThread (() => { ShowMessageBox ("Hello World!"); }); which would currently, if public void InvokeOnMainThread (Action action)
{
NativeInvokeOnMainThread (() => {
// This is effectively called from a reverse P/Invoke
try {
action ():
} catch (Exception ex) when ExceptionHandling.FilterUnhandledException (ex) {
// ignored
}
});
} that would be nice. Lastly:
There's no way to clear/change/reset the handler once it's set? Is that intentional? |
Does this proposal address crashes in unmanaged code? The main motivation in #79706 is the ability for a user app (which is commonly a combination of managed code and native libraries) to install a third-party crash reporter behind the .NET runtime so that all fatal app terminations are captured and logged. This proposal seems to only deal with unhandled managed exceptions - or will |
Good point. I agree that the name variants like you have suggested can be used to make the behavior more clear. Another option is to call it for all unhandled exceptions and change the filter to take an extra
It sound reasonable to me.
Yes, it is intentional. These callbacks are meant to be used at the applications level. The application is expected to be have one and only unhandled exception and fatal error handling policy.
We clearly want this to intercept handle crashes in .NET code, but handling crashes in other code is less clear cut. It makes sense if the app wants to own the process, but it does not make sense if the .NET code is a library (e.g. think native AOT library) and the main app has its own crash handler. Are we going to limit this to handling crashes in .NET code or should we have a flag that says whether the handler should intercept all crashes - both in .NET code and other code? |
Why a dedicated delegate type and not |
namespace System.Runtime.ExceptionServices
{
public static class ExceptionHandling
{
/// <summary>
/// Sets a handler for unhandled exceptions.
/// </summary>
/// <exception cref="ArgumentNullException">If handler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static void SetUnhandledExceptionHandler(Func<Exception, bool> handler);
/// <summary>
/// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
/// fatal error handling (creating .NET runtime-specific crash dump, etc.).
/// </summary>
/// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
/// <exception cref="InvalidOperationException">If a handler is already set</exception>
public static unsafe void SetFatalErrorHandler(delegate* unmanaged<int, void*, int> handler);
}
} |
Will this also be implemented for .NET Android? On mobile platforms logging and debugging of crashes is very complex, especially because retrieving device logs means complicated instructions to a customer. |
Yes, it is the intention to implement these APIs on all platforms. (On some platforms, the fatal error handler may have limitations imposed by the platform.) |
You don't really need to deal with this manually though. Sentry on Android will capture crashes caused by unhandled exceptions in C# as well as from Java/Kotlin or native (NDK, C/C++/etc). The terminology used in this issue is fatal error. It attaches logs automatically, optionally attaches screenshots too. Supports Native AOT also outside of mobile. With line numbers on stack traces from C# as well as native code. some screenshots of Symbol CollectorI do some dogfooding of it on Symbol Collector: https://github.com/getsentry/symbol-collector/Disclaimer: I worked on building the Sentry SDK for .NET |
Re: The previous proposal and a discussion that led to this proposal - (#42275)
Background and motivation
The current default behavior in a case of unhandled exception is termination of a process.
The current default behavior in a case of a fatal error is print exception to console and invoke Watson/CrashDump.
While satisfactory to the majority of uses, the scheme is not flexible enough for some classes of scenarios.
Scenarios like Designers, REPLs or game scripting that host user provided code are not able to handle unhandled exceptions thrown by the user provided code. Unhandled exceptions on finalizer thread, threadpool threads or user created threads will take down the whole process. This is not desirable experience for these types of scenarios.
In addition, there are customers that have existing infrastructure for postmortem analysis of failures and inclusion of .NET components requires interfacing with or overriding the way the fatal errors are handled.
API Proposal
API for process-wide handling of unhandled exception
The semantics of unhandled exception handler follows the model of imaginary handler like the following inserted in places where the exception will not lead to process termination regardless of what
handler()
returns.In particular:
false
.(Whether the infrastructure thread continues or restarted is unspecified, but the process should be able to proceed)
main()
will not install the try/catch like aboveAPI Proposal for custom handling of fatal errors
Managed API to set up the handler.
The shape of the FatalErrorHandler, if implemented in c++
(the default calling convention for the given platform is used)
With
FatalErrorHandlerResult
andFatalErrorInfo
defined in "FatalErrorHandling.h" under src/native/public:API Usage
Setting up a handler for unhandled exceptions:
Setting up a handler for fatal errors:
Setting up the handler for the process (C# code in the actual app):
The handler. (c++ in
myCustomCrashHandler.dll
)Alternative Designs
Unmanaged hosting API that enables this behavior. (CoreCLR has undocumented and poorly tested configuration option for this today. #39587. This option is going to be replaced by this API.)
Extending
AppDomain.CurrentDomain.UnhandledException
API and makeIsTerminating
property writeable to allow "handling".Upon scanning the existing use of this API it was found that
IsTerminating
is often used as a fact - whether an exception is terminal or not. Changing the behavior to mean "configurable" will be a breaking change to those uses.Risks
This APIs can be abused to ignore unhandled exceptions or fatal errors in scenarios where it is not warranted.
The text was updated successfully, but these errors were encountered: