Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK Crash Detection: Enable Native #66418

Closed
10 tasks done
Tracked by #59262 ...
kahest opened this issue Mar 6, 2024 · 10 comments
Closed
10 tasks done
Tracked by #59262 ...

SDK Crash Detection: Enable Native #66418

kahest opened this issue Mar 6, 2024 · 10 comments
Assignees
Labels
Scope: Backend Automatically applied to PRs that change backend components

Comments

@kahest
Copy link
Member

kahest commented Mar 6, 2024

Enable the SDK crash detection for the sentry-native SDK.

Tasks

Preview Give feedback
  1. Scope: Backend
    philipphofmann
  2. Scope: Backend
    philipphofmann
  3. Scope: Backend
    philipphofmann
  4. Scope: Backend
    philipphofmann
  5. Scope: Backend
    philipphofmann

Rollout Plan

Preview Give feedback
@philipphofmann
Copy link
Member

@supervacuus, I need your input, please. If you don’t know what SDK crash detection is, read this.

I plan to enable the SDK crash detection for the native SDK. To do this, I need to create a configuration similar to what we already have for the Cocoa SDK.

sdk_names=[
"sentry.cocoa",
"sentry.cocoa.capacitor",
"sentry.cocoa.react-native",
"sentry.cocoa.dotnet",
"sentry.cocoa.flutter",
"sentry.cocoa.kmp",
"sentry.cocoa.unity",
"sentry.cocoa.unreal",
],
# Since changing the debug image type to macho (https://github.com/getsentry/sentry-cocoa/pull/2701)
# released in sentry-cocoa 8.2.0 (https://github.com/getsentry/sentry-cocoa/blob/main/CHANGELOG.md#820),
# the frames contain the full paths required for detecting system frames in is_system_library_frame.
# Therefore, we require at least sentry-cocoa 8.2.0.
min_sdk_version="8.2.0",
system_library_path_patterns={r"/System/Library/*", r"/usr/lib/*"},
sdk_frame_config=SDKFrameConfig(
function_patterns={
r"*sentrycrash*",
r"*\[Sentry*",
r"*(Sentry*)*", # Objective-C class extension categories
r"SentryMX*", # MetricKit Swift classes
},
path_patterns={"Sentry**"},
path_replacer=FixedPathReplacer(path="Sentry.framework"),
),
# [SentrySDK crash] is a testing function causing a crash.
# Therefore, we don't want to mark it a as a SDK crash.
sdk_crash_ignore_functions_matchers={"**SentrySDK crash**"},

This is how the native configuration would roughly look:

native_config = SDKCrashDetectionConfig(
		// ...
    sdk_names=[
        "sentry.native"
    ],
    min_sdk_version="", 
    system_library_path_patterns={
	    r"C:\WINDOWS\SYSTEM32\*",
    },
    sdk_frame_config=SDKFrameConfig(
        function_patterns={
            r"sentry_*", 
        },
        # The native SDK usually has the same path as the application binary.
        # Therefore, we can't rely on it. We set a fixed path of Sentry for 
        # The SDK frames are so it's not empty.
        path_patterns={},
        path_replacer=FixedPathReplacer(path="sentry"),
    ),
    sdk_crash_ignore_functions_matchers={},
)

@supervacuus, please answer the following questions:

  1. Does the native SDK have more sdk names? If yes, which ones?
  2. Which system_library_path_patterns should I add? This is for keeping system library or system binary frames in the stacktrace for SDK crashes. I guess other paths on Linux/macOS could be /bin/*, /sbin/**, /usr/bin/**, /usr/sbin/**. Are there any other paths we should add?
  3. With the current function_patterns, the code marks every frame starting with sentry_ as an SDK frame. Do we need to add more function_patterns to identify frames stemming from the native SDK?
  4. Regarding sdk_crash_ignore_functions_matchers: For Cocoa we have a test function causing a crash we want to ignore. Does the native SDK have something similar?

@supervacuus
Copy link
Contributor

  1. Does the native SDK have more sdk names? If yes, which ones?

These are the names I am currently aware of:

sentry.native
sentry.native.android
sentry.native.android.capacitor
sentry.native.android.flutter
sentry.native.android.react-native
sentry.native.android.unity
sentry.native.android.unreal
sentry.native.dotnet
sentry.native.unity
sentry.native.unreal
  1. With the current function_patterns, the code marks every frame starting with sentry_ as an SDK frame. Do we need to add more function_patterns to identify frames stemming from the native SDK?

I think this should be enough to get started. The pattern covers sentry_ for the public interface and sentry__ for module-level interfaces.

  1. Regarding sdk_crash_ignore_functions_matchers: For Cocoa we have a test function causing a crash we want to ignore. Does the native SDK have something similar?

Planned, but not released. We can add this when we release it.

  1. Which system_library_path_patterns should I add? This is for keeping system library or system binary frames in the stacktrace for SDK crashes. I guess other paths on Linux/macOS could be /bin/*, /sbin/**, /usr/bin/**, /usr/sbin/**. Are there any other paths we should add?

This is the toughest question because I am unsure how this is independent of the existing grouping configurations and its concrete effect inside the detection. I guess you define an SDK crash as finding a frame that matches function_patterns (excluding sdk_crash_ignore_functions_matchers), and then there are no in-app frames up to the crash frame.

I would collect all native package/path paths in the grouping enhancement configs that remove app and add system (but there might be other categories that could make sense; for instance, the driver category hints at a problem in Windows where some paths are localized):

# known well locations for unix paths
family:native package:/lib/** -app
family:native package:/usr/lib/** -app
family:native path:/usr/local/lib/** -app
family:native path:/usr/local/Cellar/** -app
family:native package:linux-gate.so* -app

# Support frameworks that are not in-app
family:native package:**/Frameworks/libswift*.dylib -app
family:native package:**/Frameworks/KSCrash.framework/** -app
family:native package:**/Frameworks/SentrySwift.framework/** -app
family:native package:**/Frameworks/Sentry.framework/** -app

family:native package:"/System/Library/Frameworks/**" category=system
family:native package:"C:/Windows/**" category=system
family:native package:/usr/lib/** category=system

family:native package:/system/** category=system
family:native package:/vendor/** category=system

family:native package:**/libart.so category=system
package:/apex/com.android.*/lib*/** category=system

But as you can see in the file, the config contains many more pattern categories than just paths, which all have a considerable impact if applied (or not). So, maybe I do not understand the question. Wouldn't the SDK crash detection run after applying the above configuration?

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 4, 2024
@supervacuus
Copy link
Contributor

Maybe @Swatinem can cross-check my answers? Thx!

@Swatinem
Copy link
Member

Swatinem commented Apr 4, 2024

The answers are spot-on.
For SDK names, we made those runtime-changeable at some point, so it can essentially be anything 🤷🏻‍♂️

And I agree that classifying the paths might be the most difficult thing here. And I’m also curious about how this new classification is different from whatever grouping is doing? Or if it could potentially use the same infrastructure underneath? (given that I recently rewrote the grouping enhancers code in Rust for a very nice perf bost)

@supervacuus
Copy link
Contributor

For SDK names, we made those runtime-changeable at some point, so it can essentially be anything 🤷🏻‍♂️

That is technically correct 😸; our users could even modify them. However, all downstream SDKs essentially set static strings, which are only assigned during runtime because the build sometimes can't know where the library is used later on. So, I think it is fair to use a predefined list.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 4, 2024
@philipphofmann
Copy link
Member

Thanks, @supervacuus and @Swatinem for your replies 😃.

For SDK names, we made those runtime-changeable at some point, so it can essentially be anything 🤷🏻‍♂️

If, in most cases, the SDK name is sentry.native, it's okay. The goal is to detect SDK crashes. We don't have to detect all of them. A high percentage is enough. I will stick to the above-mentioned list for now and see how it goes.

And I’m also curious about how this new classification is different from whatever grouping is doing? Or if it could potentially use the same infrastructure underneath? (given that I recently rewrote the grouping enhancers code in Rust for a very nice perf bost)

I didn't choose that road because I wanted to iterate fast on the feature and they don't necessarily match 100%. I can evaluate reusing the same infrastructure later, but I wouldn't do it now.

@philipphofmann
Copy link
Member

philipphofmann commented Apr 5, 2024

I overlooked your question, @supervacuus.

But as you can see in the file, the config contains many more pattern categories than just paths, which all have a considerable impact if applied (or not). So, maybe I do not understand the question. Wouldn't the SDK crash detection run after applying the above configuration?

The paths in the config actually map to package, module, path, abs_path, filename. I need to find a better name for this in the config.

I don't understand that part. Please rephrase it.

Wouldn't the SDK crash detection run after applying the above configuration?

@supervacuus
Copy link
Contributor

The paths in the config actually map to package, module, path, abs_path, filename. I need to find a better name for this in the config.

👍

I don't understand that part. Please rephrase it.

Wouldn't the SDK crash detection run after applying the above configuration?

You already answered this question with the following:

I didn't choose that road because I wanted to iterate fast on the feature and they don't necessarily match 100%. I can evaluate reusing the same infrastructure later, but I wouldn't do it now.

I have very little knowledge of what happens to events after leaving relay, so my naive idea was that your processing occurs after the grouping enhancer already has applied the rules I referenced above. I understand that these convenience rules may not match the crash detection goals, but for the native cases, I think that at least the -app/+system ones are a great start.

It also sounds like a topic that needs fine-tuning over time. We should allow false positives rather than false negatives and then tune towards reducing the false positives if they start to feel like noise.

This reminds me that I have two issues that are apparent crashes in the Native SDK, which we currently can't reproduce. They seem related, and I would love to know whether this affects many users. I heard that you might be happy with positive examples for testing:

getsentry/sentry-native#974
getsentry/sentry-native#859

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 5, 2024
@philipphofmann
Copy link
Member

I have very little knowledge of what happens to events after leaving relay, so my naive idea was that your processing occurs after the grouping enhancer already has applied the rules I referenced above. I understand that these convenience rules may not match the crash detection goals, but for the native cases, I think that at least the -app/+system ones are a great start.

I recall one more reason why I didn't want to use the grouping rules: PII. All crash events end up in a Sentry project, and we have to be careful not to collect stacktraces and any other data that could contain PII. Still, we might be able to reuse the grouping config, but not for now.

@philipphofmann
Copy link
Member

Full rollout completed. We can close this.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Mobile & Cross Platform SDK Apr 22, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
Archived in project
Archived in project
Development

No branches or pull requests

4 participants