Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
error handling proposal #153
error handling proposal #153
Changes from 4 commits
1b6737a
ac14996
f90437e
d30f90a
b4eee5a
58179c6
03e937e
55bbad2
dcb740c
6c87edf
3812350
8c01355
669052e
4109c10
a10d8ea
ea993fd
872ebd4
de7cb15
a8451f1
e4ddb0c
eb4bc45
30bb855
6fed7dc
83f4477
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting point. I think I have a different opinion:
Happy to be convinced otherwise, but my understanding is that as long as we apply the same checks across the API and SDK we can throw exception for obvious things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one problem that you may run into is that arguments you are passing to telemetry API may be received from external source. So you have a
null
string. If you are smart and read the doc - you will check fornull
and pass some random name to the API. If you are smart and haven't read the doc - API will crash and potentially bring this request processing with you.Both outcomes can be worked around for by applying the same smart default as customer would need to anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but running a "blind" process with no monitor at all because we cannot export data to the metrics backend, is that better? I feel that fail fast approach should be used here and crashing during the initialization is very good in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @bogdandrutu on "API can throw exception for things like null span name because that is documented".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. At the same time, we need a balance for reporting errors for truly fatal things (i.e. " cannot export data to the metrics backend"), but otherwise throwing exceptions should stay at a minimum level.
Specific case: I do think using a null name for
Span
could fallback to using a default name, instead of throwing. Documenting it is nice, but providing defaults for these kind of things is a good alternative ;)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree re: failing fast at initialization/construction time, but not in ways that might crash the program (or thread or ____) at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I think it's good practice to validate the required argument when end users call our API and throw exceptions when it violates certain constraints. Consider other cases like name being too long, I think it's better to throw exceptions than silently truncate it or use defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better if application doesn't start due to some environment variable was misconfigured (for resource API for instance) or if application sent a slightly inconsistent telemetry?
Same for span name. Would you rather return
500
to customer or useempty
for thenull
string? (see also comment for @bogdandrutu 's questionThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had some conversations in Python SDK open-telemetry/opentelemetry-python#11 (comment).
One thing to consider: there is no simple way to determine if span name is too long since different exporters/backends might have different limitations. We don't want the users to see the application running fine for exporter A, and start to see unhandled exceptions when they switch to exporter B.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While very valuable, self-diagnostics are inherently complicated. This feels like something that needs a larger discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that user-facing instrumentation APIs should never return errors (or throw them).
One reason that users should never receive these, is that the'll be tempted to turn around and call the instrumentation library with the error.
The SDK will, however, encounter errors, and it a user has a right to know, but not with in-line code. In the opentracing Go library we addressed this by letting the application register a callback for receiving self diagnostics, including errors, that would allow the application to handle self diagnostics in a separate module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are doing exactly the same at Dynatrace with our agent SDK:
The SDK API never throws, since we do not want to change the application's behavior and risk crashing it if they don't catch everything properly, just because monitoring wouldn't work. In order to not let our SDK users in the dark and give them guidance for troubleshooting we also offer a diagnostic callback. It provides immediate logging output in case of errors (see https://github.com/Dynatrace/OneAgent-SDK#logging-callback). This turned out to work well and be pretty helpful in the past.