Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: OB-35469 add subscription request parent span as child of disco… #334

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

obs-gh-virjramakrishnan
Copy link
Collaborator

…very request

// Invoke adds OTel span surrounding customer Handler invocation.
func (h SQSCheckerWrappedHandler) Invoke(ctx context.Context, payload []byte) ([]byte, error) {

eventType := reflect.TypeOf(events.SQSEvent{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to jump through all these hoops just to parse the sqs event?

@obs-gh-virjramakrishnan obs-gh-virjramakrishnan force-pushed the viraj-adjust-parent branch 2 times, most recently from 9d575aa to d599707 Compare August 29, 2024 18:04
"github.com/go-logr/logr"
)

type SQSCheckerWrappedHandler struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called SQSChecker? what is it checking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It checks if the payload is a SQS event and if so, injects context, I suppose SQSContextInjector might be more appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not really injecting the context, its parsing it out right? Something like SQSWithContextHander

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I am misunderstanding how "Inject" is used in this context - what I meant was that we are using the context provided by the SQS event. But I like SQSWithContextHander

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean SQSWithContextHandler

Copy link
Contributor

@jta jta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell what the fix is exactly. Ideally you'd provide a git commit message that explains why things weren't working before, and why your change fixes it.

@@ -25,8 +25,6 @@ func (h *InstrumentedHandler) HandleSQS(ctx context.Context, request events.SQSE
for _, record := range request.Records {
var req Request
var err error
logger.V(3).Info("Getting context from message attributes")
ctx = NewSQSCarrier().Extract(ctx, record.MessageAttributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suspicious. If you remove this, what functionality is InstrumentedHandler providing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. This effectively just becomes a wrapper around HandleRequest that does nothing. There might be some more refactoring that is possible if I do remove this. The idea with this modification is to change the place where the context is being inserted from. I removed adding the context in here because I am adding it in in a different place, as a wrapper to the invocation. This allows the parent span of the subscription request to get the context from the SQS event (the context of the previous request). I'll add more description in the commit message.

@obs-gh-virjramakrishnan obs-gh-virjramakrishnan force-pushed the viraj-adjust-parent branch 3 times, most recently from 6b58251 to d450ae0 Compare August 31, 2024 00:36
@@ -90,10 +95,15 @@ func (m *Mux) Invoke(ctx context.Context, payload []byte) (response []byte, err
}
ctx = logr.NewContext(ctx, logger)

span := trace.SpanFromContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems more tracing related than handler/mux related? did this have to move here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so the reason I moved it here is that the Otel wrapper directly calls this - if we want that span to show up as in error, and have the payload show up in the we need to get the context after the Otel wrapper span has modified the context (otherwise there will be no span associated with ctx, and the otel will create a span and will would add the payload and error handling to this new blank span). I therefore factored the code into the mux handler.

…very request, and simplify invocaton call stack

In HandleSQS, the context is being injected from the SQS event, when the lambda is triggered by a SQS request. This means that the HandleRequest from the
Subscription request, which is properly part of another trace, is assuming the parent trace of the Discovery Request.

Hence, the Discover call “steals” the spans from the Subscriber, at least, all of the spans that are descendant from the HandleRequest call of the Subscribe request. That leaves only the parent span of Subscriber, which is created (and thus has context provided to it) before we extract the context from the SQS event. Thus, it shows up as its own trace.

Wrapping the OTEL handler wrapper (which is what makes the first span in the trace) with another handler that will add context from the SQS event if the payload is of that format means that
the first span should also get the context from the discovery request.
@obs-gh-virjramakrishnan
Copy link
Collaborator Author

Hi @jta, just to clarify what the bug was:

Before, the OTEL created traces were being created, and then, we would extract context from the SQS message if we received one, and replace the context. However, the effect of this is that all remaining spans in the Subscription request would be children of the HandleRequest from the discovery request, as opposed to the subscriber request. This meant that the subscriber spans were left childless. The "missing spans" were actually under the discovery request. To fix this, I made another wrapper that wraps around the OTEL wrapper. This means that we extract the context before we take start the otel spans - the context we pass in is the one from the discovery request, and so all of the subscription spans show up as children of the discovery handleRequest span.

@arthur-observe arthur-observe changed the title OB-35469: fix: add subscription request parent span as child of disco… fix: OB-35469 add subscription request parent span as child of disco… Sep 4, 2024
@@ -130,6 +130,6 @@ func New(ctx context.Context, cfg *Config) (*Lambda, error) {
return nil, fmt.Errorf("failed to register functions: %w", err)
}

l.Entrypoint = tracing.NewLambdaHandler(mux, tracerProvider)
l.Entrypoint = tracing.WrapHandlerSQSContext(tracing.NewLambdaHandler(mux, tracerProvider))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new wrapper that only applies in the subscriber case since in forwarder we're not injecting the context into the sqs messages/trying to read it back out.

@obs-gh-virjramakrishnan obs-gh-virjramakrishnan merged commit 875ff96 into main Sep 4, 2024
16 checks passed
@obs-gh-virjramakrishnan obs-gh-virjramakrishnan deleted the viraj-adjust-parent branch September 4, 2024 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants