Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fanoutconsumer] Mark fanout consumer error as permanent #7868

Conversation

jpkrohling
Copy link
Member

Based on discussions from the SIG's call, I changed one of the fanout consumers to mark the returned error as permanent if at least one consumer succeeded and one failed, as we don't want retries to be performed on those situations.

Once this is done, #7486 will be changed to return a 500 in case the error it receives is a permanent error.

Signed-off-by: Juraci Paixão Kröhling juraci@kroehling.de

@jpkrohling jpkrohling changed the title Add changelog [fanoutconsumer] Mark fanout consumer error as permanent Jun 9, 2023
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling jpkrohling force-pushed the jpkrohling/wrap-consumer-in-permanent branch from 204b2a2 to d4bbaec Compare June 9, 2023 20:00
@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (89d1060) 91.14% compared to head (d4bbaec) 91.15%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7868   +/-   ##
=======================================
  Coverage   91.14%   91.15%           
=======================================
  Files         298      298           
  Lines       14915    14928   +13     
=======================================
+ Hits        13594    13607   +13     
  Misses       1045     1045           
  Partials      276      276           
Impacted Files Coverage Δ
service/internal/fanoutconsumer/traces.go 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Jun 12, 2023

I'm not convinced that the fan-out consumer is the right place to arbitrarily decide whether a payload should be retried or not. The existing approach of the fan-out consumer collecting all errors from consumers and passing them back to the sender seems enough for the sender to be able to decide by itself whether to retry or not, perhaps based on the errors returned. For example this existing OTLP receiver code:

	otlpResp, err := tracesReceiver.Export(req.Context(), otlpReq)
	if err != nil {
		for 
		writeError(resp, encoder, err, http.StatusInternalServerError)
		return
	}

could be changed to something similar to this:

	otlpResp, err := tracesReceiver.Export(req.Context(), otlpReq)
	if err != nil {
		isPermanent := false
		for _, individualConsumerError := range multierr.Errors(err) {
			if consumererror.IsPermanent(individualConsumerError) {
				isPermanent = true
				break
			}
		}
		if isPermanent {
			writeError(resp, encoder, err, http.StatusInternalServerError)
		} else {
			writeError(resp, encoder, err, http.StatusServiceUnavailable)
		}
		return
	}

We could make it into a helper function to make it easier for receivers (and other components) to use.

What do you think Juraci?

@bogdandrutu
Copy link
Member

@astencel-sumo You are making an assumption in your usage that we return a multierr.Errors. Is that guaranteed anywhere?

@jpkrohling
Copy link
Member Author

I think I agree with @astencel-sumo, except that we don't need to go over all errors in multierr: consumererror.IsPermanent will use IsA, which will unwrap all errors in the chain. So, if anything in the multierr is a permanent error, the function will return true.

I just confirmed this by reverting the traces.go from this PR to the latest state from main and changing the test to:

	p1 := mutatingErr{Consumer: consumertest.NewErr(myErr)}
	p2 := consumertest.NewErr(consumererror.NewPermanent(myErr))
	p3 := new(consumertest.TracesSink)

One permanent error from at least one exporter is enough to have IsPermanent to return true. If you all agree with @astencel-sumo's approach, I think I'll change this PR to just include the new test, and go back to the otlp receiver doing something similar to @astencel-sumo's proposal.

@bogdandrutu
Copy link
Member

I think the problem was not finding if one pipeline permanently errored, but what to do in the cases like one pipeline succeed the other has a retriable error? Do we return to the client a retriable error and they can retry? That will cause duplicate data on one path, what do you do then?

@andrzej-stencel
Copy link
Member

@astencel-sumo You are making an assumption in your usage that we return a multierr.Errors. Is that guaranteed anywhere?

@bogdandrutu I actually don't make this assumption. The multierr.Errors accepts a regular error type as parameter and works on regular errors equally well as on multierr errors - it returns the error itself when passed in something else than a multiErr) - source.

Anyway, I agree with @jpkrohling that calling multierr.Errors and iterating over the returned list is not needed if the sender only wants to check if there's a permanent error among those returned from ConsumeTraces/Metrics/Logs.

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Jun 13, 2023

I think the problem was not finding if one pipeline permanently errored, but what to do in the cases like one pipeline succeed the other has a retriable error? Do we return to the client a retriable error and they can retry? That will cause duplicate data on one path, what do you do then?

I agree it's important to discuss all possible scenarios. Let's discuss in the original PR about propagating errors if that's OK? Propagate errors from exporters to receivers #7486

Here are my thoughts on this: #7486 (comment)

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 28, 2023
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants