-
Notifications
You must be signed in to change notification settings - Fork 843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry mechanism does not work if proxy server returns 503 on gRPC call #6780
Comments
My understanding is that for a server to be grpc "compliant", it needs to always return gRPC status codes even in error situations. So a load balancer that returns 503 would need to be adjusted to return a This is part of the reason why the http/protobuf variant of OTLP is now preferred over gRPC. If you can't ensure that all the components in your stack always return valid gRPC responses, consider switching to OTLP http/protobuf, which works better with the standard types of responses expected from load balancers and other off the shelf components. |
It returns the grpc status, but the retry code ignores it |
Maybe it's possible to simplify and not check http code in grpc retrier? |
Seems reasonable. Though testing may be challenging. Care to submit a PR? Code in question is here.
I think its reasonable for us to interpret a valid |
Hello, Would it be alright if I submitted a PR for this issue ? I’d love to contribute, and I want to ensure it’s okay to proceed. Thank you very much! |
PRs are welcome @JiwonKKang. Thanks! |
Resolved in #6829. |
Describe the bug
The retry mechanism in
OkHttpGrpcSender
fails to retry when Envoy returns a503
HTTP status code for gRPC requests. The gRPC call fails without retries being triggered, which should be handled according to the provided retry policy. This issue arises because of the logic at this line of code, where retries are not attempted if the HTTP response is not successful.Steps to reproduce
To make the issue easier to reproduce, you can use WireMock to simulate the proxy server behavior:
503
status code for certain gRPC requests.503 Service Unavailable
for requests.OkHttpGrpcSender
for exporting spans.503
status code, and the request fails immediately.Or use envoy for more production like behaviour:
503 Service Unavailable
when the backend is unavailable or as part of error injection for testing.v1.41.0
) to send gRPC requests through the proxy server.503
HTTP status code.What did you expect to see?
When the proxy server (Envoy) returns a
503 Service Unavailable
, the retry mechanism should trigger based on the providedRetryPolicy
and automatically retry the gRPC call.What did you see instead?
No retries were attempted. The following log is observed:
Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message: no healthy upstream
.There are no logs indicating retries were made.
What version and what artifacts are you using?
Artifacts:
opentelemetry-sdk
,opentelemetry-exporter-sender-okhttp
Version:
1.41.0
Environment
OS:
Ubuntu Jammy (22.04 LTS)
Runtime:
openjdk 17
Additional context
The issue seems to originate from the following code in OkHttpGrpcSender:
This logic prevents retries from being attempted when the response code is anything other than 200. However, non-200 HTTP status codes like 503 should be treated as retryable. It may be beneficial to use
RetryUtil.retryableHttpResponseCodes
along withresponse.isSuccessful()
to determine whether a request should be retried based on transient HTTP error codes.The text was updated successfully, but these errors were encountered: