GRPC will keep retrying (until RPC timeout) if a user uses a bad service account key #6808
Comments
@dzou Based on |
I see, then forget about |
That's debatable. Without more information I can see a "bad request" succeeding after retrying (some kind of a race condition or simply |
Ah, so I am interpreting the 400 error code as an error made on the caller's side - see this for what I had in mind: https://stackoverflow.com/questions/47680711/which-http-errors-should-never-trigger-an-automatic-retry I was thinking basically that a 400 code would mean the user's request is incorrect (i.e. in this case making a request with a service account key that does not exist), and if we know the user's request is invalid then we need not retry it. |
Consider gRPC's retry feature as part of an advanced framework wrapping an existing protocol like HTTP (HTTP2 to be more precise). Some users would expect the framework to retry even in cases of |
I agree that logic of |
@jiangtaoli2016, how do you think this should be fixed? It probably impacts all languages. I think there are two things to discuss here
A hacky "fix" for (2) is easy, but I don't think it puts us on solid ground and would just bite us in the future as an implementation change in google-auth-library-java could change the status code. And we'd need to figure out an equivalent solution in each language. Spending our time on (1) seems better, and while it does side-step the issue it also provides other benefits. |
The token change from JWT server account creds to OAuth token is via @ejona86 Could you point to the grpc core change that you are referring to? I think it makes sense to have all languages behave consistently. |
@jiangtaoli2016, no, I can't point to the grpc core change. It was a long time ago and just mentioned to me "in person." I can't even point to the current behavior in c core. |
Sorry for the delay. I took some time to research the cloud auth APIs and the related gRPC code. When using a service account key, one can either obtain an OAuth2 access token from the Google token service (https://accounts.google.com/o/oauth2/token) and use that access token for Cloud API access, or use the service account to create a JWT token and access Cloud APIs directly. The first method is ServiceAccountCredentials in Java or GoogleRefreshTokenCredentials in C++. The second method is ServiceAccountJwtAccessCredentials in Java or ServiceAccountJWTAccessCredentials. Unlike Java, gRPC core and the wrapped languages do not auto-promote ServiceAccountCredentials into ServiceAccountJwtAccessCredentials. Both methods are valid and supported as of today. I don't think we could or should deprecate ServiceAccountCredentials.
In gRPC C++ GoogleRefreshTokenCredentials, AccessTokenCredentials, and StsCredentials, an access token is fetched remotely and attached to the call metadata. In other words, it is quite a common scenario that we have to deal with. @ejona86: it is better to deal with this issue than to dodge it as suggested in (1) above.
In the gRPC core implementation, an access token is fetched remotely with grpc_httpcli_post. Looking at the grpc httpcli implementation, it does not seem to retry. Basically, the token is fetched once. gRPC core does not parse the response from the HTTP request. Here is how it responds to http response. Any status code that is not 200 is treated as an error. Unless the fetch succeeds, an IO error, auth error, or any other error from the token fetch results in the same error message "Error occurred when fetching oauth2 token."
Back to @ejona86's suggestion (2): I agree it is hacky for gRPC code to dig into the HttpResponseException from google-auth-library-java.
My suggestion would be to do something similar to C core: if fetching the oauth2 token fails (for whatever reason), throw a generic error message such as "failed to fetch oauth2 token" and do not retry. Let the application deal with it. |
@jiangtaoli2016, grpc-java is not retrying. It is delivering an UNAVAILABLE error to the client. But the client (properly) has retry logic that retries UNAVAILABLE errors. @jiangtaoli2016, how do you provide scopes for the oauth exchange in the C++ API? I thought scopes are required to obtain an oauth token. |
@ejona86 I checked all the C core code. The only place that uses scopes is StsTokenFetcherCredentials. |
@jiangtaoli2016, given that gRPC is not retrying and instead the application is retrying because gRPC returned UNAVAILABLE, are you suggesting gRPC continue returning UNAVAILABLE and just accept that the application will retry? The application will not have a good way of determining the HTTP failure was 400 Bad Request. |
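To make the retry interaction concrete, here is a minimal sketch of the kind of client-side loop being described: gRPC makes a single attempt and reports a status code, and the caller keeps retrying for as long as that code is UNAVAILABLE and the deadline has not passed. The class RetryUntilDeadline, its Code enum, and the simulated clock are all hypothetical and are not taken from grpc-java or the client libraries; real clients typically use exponential backoff rather than a fixed delay.

```java
import java.util.function.Supplier;

final class RetryUntilDeadline {
  /** Hypothetical subset of status codes, for illustration only. */
  enum Code { OK, UNAVAILABLE, UNAUTHENTICATED }

  /**
   * Calls {@code call} until it returns a non-retryable code or the
   * simulated deadline is reached, and returns the number of attempts.
   * Time is simulated so the sketch runs instantly.
   */
  static int attempts(Supplier<Code> call, long deadlineMillis, long backoffMillis) {
    long elapsedMillis = 0;
    int attempts = 0;
    while (true) {
      attempts++;
      Code code = call.get();
      // Only UNAVAILABLE is treated as retryable in this sketch.
      if (code != Code.UNAVAILABLE) {
        return attempts;
      }
      // Pretend we waited for the backoff before the next attempt.
      elapsedMillis += backoffMillis;
      if (elapsedMillis >= deadlineMillis) {
        return attempts;
      }
    }
  }
}
```

With a 10-minute deadline and a one-minute backoff, a call that always reports UNAVAILABLE is attempted ten times before the caller gives up, whereas a call that reports UNAUTHENTICATED stops after the first attempt.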
gRPC returning UNAVAILABLE is a fine choice for this particular use case. Downloading a service account key is a bad choice, posing security risks to customers and to Google. Customers should try alternative approaches, such as Application Default Credentials (ADC) or service account impersonation. |
Hello, so to clarify, the position here is that this is working as intended? I can accept this outcome because I don't feel strongly, but I would only say there are still many legitimate use cases for downloading service account keys, such as using them in CI for integration tests against GCP services. Based on user reports, this issue still comes up fairly regularly (though it is not very common). |
@jiangtaoli2016 @ejona86 My understanding is that the client libraries (GAPIC-based) retry automatically based on whether the code that gRPC returns is "retryable". Regarding the use of service account keys, agreed, it's not the best choice from a security point of view when other options like ADC or impersonation are available. However, there are certain APIs like Cloud Translation that do not work with ADC and explicitly recommend downloading and using service account keys. |
@jiangtaoli2016, what was the verdict here? Is there something to be changed, or is this working as intended and appropriate? |
I would treat this as working as intended. @meltsufin Have you tried using Secure Token Service? That is a better alternative than a service account key. |
@jiangtaoli2016 I haven't used Secure Token Service, and I haven't seen it documented on product pages of any of the libraries. The issue is really not for any particular use case I might have, but for the users of our client libraries. The use of service account keys is recommended all over our documentation. Is this changing? |
@jiangtaoli2016, STS appears to be beta. |
I just marked #8003 as a duplicate, but it did nicely paint out how the components fit together. It also referenced googleapis/gax-java#965 |
A design is in progress in google-auth-library-java to give gRPC the information it needs to choose the Status code appropriately (retriable vs non-retriable). The grpc-java changes are expected to be small. I'm very glad there's been recent movement here. |
Is there any information about progress? |
googleapis/google-auth-library-java#750 provides a method to distinguish between the two cases. gRPC will need to be updated to consume the new Retryable interface:
|
Retryable was added in google-auth-library 1.5.3 to make clear the situations that deserve a retry of the RPC. Bump to that version and swap away from the imprecise IOException heuristic. go/auth-correct-retry Fixes grpc#6808
Retryable was added in google-auth-library 1.5.3 to make clear the situations that deserve a retry of the RPC. Upgrading to that caused problems because of transitive dependency issues syncing into Google so it was reverted in 369f87b. google-auth-library 1.11.0 changed the approach to avoid the transitive dependency updates. cl/601545581 upgraded to 1.22.0 inside Google. Bump to that version and swap away from the imprecise IOException heuristic. go/auth-correct-retry Fixes grpc#6808
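As a rough illustration of what swapping away from the IOException heuristic could look like, the sketch below maps a credential failure to a status code both ways. It is not the actual grpc-java change: the Retryable interface here is a local stand-in that assumes the one in google-auth-library exposes an isRetryable() method, and Code, TokenFetchException, and the two mapping methods are hypothetical names. The non-IOException branch of the old heuristic is also an assumption.

```java
import java.io.IOException;

final class CredentialFailureStatus {
  /** Hypothetical subset of status codes, for illustration only. */
  enum Code { UNAVAILABLE, UNAUTHENTICATED }

  /** Local stand-in (assumption) for the Retryable interface in google-auth-library. */
  interface Retryable {
    boolean isRetryable();
  }

  /** Hypothetical exception type that carries an explicit retry hint. */
  static final class TokenFetchException extends IOException implements Retryable {
    private final boolean retryable;

    TokenFetchException(String message, boolean retryable) {
      super(message);
      this.retryable = retryable;
    }

    @Override
    public boolean isRetryable() {
      return retryable;
    }
  }

  /** Old heuristic: every IOException is assumed to be transient. */
  static Code fromIoHeuristic(Throwable failure) {
    return failure instanceof IOException ? Code.UNAVAILABLE : Code.UNAUTHENTICATED;
  }

  /** Retryable-based mapping: retry only when the failure says it is retryable. */
  static Code fromRetryable(Throwable failure) {
    if (failure instanceof Retryable && ((Retryable) failure).isRetryable()) {
      return Code.UNAVAILABLE;
    }
    return Code.UNAUTHENTICATED;
  }
}
```

Under the heuristic, a token fetch rejected because of a bad key still surfaces as UNAVAILABLE, since it arrives as an IOException. With the explicit hint, the same failure maps to UNAUTHENTICATED, while a failure flagged as retryable keeps mapping to UNAVAILABLE.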
@ejona86, the commit 372a535 from #10909 on master is tagged 1.63 and this issue is closed in the 1.63 milestone. But looking at the 1.63 tarball or the released jar, the code changes from that commit aren't there. I see now that this commit was partially reverted on the 1.63.x branch: #11056 Will this issue be reopened? |
This is still on master, so sorta could be considered fixed, but I think it is still unsettled on whether we can make the change now. |
It seems the fix went out in 1.64 |
What version of gRPC-Java are you using?
1.27.2
What is your environment?
Linux, JDK8
What do you see?
When a user uses an invalid service account key (like one that was deleted), it is treated as an UNAVAILABLE error. Unavailable errors are interpreted as ones that should be retried by client libraries; consequently the application will attempt to retry this operation with invalid credentials until the RPC timeout is reached (usually 10 minutes).
What do you expect to see instead?
When a user uses an invalid service account key to authenticate with GRPC, it should yield an UNAUTHENTICATED error, which indicates the operation should not be retried.
Steps to reproduce the bug
This is most easily reproduced through experimenting with the client libraries.
Additional details:
Suggested fix:
The code that controls this behavior is in GoogleAuthLibraryCallCredentials. I don't think all IOExceptions should be retried though.
In this case, one can see that the exception has a .getCause() which is HttpResponseException and it has .getStatusCode() == 400, which indicates a bad request. This is the error thrown if the user provides an invalid service account key.
Would it be possible to modify it so that if it is an IOException, it will examine the getCause() of the exception and throw UNAUTHENTICATED if the cause is HttpResponseException with status code 400?
Example of exception that you see:
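Returning to the suggested fix: a minimal sketch of the cause inspection it proposes might look like the following. This is not the code in GoogleAuthLibraryCallCredentials. The nested HttpResponseException is a local stand-in for the real exception type so that the example compiles on its own, and CauseInspectingMapper, Code, and toCode are hypothetical names. Mapping non-IOException failures to UNAUTHENTICATED is an assumption.

```java
import java.io.IOException;

final class CauseInspectingMapper {
  /** Hypothetical subset of status codes, for illustration only. */
  enum Code { UNAVAILABLE, UNAUTHENTICATED }

  /** Local stand-in for the HTTP client's HttpResponseException. */
  static final class HttpResponseException extends IOException {
    private final int statusCode;

    HttpResponseException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }

    int getStatusCode() {
      return statusCode;
    }
  }

  /** Maps a credential failure to a status code, as the suggested fix describes. */
  static Code toCode(Throwable failure) {
    if (failure instanceof IOException) {
      Throwable cause = failure.getCause();
      // A 400 from the token endpoint means the request itself was bad,
      // for example a deleted service account key, so retrying cannot help.
      if (cause instanceof HttpResponseException
          && ((HttpResponseException) cause).getStatusCode() == 400) {
        return Code.UNAUTHENTICATED;
      }
      // Any other IOException is still treated as possibly transient.
      return Code.UNAVAILABLE;
    }
    // Assumption: failures that are not IOExceptions are not retried.
    return Code.UNAUTHENTICATED;
  }
}
```

As the discussion points out, this approach depends on the exact exception shape and status code produced by google-auth-library-java, so it could break if that library changes how it reports the failure.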