
feat: Provide option to retry request on rateLimitExceeded errors #809

Closed
jimfulton opened this issue Jul 25, 2021 · 13 comments
Labels: api: bigquery, type: feature request

Comments

@jimfulton
Contributor

This is mainly needed for sqlalchemy-bigquery compliance tests, but I can imagine it being useful for other use cases. :)

SQLAlchemy dialect compliance tests create tables at the beginning of a suite of tests and reuse them for the suite.

If a test fails due to rateLimitExceeded errors, the tables are left in an unexpected state. Retrying the test may not work.
There's an apparent option to recreate tables for each test, but it's not implemented correctly. I could probably fix it for current SQLAlchemy releases (1.4), but I'd probably have to monkey-patch 1.3.

I propose two new query-job config settings (sketched below):

  • retry_on_rate_limit_exceeded: int, the maximum number of retries when the client gets an "Exceeded rate limits" error; defaults to 0.
  • retry_delay_on_rate_limit_exceeded: int, the number of seconds to wait before retrying when the client gets an "Exceeded rate limits" error; defaults to 60.
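For illustration, a minimal usage sketch assuming the two settings above landed on QueryJobConfig under the proposed names (neither exists today; both attribute names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical: neither setting exists yet; the names match the proposal above.
config = bigquery.QueryJobConfig()
config.retry_on_rate_limit_exceeded = 5         # give up after 5 retries
config.retry_delay_on_rate_limit_exceeded = 60  # wait 60 seconds between attempts

job = client.query("SELECT 1", job_config=config)
rows = job.result()
```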
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 25, 2021
@jimfulton jimfulton self-assigned this Jul 25, 2021
@jimfulton
Contributor Author

@tswast Are you good with this?

@tswast
Contributor

tswast commented Jul 26, 2021

I think this would have to be done at the DB-API layer -- and I'd actually like to retry queries by default there. Although maybe it could work on the query job.

It's a bit trickier with the query job, since we can't retry if the user has set a job ID. Also, we'd need to track whether the user explicitly set a destination table or whether the destination table was set by the API (in which case we need to unset it before retrying the query).

@tswast
Contributor

tswast commented Jul 26, 2021

I'd actually expect to configure this with a retry object from the API core library. Obviously, it'd have to use a different predicate than our normal "retry this API request" defaults.
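For illustration, a sketch of what such a retry object could look like with google.api_core. The predicate here (checking for a "rateLimitExceeded" reason on a 403 Forbidden) is an assumption about how the error surfaces, not the library's shipped default:

```python
from google.api_core import exceptions, retry


def _is_rate_limit_exceeded(exc):
    # Assumption: rateLimitExceeded surfaces as a 403 Forbidden whose
    # errors list carries a "rateLimitExceeded" reason.
    return isinstance(exc, exceptions.Forbidden) and any(
        error.get("reason") == "rateLimitExceeded"
        for error in getattr(exc, "errors", ())
    )


rate_limit_retry = retry.Retry(
    predicate=_is_rate_limit_exceeded,
    initial=1.0,     # first backoff, in seconds
    maximum=60.0,    # cap on backoff between attempts
    multiplier=2.0,  # exponential growth factor
    deadline=600.0,  # give up after 10 minutes overall
)
```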

@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jul 26, 2021
@tswast
Contributor

tswast commented Jul 26, 2021

As discussed offline, we can make the query job save the original request so that we don't have to worry about anything except for resetting the job ID before retrying.
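A minimal sketch of that idea, assuming the saved request is re-issued through client.query(); the wrapper below is illustrative, not library code, and only the job ID changes between attempts:

```python
import uuid

from google.cloud import bigquery

client = bigquery.Client()


def query_with_fresh_job_id(sql, job_config=None):
    # Re-issuing the identical jobs.insert request (same job ID) fails with
    # "Already Exists", so each attempt must carry a brand-new job ID.
    return client.query(sql, job_config=job_config, job_id=str(uuid.uuid4()))
```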

@tswast
Contributor

tswast commented Jul 26, 2021

Closing in favor of #539

@jimfulton
Contributor Author

Reopened because the scope here is a little different and I'd like to discuss without spamming Jake Summers.

@jimfulton
Contributor Author

I want to bring up a couple of issues.

Let's start with result().

@tswast is your thought that result() would retry by re-executing the original request if there's an error?

@tswast
Contributor

tswast commented Jul 27, 2021

@tswast is your thought that result() would retry by re-executing the original request if there's an error?

Yes, I think that's the way it'd have to work for at least some jobs.

Though as you found out, it seems we can do some detection of failed-but-retryable jobs as early as the call to begin. Maybe we should start there, since it's less surprising to re-issue a job right away.

With result(), the job could have been created quite a while ago. For example, if the job object was created by a call to get_job(), I wouldn't expect result() to re-issue the job.

@jimfulton
Contributor Author

Though as you found out, it seems we can do some detection of failed-but-retryable jobs as early as the call to begin. Maybe we should start there, since it's less surprising to re-issue a job right away.

With begin we can retry at the API level. That's what the current code does when an API call raises an exception. In the SQLAlchemy tests, this is where most of the errors seem to happen, which is good.

We want to retry at a higher level if we get an error from get_job within the client's query method, or when calling QueryJob.result.
In those cases, we want to avoid retrying at the API level, as it's a waste of time.

@jimfulton
Contributor Author

Next, retry predicates. Retry is based on exceptions, but BigQuery client API calls often don't use exceptions to report errors.

This makes predicates hard to express. You either need a new kind of Retry with a predicate that's applied to non-error results, or you need a flag that turns error responses (status code 200, but with an error status in the response data) into exceptions.
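A sketch of the second option, built around a hypothetical helper that inspects the job resource; the reason-to-status mapping is illustrative only:

```python
from google.api_core import exceptions

# Illustrative mapping only; real code would cover more reasons.
_REASON_TO_HTTP_STATUS = {"rateLimitExceeded": 403, "backendError": 500}


def raise_if_job_errored(job_resource):
    # Hypothetical helper: jobs.insert/jobs.get can return HTTP 200 while
    # reporting a failed job in status.errorResult. Raising here lets an
    # ordinary Retry predicate see the failure as an exception.
    error = job_resource.get("status", {}).get("errorResult")
    if error is not None:
        status = _REASON_TO_HTTP_STATUS.get(error.get("reason"), 500)
        raise exceptions.from_http_status(
            status, error.get("message", ""), errors=[error]
        )
```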

As I discovered in our call today, we'd want to retry at the API level for begin, but not for result because we don't want to retry API calls on failed jobs.

Maybe (thinking out loud):

  • We want Client._call_api to always raise exceptions for error responses (those with 200 status codes), and have the retry machinery retry them using the retry passed to query, but
  • after begin, we don't want to pass user retry objects to get_job and PollingFuture.result. If we use retries for those, the predicates should be narrow, so as not to retry API calls for failed jobs. Rather, we should apply the user-provided retry (and our sensible-default retry) at the level of Client.query and QueryJob.result (see the sketch after this list).
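To make the second bullet concrete, a sketch of applying retry at the Client.query level rather than per API call. query_and_wait and its behavior are assumptions, not the library's API, and it reuses the hypothetical _is_rate_limit_exceeded predicate sketched earlier:

```python
from google.api_core import retry

# Reuses the hypothetical _is_rate_limit_exceeded predicate sketched above.
job_level_retry = retry.Retry(predicate=_is_rate_limit_exceeded, deadline=600.0)


def query_and_wait(client, sql):
    # Hypothetical wrapper: each attempt issues a brand-new job (no explicit
    # job_id, so the client generates a fresh one) and waits on it, so the
    # whole issue-and-poll sequence is retried, not a single HTTP request.
    def attempt():
        job = client.query(sql)
        return job.result()

    return job_level_retry(attempt)()
```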

@jimfulton
Contributor Author

@tswast I'm confused about whether API requests should be retried after begin. I think I heard you and Seth say that any error response (a 200 response with a JSON body containing an errorResult status) indicates a failed job, not a failed API request. Is that right?

In fact, maybe there's a deeper semantic I'm missing about this kind of response.

@tswast
Contributor

tswast commented Jul 27, 2021

That's correct. The job was created, so retrying exactly the same request will fail. But retrying with a fresh job ID might succeed.

@tswast
Contributor

tswast commented Jul 27, 2021

We want Client._call_api to always raise exceptions for error responses (those with 200 status codes), and have the retry machinery retry them using the retry passed to query, but

That's unlikely to work because of re-used job IDs.
