Retry temporary failures for Bigtable scans #2255
@sduskis You mean during `consume_next`? So we could limit the change to the place where we consume the next item from the stream?
Taking a step back: these four errors can be recovered from in most RPC calls. MutateRow, MutateRows, SampleRowKeys, and CheckAndMutate can all be retried, but each has slightly different nuances. That said, ReadRows was the first reported issue, and it's where retries are most likely to fix the problem. In terms of ReadRows, I think you are correct about where the code fix ought to be.
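To illustrate one such nuance, here is a hedged sketch (not code from this client) of the MutateRows case: a retry should resend only the entries whose per-entry status was retryable, since the other entries have already been applied. Field names follow the Bigtable v2 protos.

```python
from google.rpc import code_pb2

# Status codes generally treated as retryable (per the issue description).
RETRYABLE_CODES = {
    code_pb2.DEADLINE_EXCEEDED,
    code_pb2.ABORTED,
    code_pb2.INTERNAL,
    code_pb2.UNAVAILABLE,
}


def failed_entry_indices(responses):
    """Collect indices of MutateRows entries that failed retryably.

    ``responses`` is the stream of MutateRowsResponse messages; each
    entry carries an ``index`` into the original request and a
    ``status``. Entries that succeeded must not be resent, and
    non-retryable failures should be surfaced as errors instead.
    """
    to_retry = []
    for response in responses:
        for entry in response.entries:
            if entry.status.code in RETRYABLE_CODES:
                to_retry.append(entry.index)
    return to_retry
```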
@dhermes @tseaver: @calpeyser will be taking a look at implementing this. Note that it's not enough to just resend the same request after a failure; we need to invoke the RPC with an updated request that picks up where the previous attempt left off.
Hey guys - I've had a look; here's a proposal: (1) Implementing retries - we can probably do this by combining "retryable" in gax-python with the "response iterator" convention currently used in table#read_rows. Basically, we can wrap the gRPC stream in a callable class whose invocation consumes the next item from the stream, letting any exception propagate.
We then take the gRPC wrapper and wrap it again in a gax-python retryable. This way, if an idempotent exception is propagated up from a call, the call will be retried (see the sketch below). (2) Testing - we need some new functionality beyond the current unit-testing infrastructure, since those tests make assertions about the RPCs made by the client and do not take responses into account. We'll need to add an integration-testing framework that spins up a mock server and makes read_rows calls against it. @garye's test script should allow us to model errors sent by the server and the expectation of retries.
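A minimal sketch of that shape, with a plain retry loop standing in for gax-python's retryable (the exact gax API is not reproduced here; `start_stream` is any callable that opens a fresh gRPC stream):

```python
import grpc

RETRYABLE = (
    grpc.StatusCode.DEADLINE_EXCEEDED,
    grpc.StatusCode.ABORTED,
    grpc.StatusCode.INTERNAL,
    grpc.StatusCode.UNAVAILABLE,
)


class StreamConsumer(object):
    """Callable wrapper around a gRPC response stream.

    Each invocation consumes one item from the stream; gRPC errors
    propagate so an outer retry wrapper can decide what to do.
    """

    def __init__(self, start_stream):
        self._start_stream = start_stream  # opens a fresh stream
        self._stream = start_stream()

    def restart(self):
        self._stream = self._start_stream()

    def __call__(self):
        return next(self._stream)


def consume_with_retries(consumer, max_attempts=3):
    """Stand-in for wrapping the consumer in a gax-python retryable."""
    for attempt in range(max_attempts):
        try:
            return consumer()
        except grpc.RpcError as exc:
            if exc.code() not in RETRYABLE or attempt == max_attempts - 1:
                raise
            consumer.restart()  # naive: re-sends the same request (see below)
```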
(1) With the current interface, which only takes a row range, you don't know the row keys that still need to be read. You do know the last key that was read before the error (keys come back in lexicographic order), which tells you which row range you need to send in the retried request. (2) I don't think re-implementing the mock server in Python is a great use of time. We can easily build Linux, macOS, and Windows binaries of the mock server and do something like pull one down from a public GCS bucket when the systests run, depending on the current platform. I think the systests already install the Bigtable emulator via gcloud, which is along the same lines.
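A sketch of that download step; the bucket URL, object names, and binary name here are all invented for illustration:

```python
import os
import platform
import stat
import urllib.request

# Hypothetical bucket layout; the real names would be whatever the
# mock-server binaries are published under.
_BASE = "https://storage.googleapis.com/example-bigtable-mock/mock-server-"
_SUFFIX = {
    "Linux": "linux-amd64",
    "Darwin": "darwin-amd64",
    "Windows": "windows-amd64.exe",
}


def fetch_mock_server(dest="mock-server"):
    """Download the mock-server binary matching the current platform."""
    url = _BASE + _SUFFIX[platform.system()]
    urllib.request.urlretrieve(url, dest)
    os.chmod(dest, os.stat(dest).st_mode | stat.S_IEXEC)  # make executable
    return dest
```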
I gave the above a whirl - please see #3279
Hello, as part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates. My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request. Thank you!
This issue should be re-opened; this is a critical feature for the Bigtable client library and needs to be tracked until it is completed. My understanding of the history: PR #3279 was the first attempt, which was superseded by PR #3324; that was merged but then reverted in PR #3642, so the fix has not yet landed. There's a branch with the reverted work. Please re-open this issue so that we can track its progress.
@jonparrott @zakons Do you know if retry and the recent work in Bigtable would have fixed this? If not, that's OK; I'll follow up on this later.
Hello, feature requests will now be tracked in the project Feature Requests. I will close this issue now; please feel free to continue to address any issues/concerns here.
Can someone with permissions please add this to that Feature Requests page? This issue means that nearly all online examples of how to iterate over Bigtable are misleading; a simple scan loop will eventually die with DEADLINE_EXCEEDED on a long-running read.
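For reference, this is the shape of scan those examples show (a sketch assuming a client version where `read_rows` returns a directly iterable `PartialRowsData`; `handle` is a placeholder for application logic). With no resume logic, a transient error hours into the scan aborts the whole read:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")  # placeholder project
table = client.instance("my-instance").table("my-table")

# Naive full-table scan: a DEADLINE_EXCEEDED or UNAVAILABLE raised
# mid-iteration kills the loop, even though the scan could resume
# from the last row key seen.
for row in table.read_rows():
    handle(row)
```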
@alugowski @sduskis Since the merge of PR #5178, the new low-level GAPIC clients already handle retries, based on the relevant retry configuration.
The low-level GAPIC retries don't work for streaming RPCs like Cloud Bigtable's ReadRows. Default retries just resend the original request until it completes. Retries for streaming RPCs need to pick up from where they previously left off, which requires custom logic to modify the request. We actually put in a fix for this four months ago with PR #4882.
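A hedged sketch of that custom logic at the raw proto level (this is the idea, not the actual PR #4882 code; message and field names follow the `bigtable_v2` protos, and `read_rows` is any callable that issues the streaming RPC):

```python
import grpc
from google.cloud.bigtable_v2 import types

RETRYABLE = (
    grpc.StatusCode.DEADLINE_EXCEEDED,
    grpc.StatusCode.INTERNAL,
    grpc.StatusCode.UNAVAILABLE,
    grpc.StatusCode.ABORTED,
)


def read_rows_with_resume(read_rows, request):
    """Re-issue ReadRows after retryable failures, resuming past the
    last fully committed row key."""
    last_committed = None
    current_key = None
    while True:
        try:
            for response in read_rows(request):
                for chunk in response.chunks:
                    if chunk.row_key:
                        current_key = chunk.row_key
                    if chunk.commit_row:
                        last_committed = current_key
                    yield chunk
            return
        except grpc.RpcError as exc:
            if exc.code() not in RETRYABLE:
                raise
            if last_committed is not None:
                # start_key_open excludes the key itself, so the retried
                # scan begins at the key *after* the last one returned.
                # A real implementation would also carry over filters
                # and decrement any rows_limit.
                request = types.ReadRowsRequest(
                    table_name=request.table_name,
                    rows=types.RowSet(row_ranges=[
                        types.RowRange(start_key_open=last_committed)]),
                )
```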
Thanks! Sounds like this kludge will no longer be necessary. When the iteration takes on the order of hours, the logs fill up with these errors.
Cloud Bigtable can return DEADLINE_EXCEEDED after 5 minutes on a scan request, even if there are more responses to be returned. In that case, or on INTERNAL, UNAVAILABLE, or ABORTED, the scan should be retried starting from the key after the last one returned by the scan.