Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dataloss bug in batch Storage API sink. #26503

Merged
merged 6 commits into from
May 2, 2023

Conversation

reuvenlax
Copy link
Contributor

@reuvenlax reuvenlax commented May 2, 2023

Two issues fixed:

  1. Retry behavior was incorrect. We created a new stream abandoning the old, but then only retried the current append.
  2. Offset was inadvertently reset to 0 the first time we called append, which meant that the second append would always fail with invalid offset. This triggered the data-loss bug in 1.

We fixed the offset bug and no longer create a new stream on failure. In cases where the stream is known to be unusable (e.g. it's been deleted), we raise an exception which forces retry of the entire work item.

@reuvenlax
Copy link
Contributor Author

R: @johnjcasey

@github-actions
Copy link
Contributor

github-actions bot commented May 2, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@johnjcasey
Copy link
Contributor

LGTM. This will break storage api writes for now, but we will no longer have false positive results

@johnjcasey
Copy link
Contributor

LGTM

+ this.streamName
+ " failed with invalid "
+ "offset of "
+ failedContext.offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful for future debugging if we include the error's stacktrace here?

@johnjcasey
Copy link
Contributor

fixes #26521 and #26520

@kennknowles
Copy link
Member

Why isn't there a test for this?

@reuvenlax
Copy link
Contributor Author

reuvenlax commented May 3, 2023 via email

@ahmedabu98
Copy link
Contributor

We only have one test case that indirectly tests for multiple stream appends, and it uses streaming mode (while this error occurs in batch):

public void updateTableSchemaTest(boolean useSet) throws Exception {
assumeTrue(useStreaming);
assumeTrue(useStorageApi);
// Make sure that GroupIntoBatches does not buffer data.
p.getOptions().as(BigQueryOptions.class).setStorageApiAppendThresholdBytes(1);

@kennknowles
Copy link
Member

Yea let's add a batch test that reproduces the issue at the commit before the fix, and confirms it fixed. Keep the bug open until the test is in.

jrmccluskey pushed a commit that referenced this pull request May 4, 2023
@ahmedabu98
Copy link
Contributor

I'll write up the test and tag @reuvenlax to review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants