refactor(bigquery/storage/managedwriter): change AppendRows behavior #4729

Merged
merged 2 commits into from
Sep 8, 2021

@shollyman shollyman commented Sep 7, 2021

Previously, AppendRows() on a managed stream would return one
AppendResult per data row. This change instead switches the
behavior to return a single AppendResult for tracking the behavior
of the set of rows.
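
A minimal sketch of the contract change described above, using hypothetical stand-in types rather than the real managedwriter package. The names `appendResult`, `appendRowsPerRow`, and `appendRowsSingle` are assumptions made for illustration; the sketch only models how many result objects (and channels) a caller gets back under each contract.

```go
package main

import (
	"errors"
	"fmt"
)

// appendResult is a hypothetical stand-in for managedwriter's AppendResult:
// it tracks the outcome of one unit of appended data.
type appendResult struct {
	ready  chan struct{} // closed once the outcome is known
	offset int64
	err    error
}

func newAppendResult() *appendResult {
	return &appendResult{ready: make(chan struct{})}
}

// resolve records the outcome and unblocks any waiters.
func (r *appendResult) resolve(offset int64, err error) {
	r.offset, r.err = offset, err
	close(r.ready)
}

// wait blocks until the outcome is known, then returns it.
func (r *appendResult) wait() (int64, error) {
	<-r.ready
	return r.offset, r.err
}

// appendRowsPerRow models the old contract: one result (and one channel) per row.
func appendRowsPerRow(rows [][]byte) []*appendResult {
	results := make([]*appendResult, len(rows))
	for i := range rows {
		results[i] = newAppendResult()
	}
	return results
}

// appendRowsSingle models the new contract: one result for the whole set of rows.
func appendRowsSingle(rows [][]byte) (*appendResult, error) {
	if len(rows) == 0 {
		return nil, errors.New("no rows to append")
	}
	return newAppendResult(), nil
}

func main() {
	rows := [][]byte{[]byte("a"), []byte("b"), []byte("c")}

	old := appendRowsPerRow(rows)
	res, err := appendRowsSingle(rows)
	if err != nil {
		panic(err)
	}

	// Simulate the backend acknowledging the whole batch at offset 0.
	go res.resolve(0, nil)
	offset, err := res.wait()

	fmt.Println("per-row results:", len(old))
	fmt.Println("single result offset:", offset, "err:", err)
}
```

Under the new contract a caller waits on one result per append call instead of ranging over a slice of per-row results.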

The original per-row contract was done in expectation that we'd
consider making batching decisions at a very granular level. However,
at this point it seems reasonable to consider only batching multiple
appends, not dividing individual batches more granularly.

From stress testing, we know we can currently push 50k rows per second on a single stream,
issuing roughly 1.5 batches per second. With this change, rather than creating 50k
AppendResults per second (and their associated channels etc.), we create only one
AppendResult per batch. Early numbers indicate that this change improves stress test
throughput by 5-15% depending on the metric you're using (rows/batches/bytes), in
addition to compute/memory overhead savings.

BREAKING CHANGE: managedwriter AppendRows now returns a single AppendResult for the whole append rather than one per row.

@shollyman shollyman requested a review from a team September 7, 2021 17:32
@shollyman shollyman requested a review from a team as a code owner September 7, 2021 17:32
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Sep 7, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Sep 7, 2021
@shollyman shollyman changed the title BREAKING CHANGE(bigquery/storage/managedwriter): change AppendRows behavior refactor(bigquery/storage/managedwriter): change AppendRows behavior Sep 7, 2021
@shollyman shollyman merged commit 9c9fbb2 into googleapis:master Sep 8, 2021
@shollyman shollyman deleted the single-appendresult branch September 8, 2021 16:13