
VReplication: Improve query buffering behavior during MoveTables traffic switching #15701

Merged: 14 commits merged into vitessio:main from ks_events_improvements on Apr 25, 2024

Conversation

@mattlord (Contributor) commented Apr 11, 2024

Description

This PR is the result of investigating #15707 and going through the full call path for how we handle keyspace events and specifically how they are used to detect a MoveTables switch traffic operation — both when it starts and when it finishes — and how queries are handled during this time. The main changes are:

  • Tests
    1. Increasing the QPS for our end-to-end test load generator (from ~500 QPS to ~10,000 QPS in local tests) so that I could reproduce the problems
    2. Switching traffic back and forth much more frequently
    3. Waiting between switches to be sure we catch any errant multiple buffer windows resulting from a single traffic switch
  • Code
    1. Changing how keyspace events are processed to ensure we don't ever miss/skip any in a vtgate
    2. Preventing the query plan cache from being polluted by a query plan that went to the wrong side of the switch (the before side rather than the after) concurrently with the keyspace event being processed and resolved (and thus draining the buffers)
    3. Improving the performance of processing a keyspace event when there is an active MoveTables workflow by getting topo information in parallel
    4. Only performing a single RebuildSrvVSchema operation — which kicks off the keyspace event that is supposed to end the buffering window — per traffic switch (previously, if you were switching all traffic, we did it twice: once for reads and once for writes)
    5. Basing the query retry wait times on the buffering configuration so that queries don't regularly wait far beyond the buffer window and we typically complete the maximum number of retries within the window
    6. Making 1s the minimum for --buffer_min_time_between_failovers to ensure that we don't start a new buffering window concurrently with resolving the keyspace event. Doing so could cause an errant second buffering window to start while the first was being resolved, which added significant query latency and errors and left a bad query plan in the cache; future queries with the same plan then went to the wrong side and continuously kicked off more buffering windows, each hitting the max window duration because there was no further keyspace event to resolve them. This change addresses the first part of that issue, the second/concurrent buffer window (a sketch of the flag floor follows this list).
      • When retrying a query because a MoveTables traffic switch was detected, we now always clear the plan cache if the retries were ultimately unsuccessful. This prevents the second part of the issue: the given query shape failing indefinitely due to the cached bad plan, which in turn continuously kicked off more buffering windows. (This is item 2 in this list.)
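A minimal sketch of the flag floor described in item 6. The flag name is real; the default shown and the clamping placement are assumptions for illustration, not the actual Vitess flag-registration code:

package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Assumed default; the real definition lives in go/vt/vtgate/buffer/flags.go.
	minTimeBetweenFailovers := flag.Duration("buffer_min_time_between_failovers",
		1*time.Minute, "minimum time between two buffering windows for a given shard")
	flag.Parse()

	// Enforce a 1s floor so a new buffering window can never start
	// concurrently with resolving the keyspace event for the previous one.
	const floor = 1 * time.Second
	if *minTimeBetweenFailovers < floor {
		fmt.Printf("raising --buffer_min_time_between_failovers from %v to %v\n",
			*minTimeBetweenFailovers, floor)
		*minTimeBetweenFailovers = floor
	}
}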

Related Issue(s)

Fixes: #15707

vitess-bot commented Apr 11, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels on Apr 11, 2024
@github-actions added this to the v20.0.0 milestone on Apr 11, 2024
@mattlord removed the NeedsWebsiteDocsUpdate and NeedsBackportReason labels on Apr 11, 2024
@mattlord force-pushed the ks_events_improvements branch from 3424b0b to 61a0fe5 on April 11, 2024
codecov bot commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 31.52174%, with 63 lines in your changes missing coverage. Please review.

Project coverage is 68.42%. Comparing base (4c2df48) to head (7c8d572).

Files Patch % Lines
go/vt/discovery/keyspace_events.go 22.22% 35 Missing ⚠️
go/vt/vtgate/plan_execute.go 57.14% 9 Missing ⚠️
go/vt/topo/etcd2topo/watch.go 60.00% 4 Missing ⚠️
go/vt/vtctl/workflow/traffic_switcher.go 0.00% 4 Missing ⚠️
go/vt/vtctl/workflow/server.go 0.00% 3 Missing ⚠️
go/vt/vtctl/workflow/switcher_dry_run.go 0.00% 3 Missing ⚠️
go/vt/vtctl/workflow/switcher.go 0.00% 2 Missing ⚠️
go/vt/vtgate/buffer/buffer.go 0.00% 2 Missing ⚠️
go/vt/vtgate/buffer/flags.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15701      +/-   ##
==========================================
- Coverage   68.44%   68.42%   -0.02%     
==========================================
  Files        1558     1558              
  Lines      195822   195865      +43     
==========================================
- Hits       134025   134024       -1     
- Misses      61797    61841      +44     


@mattlord force-pushed the ks_events_improvements branch 4 times, most recently from fddaf21 to be18fc1 on April 12, 2024
@mattlord force-pushed the ks_events_improvements branch from be18fc1 to 1f81b8a on April 12, 2024
@mattlord removed the NeedsIssue and NeedsDescriptionUpdate labels on Apr 13, 2024
@mattlord force-pushed the ks_events_improvements branch from fe9556b to 8f72116 on April 15, 2024
@mattlord marked this pull request as ready for review on April 15, 2024
@mattlord requested a review from vmg on April 15, 2024
@mattlord added the NeedsWebsiteDocsUpdate label on Apr 15, 2024
@deepthi (Member) left a comment


@harshit-gangal can you please review the changes to plan_execute.go?

Rest mostly LGTM. I have some minor questions and comments. Once we have harshit's review and your responses, I can review once again.

go/vt/discovery/keyspace_events.go (resolved thread)
go/vt/discovery/keyspace_events.go (resolved thread)
-	Version: EtcdVersion(initial.Kvs[0].ModRevision),
+	Version: EtcdVersion(initial.Kvs[0].Version),
Member:

What is the difference between these two?

@mattlord (Author) replied Apr 17, 2024

Revision, ModRevision, and Version are related but distinct things. I'll try to summarize:

  • Revision: a logical clock used to track changes to the etcd keyspace
  • Version: a logical clock used to track the changes to an individual key since it was created
  • ModRevision: the etcd keyspace revision number for the most recent update to an individual key (the latest version)

What I tried to do is align the Vitess comments and code with etcd's usage of each of these. As you can see in this example, we were using Version in the Vitess code but setting it to the ModRevision value from etcd. When I first started the investigation I thought that, aside from being odd, this could perhaps have led us to miss/skip an update to the key in the watch.
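To make the distinction concrete, here is a small sketch against the real etcd clientv3 Go API showing where each of these values lives on a Get response. The endpoint and key path are illustrative, not taken from the Vitess topo code:

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	resp, err := cli.Get(context.Background(), "/example/key")
	if err != nil {
		panic(err)
	}
	// Revision: logical clock for the entire etcd keyspace, from the response header.
	fmt.Println("keyspace revision:", resp.Header.Revision)
	if len(resp.Kvs) > 0 {
		kv := resp.Kvs[0]
		// Version: per-key counter of changes since the key was created.
		fmt.Println("key version:", kv.Version)
		// ModRevision: the keyspace revision at this key's latest update.
		fmt.Println("key mod revision:", kv.ModRevision)
	}
}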

Member:

Good one

-	var currVersion = initial.Header.Revision
+	var rev = initial.Header.Revision
Member:
Is there any other actual diff in this file? Everything else looks cosmetic.


go/vt/vtctl/workflow/server.go (resolved thread)
go/vt/vtctl/workflow/switcher_dry_run.go (resolved thread)
go/vt/vtgate/buffer/flags.go (resolved thread)
@rohit-nayak-ps (Contributor) left a comment

Looks good, nice work!

I had just one point where I was not clear about the functionality: where we clear the plan cache.

func (kew *KeyspaceEventWatcher) Subscribe() chan *KeyspaceEvent {
	kew.subsMu.Lock()
	defer kew.subsMu.Unlock()
	c := make(chan *KeyspaceEvent, 2)
	// Use a decent size buffer to:
Contributor:

We can probably indeed process only the most recent version, discarding the rest. We can revisit this separately in the context of multi-tenant migrations, for example, where we may be switching several per-tenant workflows per second for smaller tenants.

rootCause := vterrors.RootCause(err)
if rootCause != nil && strings.Contains(rootCause.Error(), "enforce denied tables") {
	log.V(2).Infof("Retry: %d, will retry query %s due to %v", try, query, err)
	lastVSchemaCreated = vs.GetCreated()
	if try == 0 { // We are going to retry at least once
Contributor:

Can you explain why this deferred clearing of plans when Executor.newExecute() returns helps?

When we retry, we wait for a newer vschema. Shouldn't we clear the plans before executing the query with the new vschema?

@mattlord (Author) replied Apr 19, 2024

> Can you explain why this deferred clearing of plans when Executor.newExecute() returns helps?

As noted in the description, this is the part that prevented the query shape from failing indefinitely and causing a never-ending cycle of keyspace events when there was a nearly concurrent buffer event.

The executor's VSchemaManager clears the plan cache when it receives a new vschema via its SrvVSchema watcher (it calls executor.SaveVSchema() in its watch's subscriber callback). This happens concurrently with the KeyspaceEventWatcher also receiving the new vschema in its own SrvVSchema watcher, processing it in its subscriber callback (which includes getting info on all shards from the topo), and eventually determining that the keyspace is consistent and ending the buffering window. So there was a race with query retries such that a query could be planned against the wrong side just as the keyspace event was getting resolved and the buffers drained. That bad plan then remained the cached plan for the query until another topo.RebuildSrvVSchema / vtctldclient RebuildVSchemaGraph caused the VSchemaManager to clear the plan cache. It's essentially a race between the two SrvVSchema watchers and the work each does when a new vschema is received.

> When we retry, we wait for a newer vschema. Shouldn't we clear the plans before executing the query with the new vschema?

As I noted above, the VSchemaManager already clears the plan cache when a new vschema is received. We wait for a newer vschema, but not indefinitely: the wait times out and we retry without it. So if we time out just as the new vschema is coming in, we could plan with the old vschema, putting that plan into the just-cleared plan cache for all subsequent executions. Do you remember when you, @harshit-gangal, and I were discussing this on Zoom? That's where this idea originated, as you and I were both confused by how the query failures continued even after the buffering window ended due to hitting the max duration.

Since we're now making 1s the minimum time between failovers, the likelihood of hitting this should be drastically reduced. But because the results of hitting it are worse than if we'd done no buffering at all, I think it's better to be extra safe here when doing a retry. We should retry VERY rarely overall, and when we do, let's be sure to prevent this issue from ever happening, since it's fairly catastrophic and it's not at all clear to the user what is happening or how to clear it out (vtctldclient RebuildVSchemaGraph) without just restarting the vtgate. That was my thinking. If we DID retry AND the last attempt (the 3rd try) still encountered an error, we know that the plan used was 1) probably not valid/correct, since it failed, and 2) going to remain in the cache if we do not clear the plans after it was added.

Make sense?
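For illustration, a self-contained sketch of the pattern described above. The Executor shape, ClearPlans helper, and retry count are simplified stand-ins, not the actual Vitess code:

package main

import (
	"errors"
	"fmt"
)

// Executor is a simplified stand-in for vtgate's executor with a plan cache.
type Executor struct {
	plans map[string]string // query -> cached plan
}

// ClearPlans is a hypothetical helper that empties the plan cache.
func (e *Executor) ClearPlans() {
	e.plans = make(map[string]string)
}

// planAndRun caches a plan (possibly built against the wrong side of the
// switch) and, in this sketch, always fails with the denied-tables error.
func (e *Executor) planAndRun(query string) error {
	e.plans[query] = "plan"
	return errors.New("enforce denied tables")
}

// execute retries up to 3 times. The named return value err lets the
// deferred function observe the final outcome: we only clear the plan
// cache when we retried AND the last attempt still failed, since the
// cached plan is then known to be suspect.
func (e *Executor) execute(query string) (err error) {
	retried := false
	defer func() {
		if retried && err != nil {
			e.ClearPlans()
		}
	}()
	for try := 0; try < 3; try++ {
		if try > 0 {
			retried = true
		}
		if err = e.planAndRun(query); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	e := &Executor{plans: make(map[string]string)}
	fmt.Println("final error:", e.execute("select * from customer"))
	fmt.Println("plans cached after failure:", len(e.plans)) // 0: cache was cleared
}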

@mattlord (Author):

I improved the behavior and comments here: 7675cf8

Sorry for not explaining this more clearly ahead of time! 🙏

@rohit-nayak-ps (Contributor):

This issue was clarified in a call: my misunderstanding was that I assumed the check for err in the defer was a closure over a local variable. However, err is the function's named return value, scoped to the whole function, and we will not clear the cache if the query succeeds. Apologies for the misunderstanding and thanks for the additional comments.

Member:

This is good.

@@ -64,11 +64,11 @@ func (e *Executor) newExecute(
 	logStats *logstats.LogStats,
 	execPlan planExec, // used when there is a plan to execute
 	recResult txResult, // used when it's something simple like begin/commit/rollback/savepoint
-) error {
+) (err error) {
 	// 1: Prepare before planning and execution
Contributor:

Any reason why we have error as a named parameter here? That is not our usual practice.

@mattlord (Author):

To be sure that we're checking the returned error in the defer.
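A tiny illustration of the mechanics (illustrative code, not from the PR): with a named return value, the deferred closure reads err after the return statement has assigned it, so it sees the error the function actually returns; with an unnamed return there is no variable the defer can reliably check.

package main

import (
	"errors"
	"fmt"
)

func do() (err error) {
	defer func() {
		// Runs after `return` assigns the named return value.
		if err != nil {
			fmt.Println("deferred cleanup, returning error:", err)
		}
	}()
	return errors.New("boom")
}

func main() {
	_ = do()
}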

if waitForNewerVSchema(ctx, e, lastVSchemaCreated) {
// Without a wait we fail non-deterministically since the previous vschema will not have
// the updated routing rules.
timeout := e.resolver.scatterConn.gateway.buffer.GetConfig().MaxFailoverDuration / MaxBufferingRetries
Member:

Could you explain why this timeout is calculated in this way?

@mattlord (Author) replied Apr 23, 2024

We retry 2 times before giving up. How long we wait before each retry — IF we don't see a newer vschema come in — was previously hardcoded at 30s, while the max buffer failover duration defaults to 20s. I thought it made more sense to instead base the wait time on the length of the buffering window.

My thinking for the calculation was that this way we should be able to perform the max retries within the given window of time for many queries (certainly not all) and we should not end up waiting too long after the buffer window has ended, retrying old queries.

Before, we could end up waiting 60 seconds across 2 retries, continuing to retry even though the buffering window had ended 30+ seconds earlier. That's of course assuming we didn't receive a newer vschema.

Thinking about it again... I think this improves some failure scenarios/conditions but doesn't make the typical happy path much better, as there's no point in retrying when we haven't gotten a new vschema UNLESS the traffic switch failed and we backed out, in which case those queries should now succeed with the original targets/routing. Let's discuss tomorrow. I'm happy to change this... and should add a comment either way. 🙂
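For concreteness, a sketch of the arithmetic. The 20s default comes from the discussion above; the retry constant of 3 is an assumption for illustration:

package main

import (
	"fmt"
	"time"
)

// Assumed value for illustration; the real constant lives in vtgate code.
const maxBufferingRetries = 3

func main() {
	// --buffer_max_failover_duration defaults to 20s per the discussion above.
	maxFailoverDuration := 20 * time.Second
	perRetryWait := maxFailoverDuration / maxBufferingRetries
	// Each retry waits up to ~6.67s for a newer vschema, so all retries can
	// complete within a single buffering window, versus the old hardcoded
	// 30s wait per retry (up to 60s across two retries).
	fmt.Println("per-retry wait:", perRetryWait)
}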

@mattlord (Author):

Noting that we discussed this directly and were both OK with the calculation. I will add a comment, however, to describe the reasoning (which I should have already done).

-	select {
-	case c <- th:
-	default:
-	}
+	c <- ev
 }
Member:

This change ensures that the broadcast always happens; the old non-blocking send could silently drop an event when the subscriber's channel buffer was full.
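A minimal illustration of the difference (illustrative toy code, not the Vitess implementation): the old select-with-default pattern drops an event when the subscriber's buffer is full, while a plain send blocks until the subscriber catches up, so no keyspace event can be skipped.

package main

import "fmt"

func main() {
	c := make(chan int, 1)
	done := make(chan struct{})
	c <- 1 // fill the subscriber's buffer

	// Old pattern: the non-blocking send silently drops the event
	// because the buffer is full.
	select {
	case c <- 2:
	default:
		fmt.Println("event 2 dropped")
	}

	// New pattern: a plain send blocks until the subscriber catches up,
	// so the event is always delivered.
	go func() {
		fmt.Println("received:", <-c) // drains event 1
		fmt.Println("received:", <-c) // receives event 3
		close(done)
	}()
	c <- 3
	<-done
}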

@harshit-gangal (Member) left a comment

Changes look good overall; some clarification questions.

@mattlord removed the NeedsWebsiteDocsUpdate label on Apr 25, 2024
@mattlord merged commit ee6b837 into vitessio:main on Apr 25, 2024
104 of 105 checks passed
@mattlord deleted the ks_events_improvements branch on April 25, 2024
@mattlord added and then removed the Backport to: release-18.0 and Backport to: release-19.0 labels on May 15, 2024
arthurschreiber pushed a commit to github/vitess-gh that referenced this pull request on Nov 8, 2024: VReplication: Improve query buffering behavior during MoveTables traffic switching (vitessio#15701)
arthurschreiber pushed a commit to github/vitess-gh that referenced this pull request on Nov 19, 2024: VReplication: Improve query buffering behavior during MoveTables traffic switching (vitessio#15701)
Projects: Done

Closes: Bug Report: VTGate persistent query errors after MoveTables SwitchTraffic (#15707)
4 participants