Bug Report: Pre-existing Tablet Controls breaks MoveTables SwitchTraffic #13999

Closed
FancyFane opened this issue Sep 15, 2023 · 2 comments

@FancyFane
Collaborator

Overview of the Issue

When tablet controls already exist on the target keyspace, MoveTables SwitchTraffic fails with an error, and manual cleanup is required before reads and writes can resume. This occurs when the TabletControls contain a denied_tables list that does not match the tables in the currently running workflow. If the workflow's tables do not match the TabletControls one for one, an error results.

Any traffic sent after this point continues to produce application errors until the TabletControls are removed and the shard state is refreshed.
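
As an illustration of the mismatch described above, here is a toy model of a strict denylist removal. It is inferred only from the error message in this report and is not the Vitess implementation; the function name `remove_denied_tables` and the appended list of missing tables are hypothetical.

```python
# Toy model (assumption): removal succeeds only if every requested table
# is currently present in the denylist. Not the actual Vitess code.
DENY_ERROR = "cannot remove tables since one or more do not exist in the denylist"


def remove_denied_tables(denied, to_remove):
    """Return the denylist without `to_remove`, or raise if any table is absent."""
    missing = [t for t in to_remove if t not in denied]
    if missing:
        # The real error carries no table names; they are added here for clarity.
        raise ValueError(f"{DENY_ERROR}: {missing}")
    return [t for t in denied if t not in to_remove]


# Denylist left behind by the cancelled workflow (sbtest1-6 plus testing).
denied = [f"sbtest{i}" for i in range(1, 7)] + ["testing"]
# Tables covered by the new workflow (sbtest1-8 plus testing).
workflow = [f"sbtest{i}" for i in range(1, 9)] + ["testing"]

try:
    remove_denied_tables(denied, workflow)
except ValueError as err:
    print(err)
```

Under this model, sbtest7 and sbtest8 are enough to make the whole removal fail, which matches the behaviour reported here.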

Related Issue: #13998

Reproduction Steps

  1. Do a MoveTables with 6 sbtest tables, run SwitchTraffic and ReverseTraffic, then cancel the workflow. This leaves Tablet Controls in place on the target with no running workflow.

See Issue: #13998

$ vtctlclient --server :15999 GetShard fane_import_sharded/-80
{
...
  "tablet_controls": [
    {
      "tablet_type": 1,
      "cells": [],
      "denied_tables": [
        "sbtest1",
        "sbtest2",
        "sbtest3",
        "sbtest4",
        "sbtest5",
        "sbtest6",
        "testing"
      ],
...
}
  2. Add two new sbtest tables on the source and start a new workflow. NOTE: the workflow's matching rules cover tables sbtest1-8, but the tablet controls only cover sbtest1-6.
$ vtctlclient --server :15999 Workflow fane_import_sharded.import-shard-80 show
{
	"Workflow": "import-shard-80",
	"SourceLocation": {
		"Keyspace": "fane_import_sharded_source",
		"Shards": [
			"-80"
		]
	},
	"TargetLocation": {
		"Keyspace": "fane_import_sharded",
		"Shards": [
			"-80"
		]
	},
	"MaxVReplicationLag": 1,
	"MaxVReplicationTransactionLag": 1,
	"Frozen": false,
	"ShardStatuses": {
		"-80/aws_useast1a_6-3337899395": {
			"PrimaryReplicationStatuses": [
				{
					"Shard": "-80",
					"Tablet": "aws_useast1a_6-3337899395",
					"ID": 6,
					"Bls": {
						"keyspace": "fane_import_sharded_source",
						"shard": "-80",
						"filter": {
							"rules": [
								{
									"match": "sbtest1",
									"filter": "select * from sbtest1 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest2",
									"filter": "select * from sbtest2 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest3",
									"filter": "select * from sbtest3 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest4",
									"filter": "select * from sbtest4 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest5",
									"filter": "select * from sbtest5 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest6",
									"filter": "select * from sbtest6 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest7",
									"filter": "select * from sbtest7 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "sbtest8",
									"filter": "select * from sbtest8 where in_keyrange(id, 'fane_import_sharded.hash', '-80')"
								},
								{
									"match": "testing",
									"filter": "select * from testing"
								}
							]
						}
					},
					"Pos": "7c3368f8-5412-11ee-8179-0a26551b1c25:1-1584,7c434390-5412-11ee-8c60-0a26551b1c25:1",
					"StopPos": "",
					"State": "Running",
					"DBName": "fane_import_sharded",
					"TransactionTimestamp": 0,
					"TimeUpdated": 1694815788,
					"TimeHeartbeat": 1694815788,
					"TimeThrottled": 0,
					"ComponentThrottled": "",
					"Message": "",
					"Tags": "",
					"WorkflowType": "MoveTables",
					"WorkflowSubType": "Partial",
					"CopyState": null,
					"RowsCopied": 0
				}
			],
			"TabletControls": [
				{
					"tablet_type": 1,
					"denied_tables": [
						"sbtest1",
						"sbtest2",
						"sbtest3",
						"sbtest4",
						"sbtest5",
						"testing"
					]
				}
			],
			"PrimaryIsServing": true
		}
	},
	"SourceTimeZone": "",
	"TargetTimeZone": ""
}
  3. Performing a SwitchTraffic fails:
$ vtctlclient --server :15999 MoveTables SwitchTraffic fane_import_sharded.import-shard-80     
E0915 22:10:10.097662     696 main.go:96] E0915 22:10:10.097104 traffic_switcher.go:625] allowTargetWrites failed: Code: INVALID_ARGUMENT
cannot remove tables since one or more do not exist in the denylist
E0915 22:10:10.114269     696 main.go:96] E0915 22:10:10.113676 vtctl.go:2215] 
cannot remove tables since one or more do not exist in the denylist

The following vreplication streams exist for workflow fane_import_sharded.import-shard-80:

id=6 on -80/aws_useast1a_6-3337899395: Status: Stopped. VStream Lag: 0s.

MoveTables Error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
E0915 22:10:10.216399     696 main.go:105] remote error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
  4. Any writes the application sends to the keyspace during this time result in an error:
$ sysbench --db-driver=mysql --threads=1 --events=0 --time=0 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=fane_import_sharded /usr/share/sysbench/oltp_insert.lua --tables=5 run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Initializing worker threads...

Threads started!

FATAL: mysql_drv_query() returned error 1105 (target: fane_import_sharded_source.-80.primary: vttablet: rpc error: code = FailedPrecondition desc = disallowed due to rule: enforce denied tables (CallerID: admin)) for query 'INSERT INTO sbtest4 (id, k, c, pad) VALUES (0, 4098, '09169823527-14773847787-63328771402-43563606289-98835554319-17838113855-09276254645-46412092895-40264640011-92712584350', '67793249909-86081288100-12979568721-26815841297-77951231372')'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_insert.lua:61: SQL error, errno = 1105, state = 'HY000': target: fane_import_sharded_source.-80.primary: vttablet: rpc error: code = FailedPrecondition desc = disallowed due to rule: enforce denied tables (CallerID: admin)
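
A pre-flight comparison could surface the mismatch before running SwitchTraffic. The sketch below assumes the JSON shape shown in the `Workflow ... show` output above (`ShardStatuses`, `PrimaryReplicationStatuses`, `Bls.filter.rules[].match`, `TabletControls[].denied_tables`); the function name `tables_missing_from_denylist` is hypothetical, and the sample input is a trimmed reconstruction rather than captured output.

```python
import json


def tables_missing_from_denylist(workflow_json):
    """Map each shard/tablet key to workflow tables absent from its denied_tables."""
    data = json.loads(workflow_json)
    result = {}
    for key, status in data.get("ShardStatuses", {}).items():
        # Collect the tables the workflow replicates, preserving order.
        wanted = []
        for stream in status.get("PrimaryReplicationStatuses") or []:
            for rule in stream["Bls"]["filter"]["rules"]:
                if rule["match"] not in wanted:
                    wanted.append(rule["match"])
        # Collect every table currently denied on this shard.
        denied = set()
        for control in status.get("TabletControls") or []:
            denied.update(control.get("denied_tables") or [])
        missing = [t for t in wanted if t not in denied]
        if missing:
            result[key] = missing
    return result


# Trimmed sample mirroring the output above: rules for sbtest1-8 and testing,
# TabletControls listing sbtest1-5 and testing.
workflow_tables = [f"sbtest{i}" for i in range(1, 9)] + ["testing"]
denied_tables = [f"sbtest{i}" for i in range(1, 6)] + ["testing"]
sample = json.dumps({
    "ShardStatuses": {
        "-80/aws_useast1a_6-3337899395": {
            "PrimaryReplicationStatuses": [
                {"Bls": {"filter": {"rules": [{"match": t} for t in workflow_tables]}}}
            ],
            "TabletControls": [{"tablet_type": 1, "denied_tables": denied_tables}],
        }
    }
})

print(tables_missing_from_denylist(sample))
```

A non-empty result means the workflow covers tables that the existing tablet controls do not, which is the condition that preceded the failure in this report.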

Recovery Steps

  1. (recovery step) To recover, remove the tablet controls and refresh the shard state on the SOURCE:
vtctldclient --server localhost:15999 SetShardTabletControl --remove fane_import_sharded_source/-80 primary; 
vtctldclient --server localhost:15999 RefreshStateByShard fane_import_sharded_source/-80;
  2. (recovery step) Writes from the application now succeed:
$ sysbench --db-driver=mysql --threads=1 --events=0 --time=0 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=fane_import_sharded /usr/share/sysbench/oltp_insert.lua --tables=5 run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Initializing worker threads...

Threads started!
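
The two recovery commands above can be generated for any shard with a small helper. This is a minimal sketch that only builds the argument lists used in this report and does not execute anything; the function name `recovery_commands` and its default values are hypothetical.

```python
import shlex


def recovery_commands(keyspace, shard, server="localhost:15999", tablet_type="primary"):
    """Build the two vtctldclient invocations used in the recovery steps."""
    target = f"{keyspace}/{shard}"
    base = ["vtctldclient", "--server", server]
    return [
        # Remove the tablet controls for the given tablet type on the shard.
        base + ["SetShardTabletControl", "--remove", target, tablet_type],
        # Make the shard's tablets reload their state from the topology.
        base + ["RefreshStateByShard", target],
    ]


for argv in recovery_commands("fane_import_sharded_source", "-80"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run` once the printed commands have been reviewed.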

Binary Version

Vitess 16.0.3

Operating System and Environment details

n/a

Log Fragments

n/a
@FancyFane FancyFane added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged labels Sep 15, 2023
@frouioui frouioui added Component: VReplication and removed Needs Triage This issue needs to be correctly labelled and triaged labels Sep 15, 2023
@frouioui
Member

@vitessio/vreplication

@rohit-nayak-ps rohit-nayak-ps changed the title Bug Report: Pre-exisiting Tablet Controls breaks MoveTables SwitchTraffic Bug Report: Pre-existing Tablet Controls breaks MoveTables SwitchTraffic Sep 28, 2023
@rohit-nayak-ps
Contributor

Fixed via #14008
