Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vreplication: Materialize stalls when unique field left out #11991

Closed
derekperkins opened this issue Dec 19, 2022 · 2 comments · Fixed by #13893
Closed

vreplication: Materialize stalls when unique field left out #11991

derekperkins opened this issue Dec 19, 2022 · 2 comments · Fixed by #13893

Comments

@derekperkins
Copy link
Member

I was using Materialize to make a copy of a small table in another keyspace. I accidentally forgot to include slug, which is a unique index on both the source and target, but workspace_id is the PK, so I wouldn't expect any issues. When I ran vtctlclient Workflow iam.iam__workspaces__copy show, it returned "State": "Error", but "Message": "Picked source tablet: cell:\"uscentral1\" uid:699290100" instead of an error message. There were no error messages in tablet logs on either side, but there were logs showing that the stream had started on the source tablet.

My not very informed hunch is that the Materialize unique/PK picker chose the unique slug column because it saw that before the PK workspace_id in the source table, but slug wasn't being copied to the target table, causing the copy to stall indefinitely without erroring. When I added slug back to the select query, it ran perfectly.

This same behavior exhibited on both v15.0.0 and v15.0.2

source table

CREATE TABLE `workspaces` (
  `workspace_id` bigint NOT NULL,
  `version_id` bigint NOT NULL,
  `name` varbinary(255) NOT NULL,
  `slug` varbinary(16) NOT NULL,
  `hubspot_company_id` bigint DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`workspace_id`),
  UNIQUE KEY `workspaces_slug__uq` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

target table

CREATE TABLE `iam__workspaces__copy` (
  `workspace_id` bigint NOT NULL,
  `version_id` bigint NOT NULL,
  `name` varbinary(255) NOT NULL,
  `slug` varbinary(16) NOT NULL,
  `hubspot_company_id` bigint DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`workspace_id`),
  UNIQUE KEY `workspaces_slug__uq` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

stalled Materialize (slug EXCLUDED)

 vtctlclient Materialize '{
  "workflow": "iam__workspaces__copy",
  "source_keyspace": "workspaces",
  "target_keyspace": "iam",
  "table_settings": [
    {
      "target_table": "iam__workspaces__copy",
      "source_expression": "select workspace_id, version_id, name, hubspot_company_id, created_at, updated_at, deleted_at from workspaces"
    }
  ],
  "tablet_types": "REPLICA"
}'

working Materialize (slug INCLUDED)

vtctlclient Materialize '{
  "workflow": "iam__workspaces__copy",
  "source_keyspace": "workspaces",
  "target_keyspace": "iam",
  "table_settings": [
    {
      "target_table": "iam__workspaces__copy",
      "source_expression": "select workspace_id, version_id, name, slug, hubspot_company_id, created_at, updated_at, deleted_at from workspaces"
    }
  ],
  "tablet_types": "REPLICA"
}'

vtctlclient Workflow iam.iam__workspaces__copy show

notice State: Error, but the Message is that a tablet was selected

{
	"Workflow": "iam__workspaces__copy",
	"SourceLocation": {
		"Keyspace": "workspaces",
		"Shards": [
			"0"
		]
	},
	"TargetLocation": {
		"Keyspace": "iam",
		"Shards": [
			"0"
		]
	},
	"MaxVReplicationLag": 418,
	"MaxVReplicationTransactionLag": 1671481568,
	"Frozen": false,
	"ShardStatuses": {
		"0/uscentral1-1416046300": {
			"PrimaryReplicationStatuses": [
				{
					"Shard": "0",
					"Tablet": "uscentral1-1416046300",
					"ID": 8,
					"Bls": {
						"keyspace": "workspaces",
						"shard": "0",
						"filter": {
							"rules": [
								{
									"match": "iam__workspaces__copy",
									"filter": "select workspace_id, version_id, name, hubspot_company_id, created_at, updated_at, deleted_at from workspaces"
								}
							]
						}
					},
					"Pos": "22acb7ef-5e2e-11ea-8055-f68e2a07ff5c:1-1188536448,23e612e9-5e2f-11ea-ae47-e2928e017498:1-90433677,7106820a-e88c-11ea-ba90-6258029302fe:1-3978801218,c63402fa-c652-11ec-a20b-06e64b3746d3:1-1327820702,c6be3a55-6713-11e9-92ff-0a580a300906:1-10775216,c7e128c3-6713-11e9-933f-0a580a301704:1-202,d8b2a3c2-819d-11ec-9a1e-c69cb6174bd0:1-301196149,f503ab19-ca9b-11e9-bc0a-626c2195e13f:1-170430446",
					"StopPos": "",
					"State": "Error",
					"DBName": "iam",
					"TransactionTimestamp": 0,
					"TimeUpdated": 1671481150,
					"TimeHeartbeat": 0,
					"TimeThrottled": 0,
					"ComponentThrottled": "",
					"Message": "Picked source tablet: cell:\"uscentral1\" uid:699290100",
					"Tags": "",
					"WorkflowType": "Materialize",
					"WorkflowSubType": "None",
					"CopyState": [
						{
							"Table": "iam__workspaces__copy",
							"LastPK": ""
						}
					]
				}
			],
			"TabletControls": null,
			"PrimaryIsServing": true
		}
	},
	"SourceTimeZone": "",
	"TargetTimeZone": ""
}

https://vitess.slack.com/archives/CMKTCUYNQ/p1671481794563869

cc @mattlord

@mattlord
Copy link
Contributor

@derekperkins I think this would be fixed by: #13893

Here's a test case from that branch based on what you shared in this issue:

git checkout PlanetScale/rowstreamer_force_index
make build
pushd examples/local

./101_initial_cluster.sh

mysql commerce -e 'CREATE TABLE `workspaces` (
  `workspace_id` bigint NOT NULL,
  `version_id` bigint NOT NULL,
  `name` varbinary(255) NOT NULL,
  `slug` varbinary(16) NOT NULL,
  `hubspot_company_id` bigint DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`workspace_id`),
  UNIQUE KEY `workspaces_slug__uq` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED'

./201_customer_tablets.sh

mysql customer -e 'CREATE TABLE `iam__workspaces__copy` (
  `workspace_id` bigint NOT NULL,
  `version_id` bigint NOT NULL,
  `name` varbinary(255) NOT NULL,
  `slug` varbinary(16) NOT NULL,
  `hubspot_company_id` bigint DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`workspace_id`),
  UNIQUE KEY `workspaces_slug__uq` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED'


for uid in 100 101 102 200 201 202; do
  command mysql -u root --socket=${VTDATAROOT}/vt_0000000${uid}/mysql.sock -e 'set global general_log=ON'
done

vtctlclient Materialize '{
  "workflow": "iam__workspaces__copy",
  "source_keyspace": "commerce",
  "target_keyspace": "customer",
  "table_settings": [
    {
      "target_table": "iam__workspaces__copy",
      "source_expression": "select workspace_id, version_id, name, hubspot_company_id, created_at, updated_at, deleted_at from workspaces"
    }
  ],
  "tablet_types": "REPLICA"
}'

vtctlclient Workflow customer.iam__workspaces__copy show

grep -R "select workspace_id" ${VTDATAROOT}/vt_*/data/*.log

With the final output being:

$ grep -R "select workspace_id" ${VTDATAROOT}/vt_*/data/*.log
/opt/vtdataroot/vt_0000000100/data/pslord.log:2023-08-31T02:57:57.022870Z	   70 Query	select workspace_id, version_id, `name`, slug, hubspot_company_id, created_at, updated_at, deleted_at from workspaces force index (`PRIMARY`) order by workspace_id
/opt/vtdataroot/vt_0000000200/data/pslord.log:2023-08-31T02:57:56.988007Z	   98 Query	insert into _vt.vreplication(workflow, source, pos, max_tps, max_replication_lag, cell, tablet_types, time_updated, transaction_timestamp, state, db_name, workflow_type, workflow_sub_type, defer_secondary_keys) values ('iam__workspaces__copy', 'keyspace:\"commerce\" shard:\"0\" filter:{rules:{match:\"iam__workspaces__copy\" filter:\"select workspace_id, version_id, name, hubspot_company_id, created_at, updated_at, deleted_at from workspaces\"}}', '', 9223372036854775807, 9223372036854775807, '', '', 1693450676, 0, 'Stopped', 'vt_customer', 0, 0, false)
/opt/vtdataroot/vt_0000000200/data/pslord.log:2023-08-31T02:57:56.991257Z	   98 Query	insert into _vt.vreplication_log(vrepl_id, type, state, message) values(1, 'Stream Created', 'Stopped', '{\"cell\":\"\",\"component_throttled\":\"\",\"db_name\":\"vt_customer\",\"defer_secondary_keys\":\"0\",\"id\":\"1\",\"max_replication_lag\":\"9223372036854775807\",\"max_tps\":\"9223372036854775807\",\"pos\":\"\",\"rows_copied\":\"0\",\"source\":\"keyspace:\\\"commerce\\\" shard:\\\"0\\\" filter:{rules:{match:\\\"iam__workspaces__copy\\\" filter:\\\"select workspace_id, version_id, name, hubspot_company_id, created_at, updated_at, deleted_at from workspaces\\\"}}\",\"state\":\"Stopped\",\"tablet_types\":\"\",\"tags\":\"\",\"time_heartbeat\":\"0\",\"time_throttled\":\"0\",\"time_updated\":\"1693450676\",\"transaction_timestamp\":\"0\",\"workflow\":\"iam__workspaces__copy\",\"workflow_sub_type\":\"0\",\"workflow_type\":\"0\"}')

So this would ensure that we do use the proper/best index here.

Or do you think I'm missing or misunderstanding something here? Thanks!

@derekperkins
Copy link
Member Author

Thanks @mattlord!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants