Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tracking session state changes for transaction start #11061

Merged

Conversation

dbussink
Copy link
Contributor

This logic depends on the custom option that PlanetScale as in it's MySQL fork to get the current committed GTID at the start of a transaction snapshot.

With this GTID, we can optimize vreplication to avoid having to lock tables around vreplication operations.

See planetscale/mysql-server@9fa67ec for the change in MySQL that is published in our fork.

The change here allows getting back this value through any query operation that might start a transaction.

With this change, we can essentially do what was attempted in #9991 but it will guarantee now a consistent GTID for the snapshot.

The first commit here wires up passing through the session state changes on transaction start, the second commit starts using it for VReplication.

Related Issue(s)

See #9991 for a previous failed attempt.

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

@dbussink dbussink added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication labels Aug 22, 2022
@dbussink dbussink requested a review from shlomi-noach August 22, 2022 08:49
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Aug 22, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • If a workflow is added or modified, each items in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo! A significant performance/feasibility improvement.

One thing is that we want to think about how to specifically validate the new behavior coupled with the mysql-server patch, in CI.

@dbussink
Copy link
Contributor Author

One thing is that we want to think about how to specifically validate the new behavior coupled with the mysql-server patch, in CI.

We also run a number of CI tests internally in PlanetScale and I think that that's the place where we would run against our MySQL fork as well to validate this behavior there too.

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Aug 22, 2022

Recap on how this is being tested: in this PR Vitess queries the MySQL server to test whether it supports session_track_gtids = START_GTID. Vanilla MySQL does not support it, and https://github.com/planetscale/mysql-server/ does.

This repo, vitessio/vitess only runs against vanilla MySQL. Therefore, the CI in this repo only validates the original code path, where consistent snapshots are achieved with LOCK TABLES. The newly introduced code path can only be tested on the patched MySQL version. We happen to be running this version internally, and are working to make some of our internal tests work with the patched MySQL server.

The most relevant test, one that catches flaws in consistency logic, is onlineddl_vrepl_stress (mysql80). Here is an output from a dev env running the patched MySQL version:

...
Aug 22 09:20:29 # max op_order in table: 16332
Aug 22 09:20:29 I0822 09:20:29.322591 4027053 onlineddl_vrepl_mini_stress_test.go:571] WriteMetrics: inserts-deletes=3078, updates-deletes=3471,
Aug 22 09:20:29 insertsAttempts=8684, insertsFailures=0, insertsNoops=4573, inserts=4111,
Aug 22 09:20:29 updatesAttempts=7649, updatesFailures=0, updatesNoops=3145, updates=4504,
Aug 22 09:20:29 deletesAttempts=7557, deletesFailures=0, deletesNoops=6524, deletes=1033,
Aug 22 09:20:29 I0822 09:20:29.327096 4027053 onlineddl_vrepl_mini_stress_test.go:583] testSelectTableMetrics, row: map[num_rows:INT64(3078) sum_updates:INT64(3471)]
Aug 22 09:20:29 --- PASS: TestSchemaChange (263.69s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/create_schema (0.13s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/init_table_1/5 (12.16s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/init_table_1/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/init_table_2/5 (12.69s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/init_table_2/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/init_table_3/5 (12.81s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/init_table_3/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/init_table_4/5 (12.18s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/init_table_4/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/init_table_5/5 (12.19s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/init_table_5/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_1/5 (16.71s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_1/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_2/5 (17.61s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_2/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_3/5 (16.99s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_3/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_4/5 (17.26s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_4/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_5/5 (17.08s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/workload_without_ALTER_TABLE_5/5/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_without_workload (19.27s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_without_workload/cancel_pending_migrations (0.00s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5 (19.25s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5/create_schema (0.11s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5/init_table (12.08s)
Aug 22 09:20:29             --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5/init_table/cancel_pending_migrations (0.00s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5/migrate (7.05s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_1/5/validate_metrics (0.01s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5 (19.06s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5/create_schema (0.11s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5/init_table (11.89s)
Aug 22 09:20:29             --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5/init_table/cancel_pending_migrations (0.00s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5/migrate (7.05s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_2/5/validate_metrics (0.01s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5 (19.72s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5/create_schema (0.16s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5/init_table (12.50s)
Aug 22 09:20:29             --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5/init_table/cancel_pending_migrations (0.00s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5/migrate (7.05s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_3/5/validate_metrics (0.01s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5 (19.23s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5/create_schema (0.11s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5/init_table (12.06s)
Aug 22 09:20:29             --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5/init_table/cancel_pending_migrations (0.00s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5/migrate (7.04s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_4/5/validate_metrics (0.01s)
Aug 22 09:20:29     --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5 (19.37s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5/create_schema (0.12s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5/init_table (12.19s)
Aug 22 09:20:29             --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5/init_table/cancel_pending_migrations (0.00s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5/migrate (7.05s)
Aug 22 09:20:29         --- PASS: TestSchemaChange/ALTER_TABLE_with_workload_5/5/validate_metrics (0.01s)
Aug 22 09:20:29 PASS
Aug 22 09:20:33 ok  	vitess.io/vitess/go/test/endtoend/onlineddl/vrepl_stress	307.722s
Aug 22 09:20:33 2022/08/22 09:20:33 local.onlineddl_vrepl_stress: PASSED in 5m29.2s
Aug 22 09:20:33 2022/08/22 09:20:33 ============================================================
Aug 22 09:20:33 2022/08/22 09:20:33 local.onlineddl_vrepl_stress            	PASS
Aug 22 09:20:33 2022/08/22 09:20:33 ============================================================
Aug 22 09:20:33 2022/08/22 09:20:33 1 PASSED, 0 FLAKY, 0 FAILED, 0 SKIPPED
Aug 22 09:20:33 2022/08/22 09:20:33 Total time: 6m22s

dbussink and others added 2 commits August 22, 2022 12:02
This logic depends on the custom option that PlanetScale as in it's
MySQL fork to get the current committed GTID at the start of a
transaction snapshot.

With this GTID, we can optimize vreplication to avoid having to lock
tables around vreplication operations.

See
planetscale/mysql-server@9fa67ec
for the change in MySQL that is published in our fork.

The change here allows getting back this value through any query
operation that might start a transaction.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
… tables while still getting a transaction consistent GTID

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@dbussink dbussink force-pushed the dbussink/allow-using-new-start-gtid branch from f88af23 to ad6989e Compare August 22, 2022 10:02
@shlomi-noach shlomi-noach merged commit ed9bf24 into vitessio:main Aug 22, 2022
@shlomi-noach shlomi-noach deleted the dbussink/allow-using-new-start-gtid branch August 22, 2022 11:39
@dbussink dbussink mentioned this pull request Sep 6, 2022
3 tasks
notfelineit pushed a commit to planetscale/vitess that referenced this pull request Sep 21, 2022
) (vitessio#990)

* Add tracking session state changes for transaction start

This logic depends on the custom option that PlanetScale as in it's
MySQL fork to get the current committed GTID at the start of a
transaction snapshot.

With this GTID, we can optimize vreplication to avoid having to lock
tables around vreplication operations.

See
planetscale/mysql-server@9fa67ec
for the change in MySQL that is published in our fork.

The change here allows getting back this value through any query
operation that might start a transaction.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Use 'session_track_gtids = START_GTID' where possible, and avoid LOCK tables while still getting a transaction consistent GTID

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants