Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: vtcombo's DBDDL plugin has a problematic race condition when creating databases #13116

Closed
jspawar opened this issue May 18, 2023 · 1 comment · Fixed by #13117
Closed

Comments

@jspawar
Copy link
Contributor

jspawar commented May 18, 2023

Overview of the Issue

vtcombo implementation of the DBDDL plugin's logic used for CREATE DATABASE queries via the CreateDatabase method and how it's used in conjunction with vtcombo has a race condition that can repeatably cause failures to actually create a database and associated vitess entities (e.g. tablets).

Specifically, vtcombo's implementation of globalCreateDb here tries to read and increment the uid variable which needs to be a unique value (for a given cell) which is a race condition as parallel attempts to create a database end up reading the same value for the uid variable and then try to create a tablet with the same UID which is not allowed.

Some example output indicating as such:

22:54:07.227162       7 syslogger.go:129] foobar_test_efDbWJRM [keyspace] created value:
22:54:07.227203       7 locks.go:245] Locking keyspace foobar_test_efDbWJRM for action CreateShard
22:54:07.227250       7 syslogger.go:129] foobar_test_efDbWJRM/0 [shard] created value: is_primary_serving:true
22:54:07.227255       7 locks.go:284] Unlocking keyspace foobar_test_efDbWJRM for successful action CreateShard
22:54:07.234657       7 tablet_map.go:90] Creating PRIMARY tablet test-0000000028 for foobar_test_efDbWJRM/0
...
22:54:07.257710       7 syslogger.go:129] foobar_test_zCKdvTpX [keyspace] created value:
22:54:07.258008       7 locks.go:245] Locking keyspace foobar_test_zCKdvTpX for action CreateShard
22:54:07.258269       7 syslogger.go:129] foobar_test_zCKdvTpX/0 [shard] created value: is_primary_serving:true
22:54:07.258384       7 locks.go:284] Unlocking keyspace foobar_test_zCKdvTpX for successful action CreateShard
...
22:54:07.278880       7 tablet_map.go:90] Creating PRIMARY tablet test-0000000028 for foobar_test_zCKdvTpX/0
...
22:54:07.285483       7 vtgate.go:607] Execute: initTablet failed because existing tablet keyspace and shard foobar_test_efDbWJRM/0 differ from the provided ones foobar_test_zCKdvTpX/0, request: map[BindVariables:map[] Session:autocommit:true options:{included_fields:ALL workload:OLTP} row_count:-1 DDLStrategy:"direct" SessionUUID:"65c55f97-f373-11ed-97bd-0242ac160003" enable_system_settings:true Sql:CREATE DATABASE `foobar_test_zCKdvTpX`]

Can see that CREATE DATABASE foobar_test_zCKdvTpX failed above in the step in globalCreateDb which creates a tablet in the topology server because it re-used the UID of 28.

I've tested (and will subsequently propose via PR) a solution which simply modifies vtcombo's DBDDL plugin implementation to use a mutex which locks before executing globalCreateDb and releases the lock after that completes which effectively guarantees that access and incrementing of the uid variable is atomic in turn preventing the issue. This seems to have fixed the issue for me.

Reproduction Steps

  1. Start vtcombo up with --dbddl_plugin vttest (any other config should have no effect I believe)
    • I did so via the vitess/vttestserver Docker image and the following Docker Compose config:

      version: '3'
      services:
        vtcombo:
          depends_on:
            mysql:
              condition: service_healthy
          image: vitess/vttestserver:mysql80
          platform: linux/amd64
          command:
            - sh
            - -c
            - |
              vtcombo \
                --logtostderr=true \
                --mysql_server_bind_address 0.0.0.0 \
                --mysql_server_port 3306 \
                --mysql_auth_server_impl none \
                --db_host localhost \
                --db_socket /var/run/mysqld/mysqld.sock \
                --db_app_user root \
                --db_allprivs_user root \
                --db_appdebug_user root \
                --db_dba_user root \
                --db_repl_user root \
                --tablet_refresh_interval 100ms \
                --proto_topo 'keyspaces:<name:"test_keyspace" shards:<name:"0" > > ' \ # vtcombo failed to start up with at least one keyspace, which is maybe a separate "bug"?
                --dbddl_plugin vttest
          ports:
            - "3306:3306"
          volumes:
            - varrunmysqld:/var/run/mysqld
        mysql:
          image: mysql:8.0
          platform: linux/amd64
          command:
            - --default-authentication-plugin=mysql_native_password
            - --max_connections=2000 # vitess makes a lot of connections during tests
          environment:
          - MYSQL_ALLOW_EMPTY_PASSWORD=yes
          healthcheck:
            test: "mysql -e 'select 1'"
            interval: 30s
            timeout: 1s
            start_period: 30s
          volumes:
            - varrunmysqld:/var/run/mysqld
      volumes:
        varrunmysqld:
  2. A bit inelegant but: attempt to execute in parallel multiple CREATE DATABASE <distinct_database_name_each_time> queries
    • One way I was able to reliably reproduce that was by running:
      for i in $(seq 1 10); do mysql -u root -h 127.0.0.1 -P 3306 -e "create database foobar_${i}" & done
      
  3. Observe that at least one or multiple of those queries fail, citing the error mentioned earlier where initTablet fails
    • Error from client's perspective is of the form:

      Error 1105: initTablet failed because existing tablet keyspace and shard foobar_1/0 differ from the provided ones foobar_2/0

Binary Version

$ docker run -it -u root vitess/vttestserver:mysql80 bash
root@f05f921b3f0a:/# vtcombo --version
ERROR: logging before flag.Parse: E0518 19:20:36.152999       7 syslogger.go:149] can't connect to syslog
Version: 17.0.0-SNAPSHOT (Git revision 5d991ac827a37715d0fa8944d0c9817d63e40efa branch 'main') built on Thu May 11 22:55:23 UTC 2023 by vitess@buildkitsandbox using go1.20.3 linux/amd64
root@f05f921b3f0a:/# exit
$ docker inspect vitess/vttestserver:mysql80 | jq .[].RepoDigests
[
  "vitess/vttestserver@sha256:edb0549aacd321a89083f2d57101c1f37765617b4ba6d8f2f53e7f3af07f9a56"
]

Operating System and Environment details

$ docker run -it -u root vitess/vttestserver:mysql80 bash
root@3e9bf17e9eb2:/# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@3e9bf17e9eb2:/# uname -sr
Linux 5.15.49-linuxkit
root@3e9bf17e9eb2:/# uname -m
x86_64

Log Fragments

22:54:07.227162       7 syslogger.go:129] foobar_test_efDbWJRM [keyspace] created value:
22:54:07.227203       7 locks.go:245] Locking keyspace foobar_test_efDbWJRM for action CreateShard
22:54:07.227250       7 syslogger.go:129] foobar_test_efDbWJRM/0 [shard] created value: is_primary_serving:true
22:54:07.227255       7 locks.go:284] Unlocking keyspace foobar_test_efDbWJRM for successful action CreateShard
22:54:07.234657       7 tablet_map.go:90] Creating PRIMARY tablet test-0000000028 for foobar_test_efDbWJRM/0
...
22:54:07.257710       7 syslogger.go:129] foobar_test_zCKdvTpX [keyspace] created value:
22:54:07.258008       7 locks.go:245] Locking keyspace foobar_test_zCKdvTpX for action CreateShard
22:54:07.258269       7 syslogger.go:129] foobar_test_zCKdvTpX/0 [shard] created value: is_primary_serving:true
22:54:07.258384       7 locks.go:284] Unlocking keyspace foobar_test_zCKdvTpX for successful action CreateShard
...
22:54:07.278880       7 tablet_map.go:90] Creating PRIMARY tablet test-0000000028 for foobar_test_zCKdvTpX/0
...
22:54:07.285483       7 vtgate.go:607] Execute: initTablet failed because existing tablet keyspace and shard foobar_test_efDbWJRM/0 differ from the provided ones foobar_test_zCKdvTpX/0, request: map[BindVariables:map[] Session:autocommit:true options:{included_fields:ALL workload:OLTP} row_count:-1 DDLStrategy:"direct" SessionUUID:"65c55f97-f373-11ed-97bd-0242ac160003" enable_system_settings:true Sql:CREATE DATABASE `foobar_test_zCKdvTpX`]
@jspawar jspawar added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels May 18, 2023
@jspawar
Copy link
Contributor Author

jspawar commented May 18, 2023

[Draft] PR to address this: #13117

@rohit-nayak-ps rohit-nayak-ps added Component: VTCombo and removed Needs Triage This issue needs to be correctly labelled and triaged labels May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants