[release-17.0] Upgrade-Downgrade Fix: Schema-initialization stuck on semi-sync ACKs while upgrading #13411

Merged
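In short: the backup example's upgrade flow becomes a rolling one — upgrade the replica tablets, PlannedReparentShard each shard onto an upgraded replica, then upgrade the old primaries — and PlannedReparenter now defers PromoteReplica until after the replicas have been pointed at the primary-elect, so a semi-sync primary is never promoted while no replica is positioned to ACK its writes.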
12 changes: 1 addition & 11 deletions .github/workflows/upgrade_downgrade_test_backups_manual.yml
@@ -273,14 +273,6 @@ jobs:
source build.env ; cd examples/backups
./take_backups.sh

# Stopping the tablets so we can perform the upgrade.
- name: Stop tablets
if: steps.skip-workflow.outputs.skip-workflow == 'false' && steps.changes.outputs.end_to_end == 'true'
timeout-minutes: 10
run: |
source build.env ; cd examples/backups
./stop_tablets.sh

# We upgrade: we swap binaries and use the version N of the tablet.
- name: Upgrade - Swap binaries, use VTTablet N
if: steps.skip-workflow.outputs.skip-workflow == 'false' && steps.changes.outputs.end_to_end == 'true'
@@ -299,9 +291,7 @@ jobs:
timeout-minutes: 10
run: |
source build.env ; cd examples/backups
./restart_tablets.sh
# give enough time to the tablets to restore the backup
sleep 90
./upgrade_cluster.sh

# We count the number of rows in every table to check that the restore step was successful.
- name: Assert the number of rows in every table
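
The separate "Stop tablets" step and the fixed `sleep 90` are gone: the new examples/backups/upgrade_cluster.sh (added below) performs a rolling restart and polls each tablet for the serving state itself, so the workflow no longer has to guess how long the restores take.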
6 changes: 3 additions & 3 deletions examples/backups/restart_tablets.sh
@@ -58,6 +58,6 @@ for i in 101 201 301; do
exit 1
done

vtctldclient InitShardPrimary --force commerce/0 zone1-100
vtctldclient InitShardPrimary --force customer/-80 zone1-200
vtctldclient InitShardPrimary --force customer/80- zone1-300
vtctldclient PlannedReparentShard commerce/0 --new-primary "zone1-100"
vtctldclient PlannedReparentShard customer/-80 --new-primary "zone1-200"
vtctldclient PlannedReparentShard customer/80- --new-primary "zone1-300"
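
The deprecated InitShardPrimary is replaced with PlannedReparentShard, which elects the primary through the normal durability-aware reparent flow instead of forcibly initializing the shard. To confirm the promotions took effect, something like the following works — a sketch, assuming vtctldclient's tabular GetTablets output, which lists the tablet type per row:

# Check that each shard shows a primary after the reparents (sketch).
for ks in commerce/0 customer/-80 customer/80-; do
  vtctldclient GetTablets --keyspace "${ks%/*}" --shard "${ks#*/}" | grep -w primary \
    || echo "no primary elected in $ks"
done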
10 changes: 7 additions & 3 deletions examples/backups/start_cluster.sh
@@ -31,6 +31,8 @@ fi
# start vtctld
CELL=zone1 ../common/scripts/vtctld-up.sh

# Create keyspace and set the semi_sync durability policy.
vtctldclient CreateKeyspace --durability-policy=semi_sync commerce || fail "Failed to create and configure the commerce keyspace"

# start vttablets for keyspace commerce
for i in 100 101 102; do
@@ -39,12 +41,14 @@ for i in 100 101 102; do
done

# set one of the replicas to primary
vtctldclient InitShardPrimary --force commerce/0 zone1-100
vtctldclient PlannedReparentShard commerce/0 --new-primary "zone1-100"

# create the schema for commerce
vtctlclient ApplySchema -- --sql-file ./create_commerce_schema.sql commerce || fail "Could not apply schema for the commerce keyspace"
vtctlclient ApplyVSchema -- --vschema_file ../local/vschema_commerce_seq.json commerce || fail "Could not apply vschema for the commerce keyspace"

# Create keyspace and set the semi_sync durability policy.
vtctldclient CreateKeyspace --durability-policy=semi_sync customer || fail "Failed to create and configure the customer keyspace"

# start vttablets for keyspace customer
for i in 200 201 202; do
@@ -57,8 +61,8 @@ for i in 300 301 302; do
done

# set one of the replicas to primary
vtctldclient InitShardPrimary --force customer/-80 zone1-200
vtctldclient InitShardPrimary --force customer/80- zone1-300
vtctldclient PlannedReparentShard customer/-80 --new-primary "zone1-200"
vtctldclient PlannedReparentShard customer/80- --new-primary "zone1-300"

for shard in "-80" "80-"; do
wait_for_healthy_shard customer "${shard}" || exit 1
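
Same substitution here, with one addition: each keyspace is now created explicitly, with the semi_sync durability policy, before its tablets come up; the durability policy is what tells the reparent how many semi-sync ACKs to require. The general pattern, sketched for a hypothetical keyspace myks:

# 1. create the keyspace with the durability policy it should enforce
vtctldclient CreateKeyspace --durability-policy=semi_sync myks
# 2. bring up mysqlctl and vttablet for each tablet (scripts elided)
# 3. elect a primary via a planned reparent rather than InitShardPrimary
vtctldclient PlannedReparentShard myks/0 --new-primary "zone1-100"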
97 changes: 97 additions & 0 deletions examples/backups/upgrade_cluster.sh
@@ -0,0 +1,97 @@
#!/bin/bash

# Copyright 2023 The Vitess Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# this script upgrades the example cluster in place: it restarts each
# replica tablet on the new vttablet binary, reparents every shard onto
# an upgraded replica, and finally restarts the old primary tablets

source ../common/env.sh

# Restart the replica tablets so that they come up with new vttablet versions
for i in 101 102; do
echo "Shutting down tablet zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/vttablet-down.sh
echo "Shutting down mysql zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-down.sh
echo "Removing tablet directory zone1-$i"
vtctlclient DeleteTablet -- --allow_primary=true zone1-$i
rm -Rf $VTDATAROOT/vt_0000000$i
echo "Starting tablet zone1-$i again"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

for i in 201 202; do
echo "Shutting down tablet zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/vttablet-down.sh
echo "Shutting down mysql zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-down.sh
echo "Removing tablet directory zone1-$i"
vtctlclient DeleteTablet -- --allow_primary=true zone1-$i
rm -Rf $VTDATAROOT/vt_0000000$i
echo "Starting tablet zone1-$i again"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
SHARD=-80 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

for i in 301 302; do
echo "Shutting down tablet zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/vttablet-down.sh
echo "Shutting down mysql zone1-$i"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-down.sh
echo "Removing tablet directory zone1-$i"
vtctlclient DeleteTablet -- --allow_primary=true zone1-$i
rm -Rf $VTDATAROOT/vt_0000000$i
echo "Starting tablet zone1-$i again"
CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

# Wait for all the replica tablets to be in the serving state before reparenting to them.
totalTime=600
for i in 101 201 301; do
while [ $totalTime -gt 0 ]; do
status=$(curl "http://$hostname:15$i/debug/status_details")
echo "$status" | grep "REPLICA: Serving" && break
totalTime=$((totalTime-1))
sleep 0.1
done
done

# Check that all the replica tablets have reached REPLICA: Serving state
for i in 101 201 301; do
status=$(curl "http://$hostname:15$i/debug/status_details")
echo "$status" | grep "REPLICA: Serving" && continue
echo "tablet-$i did not reach REPLICA: Serving state. Exiting due to failure."
exit 1
done

# Promote the replica tablets to primary
vtctldclient PlannedReparentShard commerce/0 --new-primary "zone1-101"
vtctldclient PlannedReparentShard customer/-80 --new-primary "zone1-201"
vtctldclient PlannedReparentShard customer/80- --new-primary "zone1-301"

# Restart the old primary tablets so that they are on the latest version of vttablet too.
echo "Restarting tablet zone1-100"
CELL=zone1 TABLET_UID=100 ../common/scripts/vttablet-down.sh
CELL=zone1 KEYSPACE=commerce TABLET_UID=100 ../common/scripts/vttablet-up.sh

echo "Restarting tablet zone1-200"
CELL=zone1 TABLET_UID=200 ../common/scripts/vttablet-down.sh
SHARD=-80 CELL=zone1 KEYSPACE=customer TABLET_UID=200 ../common/scripts/vttablet-up.sh

echo "Restarting tablet zone1-300"
CELL=zone1 TABLET_UID=300 ../common/scripts/vttablet-down.sh
SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=300 ../common/scripts/vttablet-up.sh
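
Two observations on the script above. First, the serving-state wait decrements totalTime once per 0.1-second poll and shares it across all three tablets, so the combined budget is roughly 600 × 0.1s = 60 seconds, not 600. A per-tablet deadline states the intent more directly — a sketch using bash's built-in SECONDS counter, keeping an assumed 60-second budget per tablet:

for i in 101 201 301; do
  deadline=$((SECONDS + 60))   # assumed per-tablet budget
  until curl -s "http://$hostname:15$i/debug/status_details" | grep -q "REPLICA: Serving"; do
    [ "$SECONDS" -ge "$deadline" ] && break
    sleep 0.1
  done
done

Second, the three restart loops differ only in the tablet UIDs, keyspace, and shard, and could be folded into one helper — a sketch, not part of this PR, assuming vttablet-up.sh treats SHARD=0 as the unsharded default the way the local example scripts do:

recycle_tablet() {
  local uid=$1 keyspace=$2 shard=$3
  echo "Recycling tablet zone1-$uid"
  CELL=zone1 TABLET_UID=$uid ../common/scripts/vttablet-down.sh
  CELL=zone1 TABLET_UID=$uid ../common/scripts/mysqlctl-down.sh
  vtctlclient DeleteTablet -- --allow_primary=true "zone1-$uid"
  rm -Rf "$VTDATAROOT/vt_0000000$uid"
  CELL=zone1 TABLET_UID=$uid ../common/scripts/mysqlctl-up.sh
  SHARD=$shard CELL=zone1 KEYSPACE=$keyspace TABLET_UID=$uid ../common/scripts/vttablet-up.sh
}

for uid in 101 102; do recycle_tablet $uid commerce 0;    done
for uid in 201 202; do recycle_tablet $uid customer -80;  done
for uid in 301 302; do recycle_tablet $uid customer 80-;  done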
1 change: 0 additions & 1 deletion examples/common/scripts/vttablet-up.sh
@@ -54,7 +54,6 @@ vttablet \
--service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
--pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \
--vtctld_addr http://$hostname:$vtctld_web_port/ \
--disable_active_reparents \
--throttler-config-via-topo --heartbeat_enable --heartbeat_interval=250ms --heartbeat_on_demand_duration=5s \
> $VTDATAROOT/$tablet_dir/vttablet.out 2>&1 &

82 changes: 40 additions & 42 deletions go/vt/vtctl/reparentutil/planned_reparenter.go
@@ -213,7 +213,7 @@ func (pr *PlannedReparenter) performGracefulPromotion(
primaryElect *topodatapb.Tablet,
tabletMap map[string]*topo.TabletInfo,
opts PlannedReparentOptions,
) (string, error) {
) error {
primaryElectAliasStr := topoproto.TabletAliasString(primaryElect.Alias)
ev.OldPrimary = proto.Clone(currentPrimary.Tablet).(*topodatapb.Tablet)

@@ -231,7 +231,7 @@ func (pr *PlannedReparenter) performGracefulPromotion(

snapshotPos, err := pr.tmc.PrimaryPosition(snapshotCtx, currentPrimary.Tablet)
if err != nil {
return "", vterrors.Wrapf(err, "cannot get replication position on current primary %v; current primary must be healthy to perform PlannedReparent", currentPrimary.AliasString())
return vterrors.Wrapf(err, "cannot get replication position on current primary %v; current primary must be healthy to perform PlannedReparent", currentPrimary.AliasString())
}

// Next, we wait for the primary-elect to catch up to that snapshot point.
@@ -246,12 +246,12 @@ func (pr *PlannedReparenter) performGracefulPromotion(
defer setSourceCancel()

if err := pr.tmc.SetReplicationSource(setSourceCtx, primaryElect, currentPrimary.Alias, 0, snapshotPos, true, IsReplicaSemiSync(opts.durability, currentPrimary.Tablet, primaryElect)); err != nil {
return "", vterrors.Wrapf(err, "replication on primary-elect %v did not catch up in time; replication must be healthy to perform PlannedReparent", primaryElectAliasStr)
return vterrors.Wrapf(err, "replication on primary-elect %v did not catch up in time; replication must be healthy to perform PlannedReparent", primaryElectAliasStr)
}

// Verify we still have the topology lock before doing the demotion.
if err := topo.CheckShardLocked(ctx, keyspace, shard); err != nil {
return "", vterrors.Wrap(err, "lost topology lock; aborting")
return vterrors.Wrap(err, "lost topology lock; aborting")
}

// Next up, demote the current primary and get its replication position.
@@ -265,7 +265,7 @@ func (pr *PlannedReparenter) performGracefulPromotion(

primaryStatus, err := pr.tmc.DemotePrimary(demoteCtx, currentPrimary.Tablet)
if err != nil {
return "", vterrors.Wrapf(err, "failed to DemotePrimary on current primary %v: %v", currentPrimary.AliasString(), err)
return vterrors.Wrapf(err, "failed to DemotePrimary on current primary %v: %v", currentPrimary.AliasString(), err)
}

// Wait for the primary-elect to catch up to the position we demoted the
@@ -298,26 +298,10 @@ func (pr *PlannedReparenter) performGracefulPromotion(
finalWaitErr = vterrors.Wrapf(finalWaitErr, "encountered error while performing UndoDemotePrimary(%v): %v", currentPrimary.AliasString(), undoErr)
}

return "", finalWaitErr
}

// Primary-elect is caught up to the current primary. We can do the
// promotion now.
promoteCtx, promoteCancel := context.WithTimeout(ctx, opts.WaitReplicasTimeout)
defer promoteCancel()

rp, err := pr.tmc.PromoteReplica(promoteCtx, primaryElect, SemiSyncAckers(opts.durability, primaryElect) > 0)
if err != nil {
return "", vterrors.Wrapf(err, "primary-elect tablet %v failed to be promoted to primary; please try again", primaryElectAliasStr)
}

if ctx.Err() == context.DeadlineExceeded {
// PromoteReplica succeeded, but we ran out of time. PRS needs to be
// re-run to complete fully.
return "", vterrors.Errorf(vtrpc.Code_DEADLINE_EXCEEDED, "PLannedReparent timed out after successfully promoting primary-elect %v; please re-run to fix up the replicas", primaryElectAliasStr)
return finalWaitErr
}

return rp, nil
return nil
}

func (pr *PlannedReparenter) performInitialPromotion(
@@ -383,7 +367,7 @@ func (pr *PlannedReparenter) performPotentialPromotion(
primaryElect *topodatapb.Tablet,
tabletMap map[string]*topo.TabletInfo,
opts PlannedReparentOptions,
) (string, error) {
) error {
primaryElectAliasStr := topoproto.TabletAliasString(primaryElect.Alias)

pr.logger.Infof("no clear winner found for current primary term; checking if it's safe to recover by electing %v", primaryElectAliasStr)
@@ -457,7 +441,7 @@ func (pr *PlannedReparenter) performPotentialPromotion(
close(positions)

if rec.HasErrors() {
return "", vterrors.Wrap(rec.Error(), "failed to demote all tablets")
return vterrors.Wrap(rec.Error(), "failed to demote all tablets")
}

// Construct a mapping of alias to tablet position.
@@ -478,7 +462,7 @@ func (pr *PlannedReparenter) performPotentialPromotion(
// if the candidate primary is behind that tablet.
tp, ok := tabletPosMap[primaryElectAliasStr]
if !ok {
return "", vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "primary-elect tablet %v not found in tablet map", primaryElectAliasStr)
return vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "primary-elect tablet %v not found in tablet map", primaryElectAliasStr)
}

primaryElectPos := tp.pos
@@ -487,7 +471,7 @@ func (pr *PlannedReparenter) performPotentialPromotion(
// The primary-elect pos has to be at least as advanced as every tablet
// in the shard.
if !primaryElectPos.AtLeast(tp.pos) {
return "", vterrors.Errorf(
return vterrors.Errorf(
vtrpc.Code_FAILED_PRECONDITION,
"tablet %v (position: %v) contains transactions not found in primary-elect %v (position: %v)",
tp.alias, tp.pos, primaryElectAliasStr, primaryElectPos,
@@ -497,19 +481,9 @@ func (pr *PlannedReparenter) performPotentialPromotion(

// Check that we still have the topology lock.
if err := topo.CheckShardLocked(ctx, keyspace, shard); err != nil {
return "", vterrors.Wrap(err, "lost topology lock; aborting")
}

// Promote the candidate primary to type:PRIMARY.
promoteCtx, promoteCancel := context.WithTimeout(ctx, topo.RemoteOperationTimeout)
defer promoteCancel()

rp, err := pr.tmc.PromoteReplica(promoteCtx, primaryElect, SemiSyncAckers(opts.durability, primaryElect) > 0)
if err != nil {
return "", vterrors.Wrapf(err, "failed to promote %v to primary", primaryElectAliasStr)
return vterrors.Wrap(err, "lost topology lock; aborting")
}

return rp, nil
return nil
}

func (pr *PlannedReparenter) reparentShardLocked(
@@ -553,6 +527,11 @@ func (pr *PlannedReparenter) reparentShardLocked(

currentPrimary := FindCurrentPrimary(tabletMap, pr.logger)
reparentJournalPos := ""
// promoteReplicaRequired is a boolean that is used to store whether we need to call
// `PromoteReplica` when we reparent the tablets. This is required to be done when we are doing
// a potential or a graceful promotion.
// InitialPromotion calls `InitPrimary` and for partial promotion, the tablet is already a primary.
promoteReplicaRequired := false
// needsRefresh is used to keep track of whether we need to refresh the state
// of the new primary tablet. The only case that we need to reload the state
// is when we are initializing the new primary. The reason is that the first
@@ -601,15 +580,19 @@ func (pr *PlannedReparenter) reparentShardLocked(
case currentPrimary == nil && ev.ShardInfo.PrimaryAlias != nil:
// Case (2): no clear current primary. Try to find a safe promotion
// candidate, and promote to it.
reparentJournalPos, err = pr.performPotentialPromotion(ctx, keyspace, shard, ev.NewPrimary, tabletMap, opts)
err = pr.performPotentialPromotion(ctx, keyspace, shard, ev.NewPrimary, tabletMap, opts)
// We need to call `PromoteReplica` when we reparent the tablets.
promoteReplicaRequired = true
case topoproto.TabletAliasEqual(currentPrimary.Alias, opts.NewPrimaryAlias):
// Case (3): desired new primary is the current primary. Attempt to fix
// up replicas to recover from a previous partial promotion.
reparentJournalPos, err = pr.performPartialPromotionRecovery(ctx, ev.NewPrimary)
default:
// Case (4): desired primary and current primary differ. Do a graceful
// demotion-then-promotion.
reparentJournalPos, err = pr.performGracefulPromotion(ctx, ev, keyspace, shard, currentPrimary, ev.NewPrimary, tabletMap, opts)
err = pr.performGracefulPromotion(ctx, ev, keyspace, shard, currentPrimary, ev.NewPrimary, tabletMap, opts)
// We need to call `PromoteReplica` when we reparent the tablets.
promoteReplicaRequired = true
}

if err != nil {
@@ -620,7 +603,7 @@ func (pr *PlannedReparenter) reparentShardLocked(
return vterrors.Wrap(err, "lost topology lock, aborting")
}

if err := pr.reparentTablets(ctx, ev, reparentJournalPos, tabletMap, opts); err != nil {
if err := pr.reparentTablets(ctx, ev, reparentJournalPos, promoteReplicaRequired, tabletMap, opts); err != nil {
return err
}

@@ -637,6 +620,7 @@ func (pr *PlannedReparenter) reparentTablets(
ctx context.Context,
ev *events.Reparent,
reparentJournalPosition string,
promoteReplicaRequired bool,
tabletMap map[string]*topo.TabletInfo,
opts PlannedReparentOptions,
) error {
@@ -688,6 +672,20 @@ func (pr *PlannedReparenter) reparentTablets(
}(alias, tabletInfo.Tablet)
}

// If the `PromoteReplica` call is required, we should call it and use the position that it returns.
if promoteReplicaRequired {
Contributor:
it would be kinda neat if PromoteReplica was a no-op if the tablet was already type:PRIMARY since it would simplify the logic here a bit (however, that would make the actual server-side implementation more involved, and possibly too much, so 🤷 )

Member (Author):

PromoteReplica is already idempotent. We can call it over and over and it won't be wrong per se, but we still only want to call it when required.

Also, I wanted to reduce the change to the PRS flow as much as possible and only make the small change of running PromoteReplica in parallel instead of sequentially.

// Promote the candidate primary to type:PRIMARY.
primaryPosition, err := pr.tmc.PromoteReplica(replCtx, ev.NewPrimary, SemiSyncAckers(opts.durability, ev.NewPrimary) > 0)
if err != nil {
pr.logger.Warningf("primary %v failed to PromoteReplica; cancelling replica reparent attempts", primaryElectAliasStr)
replCancel()
replicasWg.Wait()

return vterrors.Wrapf(err, "failed PromoteReplica(primary=%v, ts=%v): %v", primaryElectAliasStr, reparentJournalTimestamp, err)
}
reparentJournalPosition = primaryPosition
}

// Add a reparent journal entry on the new primary. If semi-sync is enabled,
// this blocks until at least one replica is reparented (above) and
// successfully replicating from the new primary.
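
Net effect: reparentTablets first re-points the replicas at the primary-elect in parallel (the goroutines above), and only then, for graceful and potential promotions, calls PromoteReplica on the primary-elect, feeding its returned position into the reparent journal entry. Since the replicas are already replicating from the new primary by the time PromoteReplica waits on semi-sync ACKs, the promotion can no longer hang the way the old sequential flow could.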