Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPS: bump across etcd-io/raft#81 and disable conf change validation #106515

Merged
merged 2 commits into from
Jul 17, 2023

Conversation

tbg
Copy link
Member

@tbg tbg commented Jul 10, 2023

We don't want raft to validate conf changes, since that causes issues due to false positives (the check is above raft, but needs to be below raft to always work correctly). We are
taking responsibility for carrying out only valid conf changes, as we always
have.

See also etcd-io/raft#80.

Fixes #105797.
Epic: CRDB-25287
Release note (bug fix): under rare circumstances, a replication change could get
stuck when proposed near lease/leadership changes (and likely under overload),
and the replica circuit breakers could trip. This problem has been addressed.
Note to editors: this time it's really addressed (fingers crossed); a previous
attempt with an identical release note had to be reverted.

@blathers-crl
Copy link

blathers-crl bot commented Jul 10, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg force-pushed the skip-conf-change-validation branch 2 times, most recently from b4ad7fc to 2039d03 Compare July 17, 2023 15:50
@tbg tbg changed the title deps: pick up etcd-io/raft#81 (conf change validation) DEPS: bump across etcd-io/raft#81 and disable conf change validation Jul 17, 2023
@tbg tbg force-pushed the skip-conf-change-validation branch 2 times, most recently from b8f81c8 to 4ecacde Compare July 17, 2023 15:55
@tbg tbg requested a review from erikgrinaker July 17, 2023 15:56
tbg added 2 commits July 17, 2023 18:50
Picks up

- etcd-io/raft#77
- etcd-io/raft#79 (essentially an off-by-default for 77)
- etcd-io/raft#82
- etcd-io/raft#81 (needed for cockroachdb#105797)

Epic: none
Release note: None
Relies on etcd-io/raft#81.

We don't want it to, since that causes issues due to false positives. We are
taking responsibility for carrying out only valid conf changes, as we always
have.

See also etcd-io/raft#80.

Fixes cockroachdb#105797.
Epic: CRDB-25287
Release note (bug fix): under rare circumstances, a replication change could get
stuck when proposed near lease/leadership changes (and likely under overload),
and the replica circuit breakers could trip. This problem has been addressed.
Note to editors: this time it's really addressed (fingers crossed); a previous
attempt with an identical release note had to be reverted.
@tbg tbg force-pushed the skip-conf-change-validation branch from 4ecacde to 4585dde Compare July 17, 2023 16:50
@tbg tbg marked this pull request as ready for review July 17, 2023 16:50
@tbg tbg requested a review from a team July 17, 2023 16:50
@tbg tbg requested review from a team as code owners July 17, 2023 16:50
erikgrinaker

This comment was marked as off-topic.

@tbg
Copy link
Member Author

tbg commented Jul 17, 2023

bors r=erikgrinaker
🚀

@tbg tbg added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-22.2.x labels Jul 17, 2023
@craig
Copy link
Contributor

craig bot commented Jul 17, 2023

Build succeeded:

@craig craig bot merged commit 212d1e9 into cockroachdb:master Jul 17, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 17, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 404188d to blathers/backport-release-22.2-106515: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


error creating merge commit from 404188d to blathers/backport-release-23.1-106515: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@nvanbenschoten
Copy link
Member

@tbg @pav-kv now that we are using DisableConfChangeValidation and we ignore the r.pendingConfIndex > r.raftLog.applied condition here, do we still have a guarantee that a raft leader will never propose a configuration change before it has applied the latest configuration change? In other words, does this change break the Apply and Append invariants from etcd-io/etcd#7625 (comment) for split leaseholder/leader states where the leader is behind on log application? Could that lead to split brain scenarios? cc. @sumeerbhola

@pav-kv
Copy link
Collaborator

pav-kv commented May 20, 2024

I had a similar concern when looking at this. I wonder if it would be sufficient to disable only the 2 checks here (which describe the state-machine-level rules for config transitions, which will be also checked by our state machine), but leave the one that @nvanbenschoten linked which is important for raft's config change safety.

Seemingly, re-enabling this one check won't lead to the same perpetual reproposal behaviour for the config change, because the "leader has unapplied config changes" (r.pendingConfIndex > r.raftLog.applied) condition will eventually become false.

@nvanbenschoten
Copy link
Member

I wonder if it would be sufficient to disable only the 2 checks here, but leave the one that @nvanbenschoten linked which is important for raft's config change etcd-io/etcd#7625 (comment).

That seems like a reasonable change to me. Do you want to take that @pav-kv?

craig bot pushed a commit that referenced this pull request May 29, 2024
124804: raft: re-enable config change safety r=nvanbenschoten a=pav-kv

Config changes in this `raft` implementation require a safety constraint: the leader must not append a config change if it hasn't applied all config changes in its log.

The `DisableConfChangeValidation` flag disables this check under the assumption that the state machine layer provides the equivalent guarantee. However, it is hard to argue that this is true in split leaseholder/leader scenarios.

This commit re-enables this check, to bring the safety back. The other two state-machine-level checks concerned with entering and leaving joint configs can still be disabled.

Related to #105797, etcd-io/raft#81, #106515 (#106515 (comment))

Epic: none
Release note: none

124806: release: push PRs to main repo r=celiala a=rail

Previously, we pushed PRs using the forked repos. For auto-merge to work, we need to push these PRs back to the main repo, so `GITHUB_TOKEN` has enough privileges to merge the PRs.

This PR adds an ability to push the PR branches to the origin repo. Note: the push user has to be given `write` permissions to the repos.

Epic: none
Release note: None

Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Co-authored-by: Rail Aliiev <rail@iqchoice.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 db-cy-23
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: config change proposals can hang forever
5 participants