
etcdserver: fix corruption check when server has just been compacted #14457

Merged · 1 commit into etcd-io:main · Oct 13, 2022

Conversation

@jbml (Contributor) commented Sep 13, 2022

When a key-value store corruption check happens immediately after a compaction, the revision at which the key-value store hash is computed is the compacted revision itself.
In that case, the hash computation logic was incorrect: it returned an ErrCompacted error. This error should instead be returned only when the revision at which the key-value store is hashed is strictly lower than the compacted revision.

Fixes #14325

Signed-off-by: Jeremy Leach 44558776+jbml@users.noreply.github.com
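
For illustration, here is a minimal, self-contained Go sketch of the boundary change. The checkHashRev helper is hypothetical (the real check lives in etcd's mvcc layer), and this ErrCompacted is a local stand-in:

package main

import (
	"errors"
	"fmt"
)

// Local stand-in for etcd's mvcc ErrCompacted, for this sketch only.
var ErrCompacted = errors.New("mvcc: required revision has been compacted")

// checkHashRev is a hypothetical helper showing the boundary condition this
// PR changes. Hashing at exactly the compacted revision must succeed, because
// compaction retains the state at the compacted revision itself.
func checkHashRev(rev, compactRev int64) error {
	// Before the fix the guard was: rev > 0 && rev <= compactRev
	if rev > 0 && rev < compactRev { // after the fix: strict "<"
		return ErrCompacted
	}
	return nil
}

func main() {
	fmt.Println(checkHashRev(5, 5)) // <nil> after the fix; ErrCompacted before
	fmt.Println(checkHashRev(4, 5)) // still ErrCompacted: revision 4 is gone
}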

@@ -542,23 +542,19 @@ type hashKVResult struct {
func TestHashKVWhenCompacting(t *testing.T) {
@jbml (Contributor, Author) commented on the diff:

The reason for the refactoring of this test case is that the test used to run compactions/HashByRev for 1000 rounds, hashing revisions 9899 to 9999, with no guarantee that all compactions had finished before the end of the test.

Now the test runs for as long as it takes to complete all compactions (within a limit of 10 seconds), using revisions from 9900 to 10000. This is beneficial because the problem from the initial issue only appears after compacting revision 10000.
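
As a sketch of that bounded-wait shape (hypothetical names, not the actual etcd test code): the worker goroutine closes a channel when all rounds complete, and the test fails if that takes longer than the deadline.

package mvcc_test

import (
	"testing"
	"time"
)

func TestCompactionsFinishWithinDeadline(t *testing.T) {
	donec := make(chan struct{})
	go func() {
		defer close(donec)
		// Stand-in for the real work: compact to rev, then call
		// HashByRev(rev), for revisions 9900 through 10000.
		for rev := int64(9900); rev <= 10000; rev++ {
		}
	}()

	select {
	case <-donec:
		// All compactions finished within the deadline.
	case <-time.After(10 * time.Second):
		t.Fatal("timed out waiting for all compactions to finish")
	}
}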

@serathius (Member) commented:

My issues with the proposed change:

  • It makes HashKV revision semantics inconsistent with the rest of the API. Compaction removes etcd history up to and including the specified revision. This operation is inclusive, meaning it also removes the changes from that revision. This is why you can request neither HashKV nor Range at a revision that was compacted. This change would make it possible to get a hash for a revision that is no longer accessible to the rest of the API.
  • The impact is negligible, as the provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and then not do any updates for long enough for the corruption check to run. The periodic corruption check is a pretty heavy operation, so I also don't expect anyone to set it to a 1-minute period. For example, the proposed period was 12h: https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server
  • The periodic check is known to be unreliable. We have already implemented a replacement: etcd detects data corruption by default (#14039). I think we can consider deprecating the old check when we have enough production data.

@jbml (Contributor, Author) commented Sep 18, 2022

Thank you for reviewing the PR @serathius

> It makes HashKV revision semantics inconsistent with the rest of the API. Compaction removes etcd history up to and including the specified revision. This operation is inclusive, meaning it also removes the changes from that revision. This is why you can request neither HashKV nor Range at a revision that was compacted. This change would make it possible to get a hash for a revision that is no longer accessible to the rest of the API.

The intention here was actually to make the HashByRev API consistent with Range, which I believe is what the author of #14325 meant in #14325 (comment) when they said:

> It's a surprise to me that after compaction, the latest revision is not available to be queried at. Does that seem like the expected behavior? Based on testing with just etcdctl that doesn't seem to be the case.

I believe the following test illustrates this point. Start with a fresh etcd cluster:

> ./bin/etcdctl put foo bar1
> ./bin/etcdctl put foo bar2
> ./bin/etcdctl put foo bar3
> ./bin/etcdctl put foo bar4
> ./bin/etcdctl put foo bar5

> ./tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]
rev={main:4 sub:0}, value=[key "foo" | val "bar3" | created 2 | mod 4 | ver 3]
rev={main:3 sub:0}, value=[key "foo" | val "bar2" | created 2 | mod 3 | ver 2]
rev={main:2 sub:0}, value=[key "foo" | val "bar1" | created 2 | mod 2 | ver 1]

Then compact it at revision 5:

> ./bin/etcdctl compact 5
compacted revision 5

> ./tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]

Then check what revisions are accessible with Range:

> ./bin/etcdctl --rev=4 get foo
{"level":"warn","ts":"2022-09-18T16:19:51.102+1000","logger":"etcd-client","caller":"v3/retry_interceptor.go:64",
"msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00000a780/127.0.0.1:2379",
"method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = OutOfRange 
desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted
> ./bin/etcdctl --rev=5 get foo
foo
bar4
> ./bin/etcdctl --rev=6 get foo
foo
bar5

As the revision at which the compaction was done (5) is still available, it made sense to me that HashByRev raising a "required revision has been compacted" error when called at that revision was indeed an issue.

This inconsistency is visible when we compare this check in the HashByRev API:

if rev > 0 && rev <= compactRev {
	s.mu.RUnlock()
	return KeyValueHash{}, 0, ErrCompacted
}

with this one in the Range API:

if rev < tr.s.compactMainRev {
	return &RangeResult{KVs: nil, Count: -1, Rev: 0}, ErrCompacted
}

with the comparison being "<=" on one side and "<" on the other.
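
Plugging in the numbers from the etcdctl session above (compaction at revision 5) makes the off-by-one concrete; this is a standalone sketch, not etcd code:

package main

import "fmt"

func main() {
	var compactRev, rev int64 = 5, 5
	fmt.Println(rev > 0 && rev <= compactRev) // true:  HashByRev's old guard rejects revision 5
	fmt.Println(rev < compactRev)             // false: Range's guard still serves revision 5
}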

> The impact is negligible, as the provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and then not do any updates for long enough for the corruption check to run. The periodic corruption check is a pretty heavy operation, so I also don't expect anyone to set it to a 1-minute period. For example, the proposed period was 12h: https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server

Fully agree that this is a rare case.

> The periodic check is known to be unreliable. We have already implemented a replacement: etcd detects data corruption by default (#14039). I think we can consider deprecating the old check when we have enough production data.

Fair enough.

@jbml (Contributor, Author) commented Sep 22, 2022

@serathius: would you suggest we simply close the PR and the associated issue, please?

@serathius (Member) commented:

> The intention here was actually to make the HashByRev API consistent with Range

Sorry for misreading it there. I think there is some value in making the API consistent; however, off the top of my head I don't know whether this change negatively impacts any of the existing mechanisms using HashKV. @ahrtr, what's your opinion?

@ahrtr (Member) commented Sep 22, 2022

> > The intention here was actually to make the HashByRev API consistent with Range
>
> Sorry for misreading it there. I think there is some value in making the API consistent; however, off the top of my head I don't know whether this change negatively impacts any of the existing mechanisms using HashKV. @ahrtr, what's your opinion?

I will have a closer look at this PR and related issue late today or tomorrow morning my time.

@ahrtr (Member) commented Sep 23, 2022

I think this is a valid PR.

When the leader performs the periodic check, it calculates the hash up to the currentRev. The reason it can pass the check below is that the rev is 0:

if rev > 0 && rev <= compactRev {

But when the leader uses the same currentRev to get the peers' hashes, it fails to pass the same check above, because the rev isn't 0 anymore. That doesn't make sense, and it's exactly what this PR fixes.
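
A standalone sketch of that asymmetry (not etcd code; compactRev and currentRev are chosen to match the scenario where the check runs right after compacting to the latest revision):

package main

import "fmt"

func main() {
	var compactRev, currentRev int64 = 10000, 10000

	rev := int64(0) // the leader's local hash uses rev == 0, meaning "latest"
	fmt.Println(rev > 0 && rev <= compactRev) // false: guard skipped, local hash succeeds

	rev = currentRev // the request to peers carries the concrete currentRev
	fmt.Println(rev > 0 && rev <= compactRev) // true: peer hash fails with ErrCompacted
}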

@ahrtr (Member) commented Sep 23, 2022

Please note that while reviewing this PR, I found a related issue; please see #14510.

@ahrtr (Member) commented Sep 24, 2022

> I don't know whether this change negatively impacts any of the existing mechanisms using HashKV

Overall this looks good to me, and I don't see any negative impact.

@ahrtr (Member) left a review comment:

LGTM

Thank you @jbml

I'll leave it to @serathius to approve and merge this PR.

@jbml (Contributor, Author) commented Sep 27, 2022

Hello @serathius,
Do you mind reviewing the PR when time permits, please?

@ahrtr (Member) commented Oct 13, 2022

cc @spzala @ptabor @serathius @mitake

@ahrtr ahrtr merged commit d15a9d0 into etcd-io:main Oct 13, 2022
@ahrtr (Member) commented Oct 13, 2022

@jbml Could you backport this PR to 3.5 and 3.4 if needed?

@kkkkun (Contributor) commented Jun 11, 2023

I found that this has still not been backported to v3.4 and v3.5. I can help do this.

Linked issue: Periodic corruption check doesn't do anything after compaction (#14325)