
etcdserver: fix corruption check when server has just been compacted #14457

Merged · 1 commit into etcd-io:main · Oct 13, 2022

Conversation

@jbml (Contributor) commented Sep 13, 2022

When a key-value store corruption check happens immediately after a compaction, the revision at which the key-value store hash is computed is the compacted revision itself.
In that case, the hash computation logic was incorrect: it returned an ErrCompacted error. This error should instead be returned only when the revision at which the key-value store is hashed is strictly lower than the compacted revision.

Fixes #14325

Signed-off-by: Jeremy Leach 44558776+jbml@users.noreply.github.com
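
For illustration, here is a minimal, self-contained Go sketch of the boundary change. The checkHashRev helper is hypothetical (the real check lives in etcd's mvcc layer), and this ErrCompacted is a local stand-in:

package main

import (
	"errors"
	"fmt"
)

// Local stand-in for etcd's mvcc ErrCompacted, for this sketch only.
var ErrCompacted = errors.New("mvcc: required revision has been compacted")

// checkHashRev is a hypothetical helper showing the boundary condition this
// PR changes. Hashing at exactly the compacted revision must succeed, because
// compaction retains the state at the compacted revision itself.
func checkHashRev(rev, compactRev int64) error {
	// Before the fix the guard was: rev > 0 && rev <= compactRev
	if rev > 0 && rev < compactRev { // after the fix: strict "<"
		return ErrCompacted
	}
	return nil
}

func main() {
	fmt.Println(checkHashRev(5, 5)) // <nil> after the fix; ErrCompacted before
	fmt.Println(checkHashRev(4, 5)) // still ErrCompacted: revision 4 is gone
}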

@@ -542,23 +542,19 @@ type hashKVResult struct {
func TestHashKVWhenCompacting(t *testing.T) {
@jbml (Contributor, Author) commented on the diff:

The reason for the refactoring of this test case is that the test used to run compactions/HashByRev for 1000 rounds, hashing revisions 9899 to 9999, with no guarantee that all compactions had finished before the end of the test.

Now the test runs for as long as it takes to complete all compactions (within a limit of 10 seconds), using revisions from 9900 to 10000. This is beneficial because the problem from the initial issue only appears after compacting revision 10000.
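
As a sketch of that bounded-wait shape (hypothetical names, not the actual etcd test code): the worker goroutine closes a channel when all rounds complete, and the test fails if that takes longer than the deadline.

package mvcc_test

import (
	"testing"
	"time"
)

func TestCompactionsFinishWithinDeadline(t *testing.T) {
	donec := make(chan struct{})
	go func() {
		defer close(donec)
		// Stand-in for the real work: compact to rev, then call
		// HashByRev(rev), for revisions 9900 through 10000.
		for rev := int64(9900); rev <= 10000; rev++ {
		}
	}()

	select {
	case <-donec:
		// All compactions finished within the deadline.
	case <-time.After(10 * time.Second):
		t.Fatal("timed out waiting for all compactions to finish")
	}
}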

@serathius (Member) commented:

My issues with the proposed change:

  • It makes HashKV revision semantics inconsistent with the rest of the API. Compaction removes etcd history up to and including the specified revision. This operation is inclusive, meaning it also removes the changes from that revision. This is why you can request neither HashKV nor Range at a revision that was compacted. This change would make it possible to get a hash for a revision that is no longer accessible to the rest of the API.
  • The impact is negligible, as the provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and then not do any updates for long enough for the corruption check to run. The periodic corruption check is a pretty heavy operation, so I also don't expect anyone to set it to a 1-minute period. For example, the proposed period was 12h: https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server
  • The periodic check is known to be unreliable. We have already implemented a replacement: etcd detects data corruption by default (#14039). I think we can consider deprecating the old check when we have enough production data.

@jbml (Contributor, Author) commented Sep 18, 2022

Thank you for reviewing the PR @serathius

> It makes HashKV revision semantics inconsistent with the rest of the API. Compaction removes etcd history up to and including the specified revision. This operation is inclusive, meaning it also removes the changes from that revision. This is why you can request neither HashKV nor Range at a revision that was compacted. This change would make it possible to get a hash for a revision that is no longer accessible to the rest of the API.

The intention here was actually to make the HashByRev API consistent with Range, which I believe is what the author of #14325 meant in #14325 (comment) when they said:

> It's a surprise to me that after compaction, the latest revision is not available to be queried at. Does that seem like the expected behavior? Based on testing with just etcdctl that doesn't seem to be the case.

I believe the following test illustrates this point. Start with a fresh etcd cluster:

> ./bin/etcdctl put foo bar1
> ./bin/etcdctl put foo bar2
> ./bin/etcdctl put foo bar3
> ./bin/etcdctl put foo bar4
> ./bin/etcdctl put foo bar5

> ./tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]
rev={main:4 sub:0}, value=[key "foo" | val "bar3" | created 2 | mod 4 | ver 3]
rev={main:3 sub:0}, value=[key "foo" | val "bar2" | created 2 | mod 3 | ver 2]
rev={main:2 sub:0}, value=[key "foo" | val "bar1" | created 2 | mod 2 | ver 1]

Then compact it at revision 5:

> ./bin/etcdctl compact 5
compacted revision 5

> ./tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]

Then check what revisions are accessible with Range:

> ./bin/etcdctl --rev=4 get foo
{"level":"warn","ts":"2022-09-18T16:19:51.102+1000","logger":"etcd-client","caller":"v3/retry_interceptor.go:64",
"msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00000a780/127.0.0.1:2379",
"method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = OutOfRange 
desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted
> ./bin/etcdctl --rev=5 get foo
foo
bar4
> ./bin/etcdctl --rev=6 get foo
foo
bar5

As the revision at which the compaction was done (5) is still available, it made sense to me that HashByRev raising a "required revision has been compacted" error when called at that revision was indeed an issue.

This inconsistency is visible when we compare this check in the HashByRev API:

if rev > 0 && rev <= compactRev {
	s.mu.RUnlock()
	return KeyValueHash{}, 0, ErrCompacted
}

with this one in the Range API:

if rev < tr.s.compactMainRev {
	return &RangeResult{KVs: nil, Count: -1, Rev: 0}, ErrCompacted
}

with the comparison being "<=" on one side and "<" on the other.
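
Plugging in the numbers from the etcdctl session above (compaction at revision 5) makes the off-by-one concrete; this is a standalone sketch, not etcd code:

package main

import "fmt"

func main() {
	var compactRev, rev int64 = 5, 5
	fmt.Println(rev > 0 && rev <= compactRev) // true:  HashByRev's old guard rejects revision 5
	fmt.Println(rev < compactRev)             // false: Range's guard still serves revision 5
}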

> The impact is negligible, as the provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and then not do any updates for long enough for the corruption check to run. The periodic corruption check is a pretty heavy operation, so I also don't expect anyone to set it to a 1-minute period. For example, the proposed period was 12h: https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server

Fully agree that this is a rare case.

> The periodic check is known to be unreliable. We have already implemented a replacement: etcd detects data corruption by default (#14039). I think we can consider deprecating the old check when we have enough production data.

Fair enough.

@jbml (Contributor, Author) commented Sep 22, 2022

@serathius: would you suggest we simply close the PR and the associated issue, please?

@serathius (Member) commented:

> The intention here was actually to make the HashByRev API consistent with Range

Sorry for misreading it there. I think there is some value in making the API consistent; however, off the top of my head I don't know whether this change negatively impacts any of the existing mechanisms using HashKV. @ahrtr, what's your opinion?

@ahrtr (Member) commented Sep 22, 2022

> > The intention here was actually to make the HashByRev API consistent with Range
>
> Sorry for misreading it there. I think there is some value in making the API consistent; however, off the top of my head I don't know whether this change negatively impacts any of the existing mechanisms using HashKV. @ahrtr, what's your opinion?

I will have a closer look at this PR and related issue late today or tomorrow morning my time.

@ahrtr (Member) commented Sep 23, 2022

I think this is a valid PR.

When the leader performs the periodic check, it calculates the hash up to the currentRev. The reason it can pass the check below is that the rev is 0:

if rev > 0 && rev <= compactRev {

But when the leader uses the same currentRev to get the peers' hashes, it fails to pass the same check above, because the rev isn't 0 anymore. That doesn't make sense, and it's exactly what this PR fixes.
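
A standalone sketch of that asymmetry (not etcd code; compactRev and currentRev are chosen to match the scenario where the check runs right after compacting to the latest revision):

package main

import "fmt"

func main() {
	var compactRev, currentRev int64 = 10000, 10000

	rev := int64(0) // the leader's local hash uses rev == 0, meaning "latest"
	fmt.Println(rev > 0 && rev <= compactRev) // false: guard skipped, local hash succeeds

	rev = currentRev // the request to peers carries the concrete currentRev
	fmt.Println(rev > 0 && rev <= compactRev) // true: peer hash fails with ErrCompacted
}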

@ahrtr (Member) commented Sep 23, 2022

Please note that while reviewing this PR, I found a related issue; please see #14510.

@ahrtr (Member) commented Sep 24, 2022

> I don't know whether this change negatively impacts any of the existing mechanisms using HashKV

Overall this looks good to me, and I don't see any negative impact.

@ahrtr (Member) left a review comment:

LGTM

Thank you @jbml

I'll leave it to @serathius to approve and merge this PR.

@jbml (Contributor, Author) commented Sep 27, 2022

Hello @serathius,
Do you mind reviewing the PR when time permits, please?

@ahrtr (Member) commented Oct 13, 2022

cc @spzala @ptabor @serathius @mitake

@ahrtr ahrtr merged commit d15a9d0 into etcd-io:main Oct 13, 2022
@ahrtr (Member) commented Oct 13, 2022

@jbml Could you backport this PR to 3.5 and 3.4 if needed?

@kkkkun (Contributor) commented Jun 11, 2023

I found that this has still not been backported to v3.4 and v3.5. I can help do this.

Linked issue: Periodic corruption check doesn't do anything after compaction (#14325)