Panics when "restore compact", post "recovered store from snapshot" #13456

Closed
NorseGaud opened this issue Nov 2, 2021 · 6 comments

NorseGaud commented Nov 2, 2021

Previous ticket: #7011

ETCD version: 3.4.16

We see two different types of failure when etcd reaches the "restore compact" step during startup. Either one cripples etcd, which is a serious problem for many of our users.

2021-05-28 07:13:16.408201 I | embed: listening for peers on http://0.0.0.0:2380
2021-05-28 07:13:16.408244 I | embed: listening for client requests on 127.0.0.1:2379
2021-05-28 07:13:16.409431 I | etcdserver: recovered store from snapshot at index 281303901
2021-05-28 07:13:16.418539 I | mvcc: restore compact to 188392991
2021-05-28 07:13:16.521163 C | mvcc: cannot unmarshal event: proto: KeyValue: illegal tag 0 (wire type 0)

2021-11-01 21:24:01.875492 I | embed: listening for peers on http://0.0.0.0:2380
2021-11-01 21:24:01.875586 I | embed: listening for client requests on 127.0.0.1:2379
2021-11-01 21:24:01.876829 I | etcdserver: recovered store from snapshot at index 49200743
2021-11-01 21:24:01.884359 I | mvcc: restore compact to 32276741
2021-11-01 21:24:01.989286 C | mvcc: store.keyindex: put with unexpected smaller revision [{32296847 1} / {32296849 0}]
panic: store.keyindex: put with unexpected smaller revision [{32296847 1} / {32296849 0}]

goroutine 109 [running]:
github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc0015100c0, 0x2224ddb, 0x3e, 0xc001855cb0, 0x2, 0x2)
	/Users/anka/workspace/cloud-build_release_v1.19.0/src/github.com/coreos/pkg/capnslog/pkg_logger.go:83 +0x135
github.com/coreos/etcd/mvcc.(*keyIndex).put(0xc004be3440, 0x1eccf8f, 0x1)
	/Users/anka/workspace/cloud-build_release_v1.19.0/src/github.com/coreos/etcd/mvcc/key_index.go:80 +0x38e
github.com/coreos/etcd/mvcc.restoreIntoIndex.func1(0xc001524380, 0xc00190a240, 0x38df6e0, 0xc001511f20)
	/Users/anka/workspace/cloud-build_release_v1.19.0/src/github.com/coreos/etcd/mvcc/kvstore.go:435 +0x362
created by github.com/coreos/etcd/mvcc.restoreIntoIndex
	/Users/anka/workspace/cloud-build_release_v1.19.0/src/github.com/coreos/etcd/mvcc/kvstore.go:403 +0xa0
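
To make the second failure easier to read, here is a minimal sketch of the kind of ordering check that the panic message describes: a new revision for a key is expected to be strictly greater than the last revision recorded for that key. The `revision` type, `greaterThan`, and `checkPut` are hypothetical names for illustration and are not etcd's actual code; the two revisions are copied from the log above.

```go
package main

import "fmt"

// revision mirrors the {main sub} pairs printed in the panic message.
// Hypothetical simplified type, not etcd's actual definition.
type revision struct {
	main int64
	sub  int64
}

// greaterThan reports whether a is strictly newer than b,
// comparing main first and using sub as the tie-breaker.
func (a revision) greaterThan(b revision) bool {
	if a.main != b.main {
		return a.main > b.main
	}
	return a.sub > b.sub
}

// checkPut returns an error when rev is not strictly newer than the
// last revision already recorded for the key (assumed invariant).
func checkPut(rev, lastModified revision) error {
	if !rev.greaterThan(lastModified) {
		return fmt.Errorf("put with unexpected smaller revision [%v / %v]", rev, lastModified)
	}
	return nil
}

func main() {
	last := revision{main: 32296849, sub: 0}     // already recorded for the key
	incoming := revision{main: 32296847, sub: 1} // read next during restore
	fmt.Println(checkPut(incoming, last))
}
```

Under this reading, the restore saw revision {32296847 1} for a key after it had already recorded {32296849 0} for the same key, meaning the stored entries were not in the order the index expects.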

I noticed that boltdb/bolt#632 was opened, but that repo was archived and development seems to have moved to https://github.com/etcd-io/bbolt/issues. The issue itself was not carried over.

Issues are a great way to keep track of problems like this, and this one wouldn't have been forgotten if the original hadn't been closed. I'm opening this one to keep track of etcd-io/bbolt#299.

Member

ahrtr commented Nov 5, 2021

Please provide the complete etcd log.

Member

ahrtr commented Nov 7, 2021

Would you mind sharing the db file, if possible?

Can you see the duplicated key when running the program on your db file?

Author

NorseGaud commented Nov 7, 2021

Our customers reset their etcd when this happens as they are production environments. Unfortunately, we don't have a db file to share, but the original poster in #7011 provided a file.

I'll see what we can do, but it's extremely rare that this happens.

Member

ahrtr commented Nov 7, 2021

I do see the duplicated key issue when using the db file provided in #7011, but that db file is about 5 years old, so it's too old.

The cause of this issue is most likely that the db file is corrupted somehow, so it might be a bolt issue rather than an etcd issue. I fixed a similar issue previously (see 13406), but that one is more serious than this one, because etcd can't even get started.
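
As an illustration of why the first failure in the report ("illegal tag 0 (wire type 0)") is consistent with corrupted bytes: a protobuf field key is a varint holding `(field_number << 3) | wire_type`, and field number 0 is not valid, so a value that begins with a zero byte cannot be decoded. The sketch below is an assumption-laden illustration using only the Go standard library; `decodeTag` is a hypothetical helper, not the generated unmarshalling code that etcd uses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeTag reads one protobuf field key (a varint) from buf and splits
// it into field number and wire type. Hypothetical helper for illustration.
func decodeTag(buf []byte) (fieldNum int, wireType int, err error) {
	key, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, errors.New("truncated or overlong varint")
	}
	fieldNum = int(key >> 3)  // upper bits: field number
	wireType = int(key & 0x7) // lower three bits: wire type
	if fieldNum <= 0 {
		return fieldNum, wireType, fmt.Errorf("illegal tag %d (wire type %d)", fieldNum, wireType)
	}
	return fieldNum, wireType, nil
}

func main() {
	// 0x0a encodes field 1 with wire type 2 (length-delimited).
	field, wire, err := decodeTag([]byte{0x0a, 0x03, 'f', 'o', 'o'})
	fmt.Println(field, wire, err)

	// A value that starts with zero bytes, as a zeroed region would.
	field, wire, err = decodeTag([]byte{0x00, 0x00, 0x00})
	fmt.Println(field, wire, err)
}
```

This does not show how the bytes came to be wrong; it only shows that zeroed or overwritten data at the start of a stored value would produce this class of error.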

Member

ahrtr commented Nov 7, 2021

Could you clarify what "reset" means?

> Our customers reset their etcd


stale bot commented Feb 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Feb 6, 2022
stale bot closed this as completed Mar 12, 2022