kvserver: shutdown during snapshot can cause crash #75687

Closed
ajwerner opened this issue Jan 29, 2022 · 1 comment · Fixed by #75735
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered).

Comments

@ajwerner
Contributor

Describe the problem

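Observed this in a test failure. It looks like we have a bug in how shutdown interacts with an in-flight snapshot: the receiving store panics with "close of closed channel" while closing its SST writer.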

From logTestBenchmarkExpectation2584691787/rttanalysisccltest.log

I220128 23:22:43.964746 453902 kv/kvserver/queue.go:584 ⋮ [n3,replicate,s3,r114/2:‹/Table/94/1/"@"{-/Pref…}›] 33143  rate limited in MaybeAdd (replicate): ‹node unavailable; try another peer›
I220128 23:22:43.964782 453902 kv/kvserver/replicate_queue.go:315 ⋮ [n3,replicate,s3,r114/2:‹/Table/94/1/"@"{-/Pref…}›] 33144  (n1,s1):4NON_VOTER: remote failed to apply snapshot: ‹rpc error: code = Canceled desc = grpc: the client connection is closing›
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145  a panic has occurred!
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +close of closed channel
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +(1) attached stack trace
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  -- stack trace:
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | runtime.gopanic
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/usr/local/go/src/runtime/panic.go:1038
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | runtime.closechan
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/usr/local/go/src/runtime/chan.go:363
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/pebble/sstable.(*writeQueue).finish
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/write_queue.go:108
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/pebble/sstable.(*Writer).Close
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/writer.go:1047
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/storage.(*SSTWriter).Close
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/storage/sst_writer.go:296
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*multiSSTWriter).Close
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_snapshot.go:221
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*kvBatchSnapshotStrategy).Receive
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_snapshot.go:283
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).receiveSnapshot
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_snapshot.go:679
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot.func1
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:82
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:344
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:72
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).RaftSnapshot.func1.1
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/raft_transport.go:434
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).RaftSnapshot.func1
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/raft_transport.go:435
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:494
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | runtime.goexit
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +Wraps: (2) close of closed channel
E220128 23:22:43.967256 454088 1@util/log/logcrash/crash_reporting.go:174 ⋮ [n1,s1] 33145 +Error types: (1) *withstack.withStack (2) runtime.plainError
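
For readers unfamiliar with this class of panic: in Go, calling `close` on a channel that is already closed panics with exactly the message in the trace above. The sketch below reproduces that in isolation and shows one common guard, `sync.Once`. It is an illustration only. The `queue` type and its methods are hypothetical and are not Pebble's `writeQueue`, and nothing here is meant to describe what the actual root cause or fix looks like.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical stand-in for a component that signals completion
// by closing a channel. It is NOT Pebble's writeQueue.
type queue struct {
	done chan struct{}
	once sync.Once
}

func newQueue() *queue { return &queue{done: make(chan struct{})} }

// finishUnsafe closes the channel unconditionally, so a second call panics.
func (q *queue) finishUnsafe() { close(q.done) }

// finishOnce closes the channel at most once, so repeated calls are no-ops.
func (q *queue) finishOnce() { q.once.Do(func() { close(q.done) }) }

// panicMessage runs f and returns the recovered panic value as a string,
// or the empty string if f did not panic.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	unsafeQ := newQueue()
	unsafeQ.finishUnsafe()
	fmt.Printf("second unguarded close: %q\n", panicMessage(unsafeQ.finishUnsafe))

	safeQ := newQueue()
	safeQ.finishOnce()
	fmt.Printf("second guarded close:   %q\n", panicMessage(safeQ.finishOnce))
}
```

If a `Close` path can be reached twice (for example, once on an error return and once from a deferred cleanup during shutdown), an unguarded `close` like `finishUnsafe` is enough to bring the process down, which is consistent with the trace but not confirmed by it.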
@ajwerner ajwerner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 29, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 29, 2022
@ajwerner ajwerner added A-kv Anything in KV that doesn't belong in a more specific category. C-test-failure Broken test (automatically or manually discovered). and removed T-kv KV Team labels Jan 29, 2022
@ajwerner
Contributor Author

Early indications are pointing to cockroachdb/pebble#1466, cc @bananabrick

craig bot pushed a commit that referenced this issue Jan 31, 2022
75735: Revert "vendor: bump Pebble to b958d9a7760b" r=nicktrav a=ajwerner

This reverts commit 51d9f70.

That pebble bump exposed flakes like #75717 and #75687. 

Fixes #75687
Fixes #75717 

Co-authored-by: Andrew Werner <awerner32@gmail.com>
@craig craig bot closed this as completed in 84bdba8 Jan 31, 2022