Avoid race condition in memory topo watch shutdown #10954

Merged: deepthi merged 1 commit into vitessio:main from planetscale:dbussink/avoid-race-memory-topo-close on Aug 8, 2022
Conversation
When we call Close we want to lock around clearing out the factory object. In the watch goroutine shutdown we want to grab a reference to the factory to ensure that it can keep going with the shutdown procedure. It's still possible to panic here on wrong usage, but that's deliberate, to avoid reuse of the memory topo.

Found with the race detector on a build:

```
==================
WARNING: DATA RACE
Write at 0x00c00047c780 by goroutine 19:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/memorytopo.go:145 +0x30
  vitess.io/vitess/go/vt/topo.(*StatsConn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:201 +0x17e
  vitess.io/vitess/go/vt/topo.(*Server).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/server.go:335 +0x25a
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:125 +0x605
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47

Previous read at 0x00c00047c780 by goroutine 32:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:58 +0xbc

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1839 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.runTests()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1837 +0x7e4
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1719 +0xa71
  main.main()
      _testmain.go:47 +0x2e4

Goroutine 32 (finished) created at:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:53 +0x564
  vitess.io/vitess/go/vt/topo.(*StatsConn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:168 +0x201
  vitess.io/vitess/go/vt/topo/test.waitForInitialValue()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:42 +0xdb
  vitess.io/vitess/go/vt/topo/test.checkWatch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:140 +0x414
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:116 +0x564
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47
==================
```

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

```
--- FAIL: TestMemoryTopo (0.06s)
    testing.go:49: === checkKeyspace
    testing.go:54: === checkShard
    testing.go:59: === checkTablet
    testing.go:64: === checkShardReplication
    testing.go:69: === checkSrvKeyspace
    testing.go:74: === checkSrvVSchema
    testing.go:79: === checkLock
    lock.go:49: === checkLockTimeout
    lock.go:52: === checkLockMissing
    lock.go:55: === checkLockUnblocks
    testing.go:84: === checkVSchema
    testing.go:89: === checkRoutingRules
    testing.go:94: === checkElection
    testing.go:99: === checkWaitForNewLeader
    testing.go:104: === checkDirectory
    directory.go:33: === checkDirectoryInCell global
    directory.go:41: === checkDirectoryInCell test
    testing.go:109: === checkFile
    file.go:33: === checkFileInCell global
    file.go:41: === checkFileInCell global
    testing.go:114: === checkWatch
    testing.go:119: === checkList
    testing.go:122: === checkWatchRecursive
    testing.go:1312: race detected during execution of test
FAIL
FAIL	vitess.io/vitess/go/vt/topo/memorytopo	0.087s
```

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
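To make the fix concrete, here is a minimal Go sketch of the pattern the commit message describes: Close clears the factory under a lock, and the watch shutdown path takes a local reference to the factory up front instead of re-reading the shared field. This is an illustration under assumptions, not the actual diff; the type and field names (`Conn.factory`, the conn-level mutex, `stopWatch`) are modeled loosely on the memorytopo package.

```go
// Illustrative sketch only: names and locking granularity are assumptions
// modeled on go/vt/topo/memorytopo, not the exact change in this PR.
package memorytopo

import "sync"

// Factory holds the shared in-memory topo state.
type Factory struct {
	mu sync.Mutex
	// ... cells, watches, etc.
}

// Conn is one connection into the in-memory topo.
type Conn struct {
	mu      sync.Mutex
	factory *Factory
}

// Close clears the factory under the connection lock, so the write is
// synchronized with readers such as the watch shutdown goroutine. This
// models the write flagged in the trace above (memorytopo.go:145).
func (c *Conn) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.factory = nil
}

// stopWatch models the watch goroutine's shutdown path. It grabs a reference
// to the factory once, under the same lock, so the shutdown can keep going
// even if the connection is closed concurrently. If the conn was closed and
// then wrongly reused, f is nil and this panics; per the description above,
// that panic is deliberate to flag reuse of a closed memory topo.
func (c *Conn) stopWatch(cleanup func(*Factory)) {
	c.mu.Lock()
	f := c.factory
	c.mu.Unlock()

	f.mu.Lock() // a nil f panics here: deliberate on use after Close
	defer f.mu.Unlock()
	cleanup(f)
}
```

In this sketch, the write in Close and the read in stopWatch are both guarded by the same mutex, which is exactly the unsynchronized pair the `go test -race` trace above reports (memorytopo.go:145 against watch.go:58).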
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General
- Bug fixes
- Non-trivial changes
- New/Existing features
- Backward compatibility
dbussink changed the title from "Avoid race condition in watch shutdown" to "Avoid race condition in memory topo watch shutdown" on Aug 8, 2022
deepthi approved these changes on Aug 8, 2022
systay pushed a commit to planetscale/vitess that referenced this pull request on Aug 19, 2022
This was referenced Aug 22, 2022
notfelineit pushed a commit to planetscale/vitess that referenced this pull request on Sep 21, 2022
* Revert "Add explicit close state to memory topo connection (vitessio#11110) (vitessio#1016)". This reverts commit eb1e9c2.
* Revert "Fix races in memory topo and watcher (vitessio#11065) (vitessio#995)". This reverts commit 6bc0171.
* Revert "Avoid race condition in watch shutdown (vitessio#10954) (vitessio#936)". This reverts commit 23d4e34.
* Revert "Remove potential double close of channel (vitessio#10929) (vitessio#921)". This reverts commit 0121e5d.
* Revert "Cherry pick topo improvements (vitessio#10906) (vitessio#916)". This reverts commit 8c9f56d.
Related Issue(s)

Follow-up to #10906, although the race wasn't introduced there; the problem already existed.