Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition in memory topo watch shutdown #10954

Merged

Conversation

dbussink
Copy link
Contributor

@dbussink dbussink commented Aug 8, 2022

When we call close we want to lock around clearing out the factory object. In the watch goroutine shutdown we want to grab a reference to the factory to ensure that it can keep going with the shutdown procedure.

It's still possible to panic here on wrong usage but that's deliberate to avoid reuse of the memory topo.

Found with the race detector on a build:

==================
WARNING: DATA RACE
Write at 0x00c00047c780 by goroutine 19:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/memorytopo.go:145 +0x30
  vitess.io/vitess/go/vt/topo.(*StatsConn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:201 +0x17e
  vitess.io/vitess/go/vt/topo.(*Server).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/server.go:335 +0x25a
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:125 +0x605
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47

Previous read at 0x00c00047c780 by goroutine 32:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:58 +0xbc

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1839 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.runTests()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1837 +0x7e4
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1719 +0xa71
  main.main()
      _testmain.go:47 +0x2e4

Goroutine 32 (finished) created at:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:53 +0x564
  vitess.io/vitess/go/vt/topo.(*StatsConn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:168 +0x201
  vitess.io/vitess/go/vt/topo/test.waitForInitialValue()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:42 +0xdb
  vitess.io/vitess/go/vt/topo/test.checkWatch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:140 +0x414
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:116 +0x564
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47
==================

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
--- FAIL: TestMemoryTopo (0.06s)
    testing.go:49: === checkKeyspace
    testing.go:54: === checkShard
    testing.go:59: === checkTablet
    testing.go:64: === checkShardReplication
    testing.go:69: === checkSrvKeyspace
    testing.go:74: === checkSrvVSchema
    testing.go:79: === checkLock
    lock.go:49: ===      checkLockTimeout
    lock.go:52: ===      checkLockMissing
    lock.go:55: ===      checkLockUnblocks
    testing.go:84: === checkVSchema
    testing.go:89: === checkRoutingRules
    testing.go:94: === checkElection
    testing.go:99: === checkWaitForNewLeader
    testing.go:104: === checkDirectory
    directory.go:33: ===   checkDirectoryInCell global
    directory.go:41: ===   checkDirectoryInCell test
    testing.go:109: === checkFile
    file.go:33: ===   checkFileInCell global
    file.go:41: ===   checkFileInCell global
    testing.go:114: === checkWatch
    testing.go:119: === checkList
    testing.go:122: === checkWatchRecursive
    testing.go:1312: race detected during execution of test
FAIL
FAIL	vitess.io/vitess/go/vt/topo/memorytopo	0.087s

Related Issue(s)

Follow up on #10906 although this wasn't introduced there, but this problem already existed.

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

When we call close we want to lock around clearing out the factory
object. In the watch goroutine shutdown we want to grab a reference to
the factory to ensure that it can keep going with the shutdown
procedure.

It's still possible to panic here on wrong usage but that's deliberate
to avoid reuse of the memory topo.

Found with the race detector on a build:

```
==================
WARNING: DATA RACE
Write at 0x00c00047c780 by goroutine 19:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/memorytopo.go:145 +0x30
  vitess.io/vitess/go/vt/topo.(*StatsConn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:201 +0x17e
  vitess.io/vitess/go/vt/topo.(*Server).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/server.go:335 +0x25a
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:125 +0x605
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47

Previous read at 0x00c00047c780 by goroutine 32:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:58 +0xbc

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1839 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.runTests()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1837 +0x7e4
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1719 +0xa71
  main.main()
      _testmain.go:47 +0x2e4

Goroutine 32 (finished) created at:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:53 +0x564
  vitess.io/vitess/go/vt/topo.(*StatsConn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:168 +0x201
  vitess.io/vitess/go/vt/topo/test.waitForInitialValue()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:42 +0xdb
  vitess.io/vitess/go/vt/topo/test.checkWatch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:140 +0x414
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:116 +0x564
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47
==================

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
--- FAIL: TestMemoryTopo (0.06s)
    testing.go:49: === checkKeyspace
    testing.go:54: === checkShard
    testing.go:59: === checkTablet
    testing.go:64: === checkShardReplication
    testing.go:69: === checkSrvKeyspace
    testing.go:74: === checkSrvVSchema
    testing.go:79: === checkLock
    lock.go:49: ===      checkLockTimeout
    lock.go:52: ===      checkLockMissing
    lock.go:55: ===      checkLockUnblocks
    testing.go:84: === checkVSchema
    testing.go:89: === checkRoutingRules
    testing.go:94: === checkElection
    testing.go:99: === checkWaitForNewLeader
    testing.go:104: === checkDirectory
    directory.go:33: ===   checkDirectoryInCell global
    directory.go:41: ===   checkDirectoryInCell test
    testing.go:109: === checkFile
    file.go:33: ===   checkFileInCell global
    file.go:41: ===   checkFileInCell global
    testing.go:114: === checkWatch
    testing.go:119: === checkList
    testing.go:122: === checkWatchRecursive
    testing.go:1312: race detected during execution of test
FAIL
FAIL	vitess.io/vitess/go/vt/topo/memorytopo	0.087s
```

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@dbussink dbussink requested review from deepthi and rafael as code owners August 8, 2022 08:03
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Aug 8, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
  • If a workflow is added or modified, each items in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@dbussink dbussink changed the title Avoid race condition in watch shutdown Avoid race condition in memory topo watch shutdown Aug 8, 2022
@deepthi deepthi merged commit 522694e into vitessio:main Aug 8, 2022
@deepthi deepthi deleted the dbussink/avoid-race-memory-topo-close branch August 8, 2022 16:07
systay pushed a commit to planetscale/vitess that referenced this pull request Aug 19, 2022
When we call close we want to lock around clearing out the factory
object. In the watch goroutine shutdown we want to grab a reference to
the factory to ensure that it can keep going with the shutdown
procedure.

It's still possible to panic here on wrong usage but that's deliberate
to avoid reuse of the memory topo.

Found with the race detector on a build:

```
==================
WARNING: DATA RACE
Write at 0x00c00047c780 by goroutine 19:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/memorytopo.go:145 +0x30
  vitess.io/vitess/go/vt/topo.(*StatsConn).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:201 +0x17e
  vitess.io/vitess/go/vt/topo.(*Server).Close()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/server.go:335 +0x25a
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:125 +0x605
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47

Previous read at 0x00c00047c780 by goroutine 32:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:58 +0xbc

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1839 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.runTests()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1837 +0x7e4
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1719 +0xa71
  main.main()
      _testmain.go:47 +0x2e4

Goroutine 32 (finished) created at:
  vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/watch.go:53 +0x564
  vitess.io/vitess/go/vt/topo.(*StatsConn).Watch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/stats_conn.go:168 +0x201
  vitess.io/vitess/go/vt/topo/test.waitForInitialValue()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:42 +0xdb
  vitess.io/vitess/go/vt/topo/test.checkWatch()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/watch.go:140 +0x414
  vitess.io/vitess/go/vt/topo/test.TopoServerTestSuite()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/test/testing.go:116 +0x564
  vitess.io/vitess/go/vt/topo/memorytopo.TestMemoryTopo()
      /home/runner/work/vitess-private/vitess-private/go/vt/topo/memorytopo/server_test.go:28 +0x35
  testing.tRunner()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.18.4/x64/src/testing/testing.go:1486 +0x47
==================

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
--- FAIL: TestMemoryTopo (0.06s)
    testing.go:49: === checkKeyspace
    testing.go:54: === checkShard
    testing.go:59: === checkTablet
    testing.go:64: === checkShardReplication
    testing.go:69: === checkSrvKeyspace
    testing.go:74: === checkSrvVSchema
    testing.go:79: === checkLock
    lock.go:49: ===      checkLockTimeout
    lock.go:52: ===      checkLockMissing
    lock.go:55: ===      checkLockUnblocks
    testing.go:84: === checkVSchema
    testing.go:89: === checkRoutingRules
    testing.go:94: === checkElection
    testing.go:99: === checkWaitForNewLeader
    testing.go:104: === checkDirectory
    directory.go:33: ===   checkDirectoryInCell global
    directory.go:41: ===   checkDirectoryInCell test
    testing.go:109: === checkFile
    file.go:33: ===   checkFileInCell global
    file.go:41: ===   checkFileInCell global
    testing.go:114: === checkWatch
    testing.go:119: === checkList
    testing.go:122: === checkWatchRecursive
    testing.go:1312: race detected during execution of test
FAIL
FAIL	vitess.io/vitess/go/vt/topo/memorytopo	0.087s
```

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
notfelineit pushed a commit to planetscale/vitess that referenced this pull request Sep 21, 2022
* Revert "Add explicit close state to memory topo connection (vitessio#11110) (vitessio#1016)"

This reverts commit eb1e9c2.

* Revert "Fix races in memory topo and watcher (vitessio#11065) (vitessio#995)"

This reverts commit 6bc0171.

* Revert "Avoid race condition in watch shutdown (vitessio#10954) (vitessio#936)"

This reverts commit 23d4e34.

* Revert "Remove potential double close of channel (vitessio#10929) (vitessio#921)"

This reverts commit 0121e5d.

* Revert "Cherry pick topo improvements (vitessio#10906) (vitessio#916)"

This reverts commit 8c9f56d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants