Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache: don't let RowByModel() recursively lock the RowCache #270

Merged
merged 1 commit into from
Dec 16, 2021

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented Dec 16, 2021

RWLocks are not supposed to be used recursively, and if something
else Locks the row cache then we deadlock.

In this example, goroutine 59 RLocks the RowCache, and then
goroutine 174 attempts to Lock the cache but blocks on #59's
RLock. Since there is a writer waiting to acquire the lock,
any further RLocks are effectively Locks, and thus when
goroutine #59 attempts to acquire the second RLock in Row()
it deadlocks against goroutine #174.

goroutine 59 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc000bf2e2c, 0xc0025a3800, 0x0)
	/usr/lib/golang/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).RLock(...)
	/usr/lib/golang/src/sync/rwmutex.go:63
github.com/ovn-org/libovsdb/cache.(*RowCache).Row(0xc000bf2d80, 0xc00342de60, 0x24, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:101 +0x139
github.com/ovn-org/libovsdb/cache.(*RowCache).RowByModel(0xc000bf2d80, 0x1a8e8e0, 0xc0046c0000, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:130 +0x417
github.com/ovn-org/libovsdb/client.api.Get(0xc000bf3680, 0x0, 0x0, 0xc000183f98, 0x21135a0, 0xc003394600, 0x1a8e8e0, 0xc0046c0000, 0xc00416e300, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/api.go:221 +0xd0
github.com/ovn-org/libovsdb/client.(*ovsdbClient).Get(0xc0001c48f0, 0x21135a0, 0xc003394600, 0x1a8e8e0, 0xc0046c0000, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:1189 +0x139
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.ovnNBLSPDel(0x2138198, 0xc00002d640, 0xc0038731b8, 0x17, 0xc002501b60, 0x29, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/pods.go:649 +0x265
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).deleteLogicalPort(0xc00000e1e0, 0xc002f40400)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/pods.go:145 +0x46a

goroutine 174 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc000bf2e28, 0x4f1600, 0x0)
	/usr/lib/golang/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).Lock(0xc000bf2e20)
	/usr/lib/golang/src/sync/rwmutex.go:116 +0x85
github.com/ovn-org/libovsdb/cache.(*RowCache).Update(0xc000bf2d80, 0xc002280570, 0x24, 0x1a8e8e0, 0xc0046c0600, 0xc003319200, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:176 +0x6f
github.com/ovn-org/libovsdb/cache.(*TableCache).Populate2(0xc000bf3680, 0xc003766210, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:659 +0x7d6
github.com/ovn-org/libovsdb/cache.(*TableCache).Update2(0xc000bf3680, 0x1c71a40, 0xc003331980, 0xc003766210, 0x2e, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:536 +0x6b
github.com/ovn-org/libovsdb/client.(*ovsdbClient).update3(0xc0001c48f0, 0xc00451a660, 0x3, 0x4, 0xc003e32738, 0x7f225dc865b8, 0x41b69b)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:637 +0x55a
github.com/ovn-org/libovsdb/client.(*ovsdbClient).createRPC2Client.func4(0xc000289500, 0xc00451a660, 0x3, 0x4, 0xc003e32738, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:395 +0x52

@coveralls
Copy link

coveralls commented Dec 16, 2021

Pull Request Test Coverage Report for Build 1586017676

  • 8 of 8 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 73.08%

Totals Coverage Status
Change from base Build 1580003065: 0.01%
Covered Lines: 4224
Relevant Lines: 5780

💛 - Coveralls

RWLocks are not supposed to be used recursively, and if something
else Locks the row cache then we deadlock.

In this example, goroutine 59 RLocks the RowCache, and then
goroutine 174 attempts to Lock the cache but blocks on ovn-org#59's
RLock. Since there is a writer waiting to acquire the lock,
any further RLocks are effectively Locks, and thus when
goroutine ovn-org#59 attempts to acquire the second RLock in Row()
it deadlocks against goroutine ovn-org#174.

goroutine 59 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc000bf2e2c, 0xc0025a3800, 0x0)
	/usr/lib/golang/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).RLock(...)
	/usr/lib/golang/src/sync/rwmutex.go:63
github.com/ovn-org/libovsdb/cache.(*RowCache).Row(0xc000bf2d80, 0xc00342de60, 0x24, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:101 +0x139
github.com/ovn-org/libovsdb/cache.(*RowCache).RowByModel(0xc000bf2d80, 0x1a8e8e0, 0xc0046c0000, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:130 +0x417
github.com/ovn-org/libovsdb/client.api.Get(0xc000bf3680, 0x0, 0x0, 0xc000183f98, 0x21135a0, 0xc003394600, 0x1a8e8e0, 0xc0046c0000, 0xc00416e300, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/api.go:221 +0xd0
github.com/ovn-org/libovsdb/client.(*ovsdbClient).Get(0xc0001c48f0, 0x21135a0, 0xc003394600, 0x1a8e8e0, 0xc0046c0000, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:1189 +0x139
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.ovnNBLSPDel(0x2138198, 0xc00002d640, 0xc0038731b8, 0x17, 0xc002501b60, 0x29, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/pods.go:649 +0x265
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).deleteLogicalPort(0xc00000e1e0, 0xc002f40400)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/pods.go:145 +0x46a

goroutine 174 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc000bf2e28, 0x4f1600, 0x0)
	/usr/lib/golang/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).Lock(0xc000bf2e20)
	/usr/lib/golang/src/sync/rwmutex.go:116 +0x85
github.com/ovn-org/libovsdb/cache.(*RowCache).Update(0xc000bf2d80, 0xc002280570, 0x24, 0x1a8e8e0, 0xc0046c0600, 0xc003319200, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:176 +0x6f
github.com/ovn-org/libovsdb/cache.(*TableCache).Populate2(0xc000bf3680, 0xc003766210, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:659 +0x7d6
github.com/ovn-org/libovsdb/cache.(*TableCache).Update2(0xc000bf3680, 0x1c71a40, 0xc003331980, 0xc003766210, 0x2e, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/cache/cache.go:536 +0x6b
github.com/ovn-org/libovsdb/client.(*ovsdbClient).update3(0xc0001c48f0, 0xc00451a660, 0x3, 0x4, 0xc003e32738, 0x7f225dc865b8, 0x41b69b)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:637 +0x55a
github.com/ovn-org/libovsdb/client.(*ovsdbClient).createRPC2Client.func4(0xc000289500, 0xc00451a660, 0x3, 0x4, 0xc003e32738, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/ovn-org/libovsdb/client/client.go:395 +0x52

Signed-off-by: Dan Williams <dcbw@redhat.com>
@dcbw dcbw force-pushed the rowcache-deadlock branch from a41f75f to 249501a Compare December 16, 2021 05:16
@jcaamano
Copy link
Collaborator

/lgtm

Copy link
Collaborator

@dave-tucker dave-tucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dave-tucker dave-tucker merged commit ab69150 into ovn-org:main Dec 16, 2021
dcbw added a commit to dcbw/ovn-kubernetes that referenced this pull request Dec 16, 2021
To pick up ovn-org/libovsdb#270
"cache: don't let RowByModel() recursively lock the RowCache"

Signed-off-by: Dan Williams <dcbw@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants