Fixes synchronicity between transactions/updates #254

trozet · 2021-10-21T22:23:08Z

The changes to use a buffered channel to handle updates broke the
synchronization ovsdb depends on with txns. This is because the OVSDB
protocol specifies that an update will be sent before a txn reply. We
depend on this to ensure our cache is valid when we complete a txn.
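
To illustrate the ordering this relies on, here is a minimal sketch, not the libovsdb implementation. The `message`, `cache`, and `dispatch` names are hypothetical. It assumes one reader handles messages strictly in arrival order, so an update is applied to the cache before the reply that follows it is reported.

```go
package main

import "fmt"

// message is a hypothetical, simplified stand-in for what arrives on the
// OVSDB connection: either an update notification or a transaction reply.
type message struct {
	isReply bool
	row     string // row name carried by an update
	txnID   int    // ID carried by a reply
}

// cache is a toy row cache used only for this sketch.
type cache struct{ rows map[string]bool }

// dispatch handles messages strictly in arrival order. Each update is
// applied to the cache before the next message is looked at, so by the
// time a reply is reported the cache already reflects that transaction.
func dispatch(c *cache, msgs []message, onReply func(txnID int)) {
	for _, m := range msgs {
		if m.isReply {
			onReply(m.txnID)
			continue
		}
		c.rows[m.row] = true
	}
}

func main() {
	c := &cache{rows: map[string]bool{}}
	// The server sends the update for a transaction before its reply.
	wire := []message{{row: "sw0"}, {isReply: true, txnID: 1}}
	dispatch(c, wire, func(txnID int) {
		fmt.Printf("txn %d done, sw0 cached: %v\n", txnID, c.rows["sw0"])
	})
}
```

If updates were instead queued on a buffered channel and applied by another goroutine, the reply could be reported before the update had been applied, which is the problem described above.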

This patch reverts the changes from cda509c and, while the monitor is
being set up, defers updates to a buffer. After the monitor setup is
complete, the deferred updates are processed in order. Afterwards, the
db cacheMutex is unlocked so that subsequent updates may be processed
normally.

Signed-off-by: Tim Rozet <trozet@redhat.com>
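
The deferral described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this patch: `updateBuffer` and its methods are hypothetical names, a single mutex stands in for the db cacheMutex, and applying the monitor's initial state before the deferred updates is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// updateBuffer is a hypothetical helper: while deferring is true, incoming
// updates are queued; once monitor setup finishes they are replayed in
// arrival order.
type updateBuffer struct {
	mu        sync.Mutex
	deferring bool
	pending   []string
	applied   []string // stands in for the populated cache
}

// startMonitorSetup switches the buffer into deferring mode.
func (b *updateBuffer) startMonitorSetup() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.deferring = true
}

// handleUpdate queues the update during setup and applies it otherwise.
func (b *updateBuffer) handleUpdate(u string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.deferring {
		b.pending = append(b.pending, u)
		return
	}
	b.applied = append(b.applied, u)
}

// finishMonitorSetup applies the initial state (an assumption of this
// sketch), replays the deferred updates in order, and only then releases
// the lock so later updates are applied directly.
func (b *updateBuffer) finishMonitorSetup(initial string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.applied = append(b.applied, initial)
	b.applied = append(b.applied, b.pending...)
	b.pending = nil
	b.deferring = false
}

func main() {
	b := &updateBuffer{}
	b.startMonitorSetup()
	b.handleUpdate("update-1")
	b.handleUpdate("update-2")
	b.finishMonitorSetup("initial-state")
	b.handleUpdate("update-3")
	fmt.Println(b.applied)
}
```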

trozet · 2021-10-21T22:23:58Z

/assign @dave-tucker

trozet · 2021-10-21T22:24:14Z

testing in upstream ovnkube: ovn-kubernetes/ovn-kubernetes#2599

coveralls · 2021-10-21T22:25:17Z

Pull Request Test Coverage Report for Build 1381829607

35 of 96 (36.46%) changed or added relevant lines in 2 files are covered.
10 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.7%) to 72.4%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
cache/cache.go	3	20	15.0%
client/client.go	32	76	42.11%

Files with Coverage Reduction	New Missed Lines	%
client/client.go	3	63.4%
cache/cache.go	7	69.69%

Totals
Change from base Build 1380832416:	-0.7%
Covered Lines:	4108
Relevant Lines:	5674

💛 - Coveralls

trozet · 2021-10-22T17:22:03Z

Looks like there is still a crash:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/796/pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn/1451531622473011200/artifacts/e2e-gcp-ovn/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-master-5xvfr_ovnkube-master_previous.log

dave-tucker · 2021-10-22T17:55:58Z

@trozet there appear to still be panics in cache.go that can be changed to just return fmt.Errorf() with some more info about what happened.

dave-tucker · 2021-10-22T17:56:45Z

From that log, it's clear that we got a modify-type update for a row that doesn't exist, but I'm curious as to how that can be 🤔

The changes to use a buffered channel to handle updates broke the synchronization ovsdb depends on with txns. This is because the OVSDB protocol specifies that an update will be sent before a txn reply. We depend on this to ensure our cache is valid when we complete a txn. This patch reverts the changes from cda509c and, while the monitor is being set up, defers updates to a buffer. After the monitor setup is complete, the deferred updates are processed in order. Afterwards, the db cacheMutex is unlocked so that subsequent updates may be processed normally. Signed-off-by: Tim Rozet <trozet@redhat.com>

jcaamano

Another thought.

We might need to deferUpdates anytime we are setting up a monitor and not only on reconnect. If we monitor a new table well after connect then we might start receiving updates for that table before its initial state?

Previously, failures to populate the cache would result in a panic. This should be avoided where possible; instead, error handling should reconnect and rebuild the cache if the client and server become out of sync. This change converts the remaining panics into returned errors, and those errors are sent to signal a reconnect and cache flush. Additionally, errors are propagated up to the RPC handler, and error responses may be sent. Currently these responses may race with client disconnect, but at least it's a step in the right direction. Signed-off-by: Tim Rozet <trozet@redhat.com>
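
The error-signalling flow described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the patch itself: the `client` type, `errCh`, `modifyRow`, and `recoverIfSignalled` are hypothetical, and a counter stands in for a real reconnect.

```go
package main

import "fmt"

// client is a hypothetical, stripped-down client used only for this sketch.
type client struct {
	cache      map[string]string // UUID -> row
	errCh      chan error        // signals that the cache is out of sync
	reconnects int               // stand-in for real reconnect logic
}

func newClient() *client {
	return &client{cache: map[string]string{}, errCh: make(chan error, 1)}
}

// modifyRow reports an inconsistent update as an error and signals it,
// rather than panicking. The returned error could also be propagated to
// the RPC handler.
func (c *client) modifyRow(uuid, row string) error {
	if _, ok := c.cache[uuid]; !ok {
		err := fmt.Errorf("modify for row %q that is not in the cache", uuid)
		select {
		case c.errCh <- err: // wake the recovery path
		default: // a recovery is already pending
		}
		return err
	}
	c.cache[uuid] = row
	return nil
}

// recoverIfSignalled flushes the cache and counts a reconnect if an error
// is pending. It reports whether recovery ran.
func (c *client) recoverIfSignalled() bool {
	select {
	case err := <-c.errCh:
		fmt.Println("cache out of sync, reconnecting:", err)
		c.cache = map[string]string{} // flush the stale cache
		c.reconnects++
		return true
	default:
		return false
	}
}

func main() {
	c := newClient()
	c.cache["uuid-1"] = "a"
	_ = c.modifyRow("uuid-2", "b") // row was never cached
	c.recoverIfSignalled()
	fmt.Println("reconnects:", c.reconnects, "rows cached:", len(c.cache))
}
```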

dave-tucker

LGTM. Thanks @trozet this is looking really good 👌

trozet mentioned this pull request Oct 22, 2021

[DownstreamMerge] Merge 2021-10-13 openshift/ovn-kubernetes#796

Merged

dave-tucker requested changes Oct 22, 2021

jcaamano reviewed Oct 25, 2021

jcaamano reviewed Oct 25, 2021

trozet force-pushed the fix_handling_cache branch from 6db1815 to 33bbf40 Compare October 25, 2021 14:32

trozet force-pushed the fix_handling_cache branch from 33bbf40 to d11896c Compare October 25, 2021 14:39

jcaamano reviewed Oct 25, 2021

trozet requested a review from dave-tucker October 25, 2021 15:42

dave-tucker approved these changes Oct 25, 2021

dave-tucker merged commit 80be4ac into ovn-org:main Oct 25, 2021

dave-tucker mentioned this pull request Oct 26, 2021

Refactor: Reduce Use Of Pointers #256

Merged

dave-tucker added the fix label Nov 6, 2021
