Fixes synchronicity between transactions/updates #254

Merged: dave-tucker merged 2 commits into ovn-org:main from fix_handling_cache on Oct 25, 2021

trozet
Contributor

@trozet trozet commented Oct 21, 2021

The changes to use a buffered channel to handle updates broke the
synchronization ovsdb depends on with txns. This is because the OVSDB
protocol specifies that an update will be sent before a txn reply. We
depend on this to ensure our cache is valid when we complete a txn.

This patch reverts the changes from
cda509c and, while the monitor is being
set up, defers updates to a buffer. After the monitor setup is complete,
the deferred updates are processed in order. Afterwards, the db
cacheMutex is unlocked so that subsequent updates may be processed
normally.

Signed-off-by: Tim Rozet <trozet@redhat.com>
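The deferral described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `update` and `cache` types and the `startMonitorSetup`, `handleUpdate`, and `finishMonitorSetup` names are hypothetical stand-ins, and the sketch assumes updates are delivered synchronously from a single RPC read loop so that arrival order is preserved.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical stand-in for an OVSDB table update notification.
type update struct {
	table string
	seq   int
}

// cache is a hypothetical stand-in for the client-side database cache.
type cache struct {
	mu        sync.Mutex
	deferring bool     // true while a monitor is being set up
	deferred  []update // updates buffered during monitor setup
	applied   []update // updates applied to the cache, in order
}

// startMonitorSetup switches the cache into deferral mode.
func (c *cache) startMonitorSetup() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.deferring = true
}

// handleUpdate is assumed to be called synchronously from the RPC read
// loop, so an update is always handled before the txn reply that follows it.
func (c *cache) handleUpdate(u update) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.deferring {
		c.deferred = append(c.deferred, u)
		return
	}
	c.applied = append(c.applied, u)
}

// finishMonitorSetup applies the buffered updates in arrival order while
// holding the lock, then resumes normal processing.
func (c *cache) finishMonitorSetup() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.applied = append(c.applied, c.deferred...)
	c.deferred = nil
	c.deferring = false
}

func main() {
	c := &cache{}
	c.startMonitorSetup()
	c.handleUpdate(update{table: "Bridge", seq: 1}) // buffered
	c.handleUpdate(update{table: "Port", seq: 2})   // buffered
	c.finishMonitorSetup()
	c.handleUpdate(update{table: "Bridge", seq: 3}) // applied directly
	for _, u := range c.applied {
		fmt.Println(u.seq, u.table)
	}
}
```

Holding the lock across the flush means no new update can be applied ahead of the buffered ones.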

@trozet
Contributor Author

trozet commented Oct 21, 2021

/assign @dave-tucker

@trozet
Contributor Author

trozet commented Oct 21, 2021

testing in upstream ovnkube: ovn-kubernetes/ovn-kubernetes#2599

@coveralls

coveralls commented Oct 21, 2021

Pull Request Test Coverage Report for Build 1381829607

  • 35 of 96 (36.46%) changed or added relevant lines in 2 files are covered.
  • 10 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.7%) to 72.4%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| cache/cache.go | 3 | 20 | 15.0% |
| client/client.go | 32 | 76 | 42.11% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| client/client.go | 3 | 63.4% |
| cache/cache.go | 7 | 69.69% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 1380832416 | -0.7% |
| Covered Lines | 4108 |
| Relevant Lines | 5674 |

💛 - Coveralls

@dave-tucker
Collaborator

@trozet there appear to still be panics in cache.go that can be changed to just return fmt.Errorf() with some more info about what happened.
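As a rough illustration of that suggestion, a cache method that would otherwise panic on an unexpected update can return a descriptive error instead. The `rowCache` type and its `modify` method below are hypothetical and are not taken from cache.go; only `fmt.Errorf` is a real API here.

```go
package main

import "fmt"

// rowCache is a hypothetical, simplified table cache: row UUID -> columns.
type rowCache struct {
	table string
	rows  map[string]map[string]string
}

// modify applies a "modify" update to an existing row. Instead of
// panicking when the row is missing, it returns an error that says
// which table and row were involved.
func (r *rowCache) modify(uuid string, cols map[string]string) error {
	existing, ok := r.rows[uuid]
	if !ok {
		return fmt.Errorf("table %s: received modify for row %s, which is not in the cache", r.table, uuid)
	}
	for k, v := range cols {
		existing[k] = v
	}
	return nil
}

func main() {
	rc := &rowCache{
		table: "Bridge",
		rows:  map[string]map[string]string{"uuid-1": {"name": "br0"}},
	}
	if err := rc.modify("uuid-2", map[string]string{"name": "br1"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

The caller can then decide how to react, for example by logging the error and triggering a resync, rather than crashing the whole process.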

@dave-tucker
Collaborator

from that log, it's clear that we got a modify-type update for a row that doesn't exist, but I'm curious as to how that can be 🤔

@trozet trozet force-pushed the fix_handling_cache branch from 6db1815 to 33bbf40 Compare October 25, 2021 14:32
@trozet trozet force-pushed the fix_handling_cache branch from 33bbf40 to d11896c Compare October 25, 2021 14:39
Collaborator

@jcaamano jcaamano left a comment

Another thought.

We might need to deferUpdates any time we are setting up a monitor, not only on reconnect. If we monitor a new table well after connecting, might we start receiving updates for that table before its initial state?

Previously, failures to populate the cache would result in a panic. This
should be avoided where possible, and error handling should take place to
reconnect and rebuild the cache if the client and server become out of
sync. This change fixes the remaining panics to return errors, and
errors are sent to signal a reconnect and cache flush. Additionally,
errors are propagated up to the RPC handler and error responses may be
sent. Currently these responses may race with client disconnect, but at
least it's a step in the right direction.

Signed-off-by: Tim Rozet <trozet@redhat.com>
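The error flow in this commit message might look something like the sketch below. It is an assumption-laden illustration, not the actual client code: the `client` type, its `errCh` channel, and the `applyModify`, `handleUpdate`, and `recoverIfNeeded` names are all hypothetical, and the "reconnect" is reduced to a counter so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errCacheInconsistent is a hypothetical sentinel error.
var errCacheInconsistent = errors.New("cache inconsistent with server")

// client is a hypothetical, heavily simplified client with a cache.
type client struct {
	rows       map[string]string // row UUID -> value
	errCh      chan error        // signals that a reconnect is needed
	reconnects int               // stands in for a real reconnect
}

func newClient() *client {
	return &client{rows: map[string]string{}, errCh: make(chan error, 1)}
}

// applyModify returns an error instead of panicking on an unknown row.
func (c *client) applyModify(uuid, value string) error {
	if _, ok := c.rows[uuid]; !ok {
		return fmt.Errorf("modify for unknown row %q: %w", uuid, errCacheInconsistent)
	}
	c.rows[uuid] = value
	return nil
}

// handleUpdate is what an RPC handler would call. The error is both
// returned (so an error response can be sent) and signalled on errCh.
func (c *client) handleUpdate(uuid, value string) error {
	err := c.applyModify(uuid, value)
	if err != nil {
		select {
		case c.errCh <- err:
		default: // a reconnect is already pending
		}
	}
	return err
}

// recoverIfNeeded flushes the cache and "reconnects" if an error is pending.
func (c *client) recoverIfNeeded() bool {
	select {
	case err := <-c.errCh:
		fmt.Println("resyncing after:", err)
		c.rows = map[string]string{} // cache flush
		c.reconnects++
		return true
	default:
		return false
	}
}

func main() {
	c := newClient()
	c.rows["uuid-1"] = "br0"
	_ = c.handleUpdate("uuid-2", "br1") // unknown row: error is signalled
	c.recoverIfNeeded()
	fmt.Println("reconnects:", c.reconnects, "cached rows:", len(c.rows))
}
```

The non-blocking send keeps the update path from stalling when a reconnect is already pending.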
@trozet trozet requested a review from dave-tucker October 25, 2021 15:42
Collaborator

@dave-tucker dave-tucker left a comment

LGTM. Thanks @trozet this is looking really good 👌

@dave-tucker dave-tucker merged commit 80be4ac into ovn-org:main Oct 25, 2021
@dave-tucker dave-tucker added the fix label Nov 6, 2021