
Improve topo handling and add additional functionality #10906

Merged
merged 7 commits into vitessio:main from dbussink/improve-topo on Aug 2, 2022

Conversation

@dbussink (Contributor) commented Aug 1, 2022

The change here adds additional functionality to the topo. It adds WatchRecursive, which allows watching a specific prefix or directory, so we can monitor changes in the topo more efficiently without having to busy loop or set up multiple watchers.

It also adds a more efficient way to track leadership changes, which works in a similar way under the hood.

Lastly, this series of changes starts with a refactor of Watch to make it more idiomatic Go and to better match the added WatchRecursive.

The changes are best reviewed commit by commit, as they build on each other incrementally.

Nothing starts using these new methods here yet; that will be added separately.
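To make the intent concrete, here is a rough usage sketch. The Conn and WatchData types and the exact WatchRecursive signature below are stand-ins inferred from this description, not the final API:

```go
package topoexample

import (
	"context"
	"log"
)

// Stand-ins for the real topo types; only the shape matters here.
type WatchData struct {
	Path     string
	Contents []byte
	Err      error
}

type Conn interface {
	WatchRecursive(ctx context.Context, path string) (current []*WatchData, changes <-chan *WatchData, err error)
}

// watchPrefix monitors everything under a single prefix with one watcher,
// instead of busy looping or creating one watcher per file.
func watchPrefix(ctx context.Context, conn Conn, prefix string) error {
	current, changes, err := conn.WatchRecursive(ctx, prefix)
	if err != nil {
		return err
	}
	for _, wd := range current {
		log.Printf("initial: %s", wd.Path) // existing entries under the prefix
	}
	for wd := range changes {
		if wd.Err != nil {
			return wd.Err // the watch ended (cancellation, topo error, ...)
		}
		log.Printf("changed: %s", wd.Path) // creates, updates and deletes
	}
	return nil
}
```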

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

This adds a List() implementation for the memory topo and adds a bunch
of tests for this function as well. These tests are only run when the
implementation indicates proper support for List.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
A function that can spawn a goroutine should use the passed in context
for that and not return a cancel function itself. That is considered an
anti-pattern, so let's change this all to more idiomatic Go patterns.

This also ensures we can now set a proper timeout on the initial data
retrieval for a watch using the configured topo timeout.
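Roughly, the shape of the change looks like this; the "after" signature matches the interface excerpt quoted later in this conversation, the "before" one is an approximation, and WatchData/CancelFunc refer to the existing topo package types:

```go
// Before (approximate): the watch handed back its own cancel function and
// surfaced setup failures through the returned WatchData.
type connBefore interface {
	Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, cancel CancelFunc)
}

// After: the caller's context controls the watch lifetime, and setup errors
// (including the initial read, which can now honor the configured topo
// timeout) are returned directly.
type connAfter interface {
	Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, err error)
}
```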

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
This adds a new function to the topo interface which allows for
recursive watching on the topo.

Recursive watching can be used to significantly improve performance in a
few cases where we want to monitor changes across a prefix.

There's no immediate usage added yet, that will be done in follow up
changes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Once a context is expired, we shouldn't return a valid cell anymore,
even if it's the global cell.

Local cells would fail in this case as well, so this ensures we match
that behavior.
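The change boils down to an up-front context check; a minimal generic sketch (all names here are placeholders, not the real topo code):

```go
type cellConn struct{ serverAddr string } // placeholder for a real topo connection

// An expired context never yields a connection, even for the global cell,
// matching what local cells already do.
func connForCell(ctx context.Context, cell string, conns map[string]*cellConn) (*cellConn, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if conn, ok := conns[cell]; ok {
		return conn, nil
	}
	return nil, fmt.Errorf("cell %q not found", cell)
}
```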

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
This allows for more efficiently waiting for a new leader to be elected.
It also allows for a continued stream of changes during the election
lifetime.

This is very useful for example when something else needs to be
triggered on an election change and we don't need to use a busy wait
loop for it in that case.
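For illustration, the kind of consumer this enables; the WaitForNewLeader signature and the LeaderParticipation receiver are assumptions based on this description, not a confirmed API:

```go
// Assumed shape: a channel that receives the current leader whenever a new
// one is elected, for as long as the participation is active.
func reactToLeaderChanges(ctx context.Context, lp topo.LeaderParticipation) error {
	leaders, err := lp.WaitForNewLeader(ctx)
	if err != nil {
		return err
	}
	for leader := range leaders {
		// Runs on every election change; no busy-wait loop required.
		log.Printf("new leader elected: %s", leader)
	}
	return ctx.Err() // the channel is closed when the watch ends
}
```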

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@dbussink dbussink requested a review from vmg August 1, 2022 14:18
@vitess-bot (bot) commented Aug 1, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
  • If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@vmg (Collaborator) left a comment

To add to @dbussink's explanation, these are changes that we've already tested internally with more stringent tests than the ones we're performing in Vitess right now, including running topology servers in end2end and integration tests and doing a Goroutine check at the end of the test to ensure that the whole server has exited cleanly.

So far so good 👌
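One common way to do that kind of goroutine check in Go (not necessarily the exact mechanism used in those internal tests) is go.uber.org/goleak:

```go
package topotests

import (
	"testing"

	"go.uber.org/goleak"
)

// Fails the package's tests if any goroutine is still running once they
// finish, which catches watchers and servers that did not shut down cleanly.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}
```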

@dbussink dbussink added the Type: Enhancement (Logical improvement, somewhere between a bug and feature) and Component: TabletManager labels Aug 1, 2022
Also when we get an error directly from the watcher on the initial
entry, we want to update the tracking value with that error. This is
needed to ensure we handle errors like a file not existing and appearing
later properly.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
 results = append(results, topo.KVInfo{
     Key:     []byte(node.Data.Key),
-    Value:   []byte(node.Data.Value),
+    Value:   out,
     Version: KubernetesVersion(node.GetResourceVersion()),
@dbussink (Contributor, Author) commented:

@mattlord FYI since it looks like your name is in the git blame: there were 3 separate bugs here in the K8s topo List implementation. It wasn't checking against the correct path prefix (the root was ignored), it was checking against the .Value instead of the .Key, and the .Value was not decompressed even though compression is meant to be transparent.

This PR added some basic tests for List that uncovered all those issues. It does make me wonder whether anyone is really using the K8s topo if there are bugs like this in it.
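For context, the corrected filtering amounts to roughly the following; the loop variables and the unpackValue helper are simplified placeholders, not the actual k8stopo code:

```go
// Simplified placeholder sketch of the corrected filtering in List.
fullPath := path.Join(root, filePathPrefix) // include the topo root in the prefix
for _, node := range nodes {
	if !strings.HasPrefix(node.Data.Key, fullPath) { // compare the key, not the value
		continue
	}
	out, err := unpackValue([]byte(node.Data.Value)) // decompress: compression is transparent to callers
	if err != nil {
		return nil, err
	}
	results = append(results, topo.KVInfo{
		Key:     []byte(node.Data.Key),
		Value:   out,
		Version: KubernetesVersion(node.GetResourceVersion()),
	})
}
```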

Contributor commented:

Good catch, thank you! I'm also not sure if anyone is using it.

Member commented:

not sure either. It was added by @carsonoid who may be able to tell us. or @derekperkins

@dbussink dbussink requested a review from mattlord August 2, 2022 07:42
@GuptaManan100 (Member) left a comment

This is the only comment from my side.

Comment on lines +156 to +157
Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, err error)

Member commented:

This seems like a breaking change requiring release-notes changes. My understanding is that users can have their own implementation of the topo server, and therefore any change to the interface is breaking.

Member commented:

I don't think we need to worry about this. Unlike vindexes, we don't expect custom topo server implementations.
The interface is exported because it has to be, but that doesn't mean that every exported interface is part of the "public" interface for the vitess codebase.

@vmg vmg added the release notes (needs details) label (This PR needs to be listed in the release notes in a dedicated section: deprecation notice, etc.) Aug 2, 2022
@deepthi (Member) left a comment

Very nice PR with lots of good fixes.

  • Changes to Watch should be fine.
  • The quibble with WatchRecursive is that it is currently only implemented for etcd. It's good that you documented how it can be implemented for the other flavors. Since this is not yet being used in Vitess, we'll simply need to get the other flavors implemented whenever we want to start using WatchRecursive (for example in vtgate's healthcheck).
  • The one remaining question is regarding List. Currently it is a strict prefix check, so given /toplevel/nested/myfile, List("/top") matches and returns it, rather than assuming that top is a "directory" (see the sketch below). Looking at the comment on the interface, that seems intentional, so we should leave it as-is unless @mattlord thinks we should change it.
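Illustrative only: the strict prefix check is a plain byte comparison, so it also matches partial path components:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.HasPrefix("/toplevel/nested/myfile", "/top"))      // true, so List("/top") returns it
	fmt.Println(strings.HasPrefix("/toplevel/nested/myfile", "/toplevel")) // true
	fmt.Println(strings.HasPrefix("/other/file", "/top"))                  // false
}
```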

@deepthi deepthi removed the release notes (needs details) label Aug 2, 2022
@deepthi (Member) commented Aug 2, 2022

I believe the backup test failure has been fixed via #10895. Merging...

@deepthi deepthi merged commit 853d88d into vitessio:main Aug 2, 2022
@deepthi deepthi deleted the dbussink/improve-topo branch August 2, 2022 23:01
systay pushed a commit to planetscale/vitess that referenced this pull request Aug 19, 2022
* Add List() implementation for memory topo

This adds a List() implementation for the memory topo and adds a bunch
of tests for this function as well. These tests are only run when the
implementation indicates proper support for List.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Refactor Watch() to use context properly

A function that can spawn a goroutine should use the passed in context
for that and not return a cancel function itself. That is considered an
anti-pattern, so let's change this all to more idiomatic Go patterns.

This also ensures we can now set a proper timeout on the initial data
retrieval for a watch using the configured topo timeout.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add recursive watcher for topo

This adds a new function to the topo interface which allows for
recursive watching on the topo.

Recursive watching can be used to significantly improve performance in a
few cases where we want to monitor changes across a prefix.

There's no immediate usage added yet, that will be done in follow up
changes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Don't allow expired contexts to retrieve topo connection

Once a context is expired, we shouldn't return a valid cell anymore,
even if it's the global cell.

Local cells would fail in this case as well, so this ensures we match
that behavior.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add WaitForNewLeader to election API

This allows for more efficiently waiting for a new leader to be elected.
It also allows for a continued stream of changes during the election
lifetime.

This is very useful for example when something else needs to be
triggered on an election change and we don't need to use a busy wait
loop for it in that case.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Update entries also when receiving direct errors

Also when we get an error directly from the watcher on the initial
entry, we want to update the tracking value with that error. This is
needed to ensure we handle errors like a file not existing and appearing
later properly.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix broken k8stopo List implementation

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add back still used code

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix import

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
dbussink added a commit to dbussink/vitess that referenced this pull request Sep 2, 2022
Right now we pass in the context when starting a Watch that is also
used as the request context. This means that the Watch ends up being
cancelled when the original request that started it as a side effect
completes and cancels the context to clean up.

This is of course not as intended. Before the refactor in
vitessio#10906 this wasn't causing a
practical issue yet. We'd still have the expired context internally in
the watcher and it would be passed through with updating entries, but
there were no calls that ended up validating the context expiry,
avoiding any immediate issue.

This is bound to fail at some point though, if something is added
that does care about the context. What is needed is for the watcher
we start to set up its own context based on the background context,
since it is detached from the original request that might trigger
starting the watcher as a side effect.

Additionally, it means that the tracked context for an error isn't
really useful. It would often be an already cancelled context from a
mostly unrelated request, which doesn't provide useful information.
Worse, it would keep a reference to that context, so the context might
never be garbage collected and more request data would be kept alive
than necessary.

With the fix, the context is always derived from the background context
with a cancel on top for that watcher, so tracking it isn't very useful
either. We also don't use this context tracking for any error messaging
or reporting anywhere, so I believe it's better to clean up this tracking.

By cleaning up that tracking, we also avoid the need to pass down the
context in entry updates and that is all cleaned up here as well.

Lastly, a failing test is introduced that verifies the original issue.
It retrieves serving keyspace information, cancels the original request
that triggered that and then validates the watcher is still running by
updating the value again within the timeout window. This failed before
this fix as the watcher would be cancelled and the cached old value was
returned before the TTL expired.

The main problem with this bug is not one of correctness, but of a
serious performance degradation in the vtgate. If we ever had a failure
on the path triggered by regular queries, we'd restart the context setup
every second; the system would not recover from this situation and would
heavily query the topo server, making things very expensive.
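The essence of the fix, as a minimal sketch (startWatcher and process are placeholders; only the context handling is the point):

```go
// Placeholder sketch: detach the watcher's lifetime from the request that
// happened to trigger it.
func startWatcher(requestCtx context.Context, conn topo.Conn, filePath string) error {
	// Do NOT reuse requestCtx: it is cancelled once the triggering request
	// completes. The watcher gets its own context rooted in the background.
	watchCtx, watchCancel := context.WithCancel(context.Background())

	current, changes, err := conn.Watch(watchCtx, filePath)
	if err != nil {
		watchCancel()
		return err
	}
	go func() {
		defer watchCancel()
		process(current)
		for wd := range changes {
			process(wd)
		}
	}()
	return nil
}
```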

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
mattlord pushed a commit that referenced this pull request Sep 6, 2022
* Fix problematic watch cancellation due to context cancellation

Right now we pass in the context when starting a Watch that is also
used as the request context. This means that the Watch ends up being
cancelled when the original request that started it as a side effect
completes and cancels the context to clean up.

This is of course not as intended. Before the refactor in
#10906 this wasn't causing a
practical issue yet. We'd still have the expired context internally in
the watcher and it would be passed through with updating entries, but
there were no calls that ended up validating the context expiry,
avoiding any immediate issue.

This is bound to fail at some point though, if something is added
that does care about the context. What is needed is for the watcher
we start to set up its own context based on the background context,
since it is detached from the original request that might trigger
starting the watcher as a side effect.

Additionally, it means that the tracked context for an error isn't
really useful. It would often be an already cancelled context from a
mostly unrelated request, which doesn't provide useful information.
Worse, it would keep a reference to that context, so the context might
never be garbage collected and more request data would be kept alive
than necessary.

With the fix, the context is always derived from the background context
with a cancel on top for that watcher, so tracking it isn't very useful
either. We also don't use this context tracking for any error messaging
or reporting anywhere, so I believe it's better to clean up this tracking.

By cleaning up that tracking, we also avoid the need to pass down the
context in entry updates and that is all cleaned up here as well.

Lastly, a failing test is introduced that verifies the original issue.
It retrieves serving keyspace information, cancels the original request
that triggered that and then validates the watcher is still running by
updating the value again within the timeout window. This failed before
this fix as the watcher would be cancelled and the cached old value was
returned before the TTL expired.

The main problem with this bug is not one of correctness, but of a
serious performance degradation in the vtgate. If we ever had a failure
on the path triggered by regular queries, we'd restart the context setup
every second; the system would not recover from this situation and would
heavily query the topo server, making things very expensive.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Improve handling of retries and timer wait

The timer here can stay around if other events fire first, so we want to
use an explicit timer, which we stop immediately once we know it completes.

Additionally, because of binding issues, watchCancel() would not rebind
if we start a new inner watcher. Therefore this adds back an outer
context that we can cancel in a defer, so we know for sure we cancel
things properly when stopping the watcher.
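The underlying Go pattern in isolation (illustrative, not the actual watcher code): an explicit timer can be stopped as soon as another event wins the select, whereas time.After would leave its timer running until it fires:

```go
func waitForEventOrRetry(ctx context.Context, events <-chan string, retryDelay time.Duration) (string, error) {
	timer := time.NewTimer(retryDelay)
	defer timer.Stop() // stop the timer as soon as the wait is over

	select {
	case <-ctx.Done():
		return "", ctx.Err()
	case ev := <-events:
		return ev, nil
	case <-timer.C:
		return "", nil // empty event means "retry now"
	}
}
```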

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix leak in etcd2topo tests

We never closed the `cli` instance here so it would linger until the
process completes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Remove unused context

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
notfelineit pushed a commit to planetscale/vitess that referenced this pull request Sep 21, 2022
* Revert "Add explicit close state to memory topo connection (vitessio#11110) (vitessio#1016)"

This reverts commit eb1e9c2.

* Revert "Fix races in memory topo and watcher (vitessio#11065) (vitessio#995)"

This reverts commit 6bc0171.

* Revert "Avoid race condition in watch shutdown (vitessio#10954) (vitessio#936)"

This reverts commit 23d4e34.

* Revert "Remove potential double close of channel (vitessio#10929) (vitessio#921)"

This reverts commit 0121e5d.

* Revert "Cherry pick topo improvements (vitessio#10906) (vitessio#916)"

This reverts commit 8c9f56d.