Improve topo handling and add additional functionality #10906
Conversation
This adds a List() implementation for the memory topo and adds a bunch of tests for this function as well. These tests are only run when the implementation indicates proper support for List. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
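For illustration only, here is a minimal sketch of the "only run when the implementation indicates proper support" idea from this commit message. The `lister` interface, `errNoListSupport` sentinel, and the `/keyspaces/` prefix are hypothetical stand-ins, not the actual Vitess topo API or test suite; testify is assumed to be available as it is in the Vitess repo.

```go
package topotest

import (
	"context"
	"errors"
	"testing"

	"github.com/stretchr/testify/require"
)

// errNoListSupport stands in for however a backend signals that List is unsupported.
var errNoListSupport = errors.New("List is not supported by this topo implementation")

// kv is a simplified key/value pair as returned by a List call.
type kv struct {
	Key, Value []byte
}

// lister is an illustrative slice of a topo connection: only the List call.
type lister interface {
	List(ctx context.Context, prefix string) ([]kv, error)
}

// testListIfSupported skips itself when the backend opts out of List, so the
// shared tests only run against implementations that actually support it.
func testListIfSupported(ctx context.Context, t *testing.T, conn lister) {
	kvs, err := conn.List(ctx, "/keyspaces/")
	if errors.Is(err, errNoListSupport) {
		t.Skip("skipping: backend does not support List")
	}
	require.NoError(t, err)
	require.NotEmpty(t, kvs)
}
```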
A function that can spawn a goroutine should use the passed in context for that and not return a cancel function itself. That is considered an anti-pattern, so let's change this all to more idiomatic Go patterns. This also ensures we can now set a proper timeout on the initial data retrieval for a watch using the configured topo timeout. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
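A minimal sketch of the pattern change this commit describes; `watchConn`, `get`, `watchLoop`, and `readTimeout` are placeholder names, not the actual Vitess types. The point is that the caller's context controls the lifetime of the spawned goroutine instead of a returned cancel function, and the initial read gets a bounded timeout derived from it.

```go
package topoexample

import (
	"context"
	"time"
)

// WatchData mirrors the shape of a single watch notification (sketch only).
type WatchData struct {
	Contents []byte
	Err      error
}

// watchConn stands in for a topo connection; get, watchLoop, and readTimeout
// are placeholders for whatever the real implementation uses.
type watchConn struct {
	readTimeout time.Duration
	get         func(ctx context.Context, path string) (*WatchData, error)
	watchLoop   func(ctx context.Context, path string, ch chan<- *WatchData)
}

// Watch uses the caller's context for the goroutine it spawns instead of
// returning a cancel function: stopping the watch is done by cancelling ctx.
func (c *watchConn) Watch(ctx context.Context, path string) (*WatchData, <-chan *WatchData, error) {
	// Bound only the initial read with the configured topo timeout.
	initCtx, cancel := context.WithTimeout(ctx, c.readTimeout)
	defer cancel()

	current, err := c.get(initCtx, path)
	if err != nil {
		return nil, nil, err
	}

	changes := make(chan *WatchData)
	go func() {
		defer close(changes)
		c.watchLoop(ctx, path, changes) // returns when ctx is cancelled
	}()
	return current, changes, nil
}
```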
This adds a new function to the topo interface which allows for recursive watching on the topo. Recursive watching can be used to significantly improve performance in a few cases where we want to monitor changes across a prefix. There's no immediate usage added yet, that will be done in follow up changes. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
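As an illustration, a sketch of what a recursive-watch consumer can look like. The `recursiveWatcher` interface and `WatchDataRecursive` shape below are modeled loosely on this description and are not a verbatim copy of the Vitess interface.

```go
package topoexample

import (
	"context"
	"log"
)

// WatchDataRecursive is a sketch of a single change under the watched prefix.
type WatchDataRecursive struct {
	Path     string
	Contents []byte
	Err      error
}

// recursiveWatcher is an illustrative slice of the extended topo interface.
type recursiveWatcher interface {
	// WatchRecursive watches every path under dirPath: it returns the current
	// state plus a channel of subsequent changes until ctx is cancelled.
	WatchRecursive(ctx context.Context, dirPath string) ([]*WatchDataRecursive, <-chan *WatchDataRecursive, error)
}

// followPrefix shows one watcher covering a whole subtree, replacing either a
// busy polling loop or one watcher per key.
func followPrefix(ctx context.Context, w recursiveWatcher, prefix string) error {
	current, changes, err := w.WatchRecursive(ctx, prefix)
	if err != nil {
		return err
	}
	for _, wd := range current {
		log.Printf("initial %s: %d bytes", wd.Path, len(wd.Contents))
	}
	for wd := range changes { // channel is closed once ctx is done
		if wd.Err != nil {
			return wd.Err
		}
		log.Printf("changed %s: %d bytes", wd.Path, len(wd.Contents))
	}
	return nil
}
```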
Once a context is expired, we shouldn't return a valid cell anymore, even if it's the global cell. Local cells would fail in this case as well, so this ensures we match that behavior. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
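A small sketch of the behavior change, with hypothetical `server`/`conn` types rather than the real topo code: once the caller's context is done, no connection is returned, not even for the global cell.

```go
package topoexample

import "context"

// conn and server are illustrative stand-ins for the topo connection and server.
type conn interface{}

type server struct {
	globalConn conn
	dialCell   func(ctx context.Context, cell string) (conn, error)
}

// connForCell refuses to hand out a connection once ctx is done, so the global
// cell behaves the same way local cells already do.
func (s *server) connForCell(ctx context.Context, cell string) (conn, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // expired or cancelled context: no cell, not even global
	}
	if cell == "global" {
		return s.globalConn, nil
	}
	return s.dialCell(ctx, cell)
}
```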
This allows for more efficiently waiting for a new leader to be elected. It also allows for a continued stream of changes during the election lifetime. This is very useful for example when something else needs to be triggered on an election change and we don't need to use a busy wait loop for it in that case. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
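A sketch of how a streaming leadership API removes the busy-wait loop; `leaderParticipation` and `reactToLeaderChanges` are illustrative names standing in for the election API this commit extends, not the exact Vitess signatures.

```go
package topoexample

import "context"

// leaderParticipation is an illustrative stand-in for the election API that
// this commit extends with a streaming call.
type leaderParticipation interface {
	// WaitForNewLeader streams the current leader id and every subsequent
	// change until ctx is cancelled.
	WaitForNewLeader(ctx context.Context) (<-chan string, error)
}

// reactToLeaderChanges blocks on the stream instead of busy-wait polling the
// topo for the current leader.
func reactToLeaderChanges(ctx context.Context, lp leaderParticipation, onChange func(leader string)) error {
	leaders, err := lp.WaitForNewLeader(ctx)
	if err != nil {
		return err
	}
	for leader := range leaders {
		onChange(leader)
	}
	return ctx.Err()
}
```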
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
To add to @dbussink's explanation, these are changes that we've already tested internally with more stringent tests than the ones we're performing in Vitess right now, including running topology servers in end2end and integration tests and doing a Goroutine check at the end of the test to ensure that the whole server has exited cleanly.
So far so good 👌
Also when we get an error directly from the watcher on the initial entry, we want to update the tracking value with that error. This is needed to ensure we handle errors like a file not existing and appearing later properly. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
	results = append(results, topo.KVInfo{
		Key:     []byte(node.Data.Key),
-		Value:   []byte(node.Data.Value),
+		Value:   out,
		Version: KubernetesVersion(node.GetResourceVersion()),
@mattlord FYI since it looks like your name is on the git blame: there were 3 separate bugs here in the K8s topo List implementation. It wasn't checking against the correct path prefix (the root was ignored), it was checking against the .Value instead of the .Key, and the .Value was not decompressed, even though compression is supposed to be transparent. This PR added some basic tests for List that uncovered all those issues. It does make me wonder if anyone is really using the K8s topo if there are bugs like this in there.
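A rough sketch of what the three fixes amount to, using hypothetical helper types (`kvNode`, `kv`, `decompress`) rather than the actual k8stopo code: the prefix check must include the root, the comparison must be done on the key rather than the value, and the value must be decompressed before being returned.

```go
package k8sexample

import (
	"path"
	"strings"
)

// kvNode and kv are simplified stand-ins for the stored node and the returned pair.
type kvNode struct{ Key, Value string }
type kv struct{ Key, Value []byte }

// decompress is a placeholder for the transparent-compression codec.
var decompress = func(s string) ([]byte, error) { return []byte(s), nil }

func listPrefix(nodes []kvNode, root, prefix string) ([]kv, error) {
	fullPrefix := path.Join(root, prefix) // include the root, not just the bare prefix
	var results []kv
	for _, node := range nodes {
		if !strings.HasPrefix(node.Key, fullPrefix) { // match on the key, not the value
			continue
		}
		out, err := decompress(node.Value) // compression should be invisible to callers
		if err != nil {
			return nil, err
		}
		results = append(results, kv{Key: []byte(node.Key), Value: out})
	}
	return results, nil
}
```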
Good catch, thank you! I'm also not sure if anyone is using it.
Not sure either. It was added by @carsonoid, who may be able to tell us, or @derekperkins.
This is the only comment from my side.
	Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, err error)
This seems like a breaking change requiring release-notes changes. My understanding is that the users can have their own implementation for the topo server and therefore any change to the interface is breaking.
I don't think we need to worry about this. Unlike vindexes, we don't expect custom topo server implementations.
The interface is exported because it has to be, but that doesn't mean that every exported interface is part of the "public" interface for the vitess codebase.
Very nice PR with lots of good fixes.
- Changes to Watch should be fine.
- The quibble with WatchRecursive is that it is currently only implemented for etcd. It's good that you documented how it can be implemented for the other flavors. Since this is not yet being used in Vitess, we'll simply need to get the other flavors implemented whenever we want to start using WatchRecursive (for example in vtgate's healthcheck).
- The one remaining question is regarding List. Currently it is a strict prefix check, so given /toplevel/nested/myfile, List("/top") matches and returns it versus assuming that top is a "directory" (a small illustration follows below). Looking at the comment on the interface, that seems intentional, so we should leave it as-is unless @mattlord thinks we should change it.
I believe the backup test failure has been fixed via #10895. Merging...
* Add List() implementation for memory topo
This adds a List() implementation for the memory topo and adds a bunch of tests for this function as well. These tests are only run when the implementation indicates proper support for List.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Refactor Watch() to use context properly
A function that can spawn a goroutine should use the passed in context for that and not return a cancel function itself. That is considered an anti-pattern, so let's change this all to more idiomatic Go patterns. This also ensures we can now set a proper timeout on the initial data retrieval for a watch using the configured topo timeout.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Add recursive watcher for topo
This adds a new function to the topo interface which allows for recursive watching on the topo. Recursive watching can be used to significantly improve performance in a few cases where we want to monitor changes across a prefix. There's no immediate usage added yet, that will be done in follow up changes.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Don't allow expired contexts to retrieve topo connection
Once a context is expired, we shouldn't return a valid cell anymore, even if it's the global cell. Local cells would fail in this case as well, so this ensures we match that behavior.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Add WaitForNewLeader to election API
This allows for more efficiently waiting for a new leader to be elected. It also allows for a continued stream of changes during the election lifetime. This is very useful for example when something else needs to be triggered on an election change and we don't need to use a busy wait loop for it in that case.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Update entries also when receiving direct errors
Also when we get an error directly from the watcher on the initial entry, we want to update the tracking value with that error. This is needed to ensure we handle errors like a file not existing and appearing later properly.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Fix broken k8stopo List implementation
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Add back still used code
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Fix import
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Right now we pass in the context when starting a Watch that is also used for the request context. This means that the Watch ends up being cancelled when the original request that started it as a side effect completes and cancels the context to clean up. This is of course not as intended.

Before the refactor in vitessio#10906 this wasn't causing a practical issue yet. We'd still have the expired context internally in the watcher and it would be passed through when updating entries, but there were no calls that ended up validating the context expiry, avoiding any immediate issue. This is bound to fail at some point, though, if something is added that does care about the context.

What is needed is that the watcher we start sets up its own context based on the background context, since it is detached from the original request that might trigger starting the watcher as a side effect.

Additionally, it means that the tracked context for an error isn't really useful. It would often be an already cancelled context from a mostly unrelated request, which doesn't provide useful information. Even more so, it would keep a reference to that context, so it might never be garbage collected and would keep more request data alive than necessary. With the fix, the context is always the background context with a cancel on top for that watcher, which isn't very useful either. Also, we don't use this context tracking for any error messaging or reporting anywhere, so I believe it's better to clean up this tracking. By cleaning up that tracking, we also avoid the need to pass down the context in entry updates, and that is all cleaned up here as well.

Lastly, a failing test is introduced that verifies the original issue. It retrieves serving keyspace information, cancels the original request that triggered that, and then validates the watcher is still running by updating the value again within the timeout window. This failed before this fix, as the watcher would be cancelled and the cached old value was returned before the TTL expired.

The main problem of this bug is not an issue of correctness, but a serious performance degradation in the vtgate. Each second we'd restart context setup if we ever had a failure on the path triggered by regular queries, and the system would not recover from this situation; it would heavily query the topo server and make things very expensive.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
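A compact sketch of the fix described in this message: the shared watcher derives its lifetime from context.Background() rather than from whatever request happened to trigger it. The `keyspaceWatcher` and `startDetachedWatcher` names are illustrative only, not the actual vtgate/topo types.

```go
package topoexample

import "context"

// keyspaceWatcher is an illustrative stand-in for a long-lived, shared watcher.
type keyspaceWatcher struct {
	cancel context.CancelFunc
}

// startDetachedWatcher gives the watcher its own lifetime derived from
// context.Background(), so the request that triggered it cannot cancel the
// watcher when that request completes.
func startDetachedWatcher(run func(ctx context.Context)) *keyspaceWatcher {
	ctx, cancel := context.WithCancel(context.Background())
	go run(ctx)
	return &keyspaceWatcher{cancel: cancel}
}

// stop shuts the watcher down explicitly, independent of any request context.
func (w *keyspaceWatcher) stop() { w.cancel() }
```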
* Fix problematic watch cancellation due to context cancellation
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Improve handling of retries and timer wait
The timer here can stay around if other events fire first, so we want to use an explicit timer to stop it immediately when we know it completes. Additionally, because of binding issues, watchCancel() would not rebind if we start a new inner watcher. Therefore this adds back an outer context that we can cancel in a defer so we know for sure we cancel things properly when stopping the watcher.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Fix leak in etcd2topo tests
We never closed the `cli` instance here so it would linger until the process completes.
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Remove unused context
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
* Revert "Add explicit close state to memory topo connection (vitessio#11110) (vitessio#1016)" This reverts commit eb1e9c2. * Revert "Fix races in memory topo and watcher (vitessio#11065) (vitessio#995)" This reverts commit 6bc0171. * Revert "Avoid race condition in watch shutdown (vitessio#10954) (vitessio#936)" This reverts commit 23d4e34. * Revert "Remove potential double close of channel (vitessio#10929) (vitessio#921)" This reverts commit 0121e5d. * Revert "Cherry pick topo improvements (vitessio#10906) (vitessio#916)" This reverts commit 8c9f56d.
The change here adds specific additional functionality to the topo. It adds WatchRecursive, which allows for watching a specific prefix or directory. This means we can more efficiently monitor changes in the topo without having to busy loop, and without having to set up multiple watchers. It also adds a more efficient way to track leadership changes, which under the hood works in a similar way.

Lastly, this series of changes starts with a refactor of Watch to make it more idiomatic Go and better match the added WatchRecursive. The changes are best reviewed commit by commit here as they build it up incrementally.

Nothing immediately starts using these new methods here; that is something to be added separately.
Checklist