
Improve topo handling and add additional functionality #10906

Merged
merged 7 commits into vitessio:main from dbussink/improve-topo on Aug 2, 2022

Conversation

@dbussink (Contributor) commented Aug 1, 2022

The change here adds additional functionality to the topo. It adds WatchRecursive, which allows watching a specific prefix or directory, so we can monitor changes in the topo more efficiently without having to busy loop or set up multiple watchers.

It also adds a more efficient way to track leadership changes, which works in a similar way under the hood.

Lastly, this series of changes starts with a refactor of Watch to make it more idiomatic Go and to better match the added WatchRecursive.

The changes are best reviewed commit by commit, as they build on each other incrementally.

Nothing starts using these new methods here yet; that will be added separately.
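To make the intent concrete, here is a rough usage sketch. The Conn and WatchData types and the exact WatchRecursive signature below are stand-ins inferred from this description, not the final API:

```go
package topoexample

import (
	"context"
	"log"
)

// Stand-ins for the real topo types; only the shape matters here.
type WatchData struct {
	Path     string
	Contents []byte
	Err      error
}

type Conn interface {
	WatchRecursive(ctx context.Context, path string) (current []*WatchData, changes <-chan *WatchData, err error)
}

// watchPrefix monitors everything under a single prefix with one watcher,
// instead of busy looping or creating one watcher per file.
func watchPrefix(ctx context.Context, conn Conn, prefix string) error {
	current, changes, err := conn.WatchRecursive(ctx, prefix)
	if err != nil {
		return err
	}
	for _, wd := range current {
		log.Printf("initial: %s", wd.Path) // existing entries under the prefix
	}
	for wd := range changes {
		if wd.Err != nil {
			return wd.Err // the watch ended (cancellation, topo error, ...)
		}
		log.Printf("changed: %s", wd.Path) // creates, updates and deletes
	}
	return nil
}
```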

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

This adds a List() implementation for the memory topo and adds a bunch
of tests for this function as well. These tests are only run when the
implementation indicates proper support for List.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
A function that can spawn a goroutine should use the passed in context
for that and not return a cancel function itself. That is considered an
anti-pattern, so let's change this all to more idiomatic Go patterns.

This also ensures we can now set a proper timeout on the initial data
retrieval for a watch using the configured topo timeout.
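Roughly, the shape of the change looks like this; the "after" signature matches the interface excerpt quoted later in this conversation, the "before" one is an approximation, and WatchData/CancelFunc refer to the existing topo package types:

```go
// Before (approximate): the watch handed back its own cancel function and
// surfaced setup failures through the returned WatchData.
type connBefore interface {
	Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, cancel CancelFunc)
}

// After: the caller's context controls the watch lifetime, and setup errors
// (including the initial read, which can now honor the configured topo
// timeout) are returned directly.
type connAfter interface {
	Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, err error)
}
```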

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
This adds a new function to the topo interface which allows for
recursive watching on the topo.

Recursive watching can be used to significantly improve performance in a
few cases where we want to monitor changes across a prefix.

There's no immediate usage added yet, that will be done in follow up
changes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Once a context is expired, we shouldn't return a valid cell anymore,
even if it's the global cell.

Local cells would fail in this case as well, so this ensures we match
that behavior.
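The change boils down to an up-front context check; a minimal generic sketch (all names here are placeholders, not the real topo code):

```go
type cellConn struct{ serverAddr string } // placeholder for a real topo connection

// An expired context never yields a connection, even for the global cell,
// matching what local cells already do.
func connForCell(ctx context.Context, cell string, conns map[string]*cellConn) (*cellConn, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if conn, ok := conns[cell]; ok {
		return conn, nil
	}
	return nil, fmt.Errorf("cell %q not found", cell)
}
```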

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
This allows for more efficiently waiting for a new leader to be elected.
It also allows for a continued stream of changes during the election
lifetime.

This is very useful for example when something else needs to be
triggered on an election change and we don't need to use a busy wait
loop for it in that case.
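For illustration, the kind of consumer this enables; the WaitForNewLeader signature and the LeaderParticipation receiver are assumptions based on this description, not a confirmed API:

```go
// Assumed shape: a channel that receives the current leader whenever a new
// one is elected, for as long as the participation is active.
func reactToLeaderChanges(ctx context.Context, lp topo.LeaderParticipation) error {
	leaders, err := lp.WaitForNewLeader(ctx)
	if err != nil {
		return err
	}
	for leader := range leaders {
		// Runs on every election change; no busy-wait loop required.
		log.Printf("new leader elected: %s", leader)
	}
	return ctx.Err() // the channel is closed when the watch ends
}
```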

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@dbussink dbussink requested a review from vmg August 1, 2022 14:18
@vitess-bot (bot) commented Aug 1, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
  • If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@vmg (Collaborator) left a comment

To add to @dbussink's explanation, these are changes that we've already tested internally with more stringent tests than the ones we're performing in Vitess right now, including running topology servers in end2end and integration tests and doing a Goroutine check at the end of the test to ensure that the whole server has exited cleanly.

So far so good 👌
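One common way to do that kind of goroutine check in Go (not necessarily the exact mechanism used in those internal tests) is go.uber.org/goleak:

```go
package topotests

import (
	"testing"

	"go.uber.org/goleak"
)

// Fails the package's tests if any goroutine is still running once they
// finish, which catches watchers and servers that did not shut down cleanly.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}
```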

@dbussink dbussink added the Type: Enhancement (Logical improvement, somewhere between a bug and feature) and Component: TabletManager labels Aug 1, 2022
Also when we get an error directly from the watcher on the initial
entry, we want to update the tracking value with that error. This is
needed to ensure we handle errors like a file not existing and appearing
later properly.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
 results = append(results, topo.KVInfo{
     Key:     []byte(node.Data.Key),
-    Value:   []byte(node.Data.Value),
+    Value:   out,
     Version: KubernetesVersion(node.GetResourceVersion()),
@dbussink (Contributor, Author) commented:

@mattlord FYI since it looks like your name is in the git blame: there were 3 separate bugs here in the K8s topo List implementation. It wasn't checking against the correct path prefix (the root was ignored), it was checking against the .Value instead of the .Key, and the .Value was not decompressed even though compression is meant to be transparent.

This PR added some basic tests for List that uncovered all those issues. It does make me wonder whether anyone is really using the K8s topo if there are bugs like this in it.
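For context, the corrected filtering amounts to roughly the following; the loop variables and the unpackValue helper are simplified placeholders, not the actual k8stopo code:

```go
// Simplified placeholder sketch of the corrected filtering in List.
fullPath := path.Join(root, filePathPrefix) // include the topo root in the prefix
for _, node := range nodes {
	if !strings.HasPrefix(node.Data.Key, fullPath) { // compare the key, not the value
		continue
	}
	out, err := unpackValue([]byte(node.Data.Value)) // decompress: compression is transparent to callers
	if err != nil {
		return nil, err
	}
	results = append(results, topo.KVInfo{
		Key:     []byte(node.Data.Key),
		Value:   out,
		Version: KubernetesVersion(node.GetResourceVersion()),
	})
}
```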

Contributor commented:

Good catch, thank you! I'm also not sure if anyone is using it.

Member commented:

not sure either. It was added by @carsonoid who may be able to tell us. or @derekperkins

@dbussink dbussink requested a review from mattlord August 2, 2022 07:42
@GuptaManan100 (Member) left a comment

This is the only comment from my side.

Comment on lines +156 to +157
Watch(ctx context.Context, filePath string) (current *WatchData, changes <-chan *WatchData, err error)

Member commented:

This seems like a breaking change requiring release-notes changes. My understanding is that users can have their own implementation of the topo server, and therefore any change to the interface is breaking.

Member commented:

I don't think we need to worry about this. Unlike vindexes, we don't expect custom topo server implementations.
The interface is exported because it has to be, but that doesn't mean that every exported interface is part of the "public" interface for the vitess codebase.

@vmg vmg added the release notes (needs details) label (This PR needs to be listed in the release notes in a dedicated section: deprecation notice, etc.) Aug 2, 2022
@deepthi (Member) left a comment

Very nice PR with lots of good fixes.

  • Changes to Watch should be fine.
  • The quibble with WatchRecursive is that it is currently only implemented for etcd. It's good that you documented how it can be implemented for the other flavors. Since this is not yet being used in Vitess, we'll simply need to get the other flavors implemented whenever we want to start using WatchRecursive (for example in vtgate's healthcheck).
  • The one remaining question is regarding List. Currently it is a strict prefix check, so given /toplevel/nested/myfile, List("/top") matches and returns it, rather than assuming that top is a "directory" (see the sketch below). Looking at the comment on the interface, that seems intentional, so we should leave it as-is unless @mattlord thinks we should change it.
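Illustrative only: the strict prefix check is a plain byte comparison, so it also matches partial path components:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.HasPrefix("/toplevel/nested/myfile", "/top"))      // true, so List("/top") returns it
	fmt.Println(strings.HasPrefix("/toplevel/nested/myfile", "/toplevel")) // true
	fmt.Println(strings.HasPrefix("/other/file", "/top"))                  // false
}
```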

@deepthi deepthi removed the release notes (needs details) label Aug 2, 2022
@deepthi (Member) commented Aug 2, 2022

I believe the backup test failure has been fixed via #10895. Merging...

@deepthi deepthi merged commit 853d88d into vitessio:main Aug 2, 2022
@deepthi deepthi deleted the dbussink/improve-topo branch August 2, 2022 23:01
systay pushed a commit to planetscale/vitess that referenced this pull request Aug 19, 2022
* Add List() implementation for memory topo

This adds a List() implementation for the memory topo and adds a bunch
of tests for this function as well. These tests are only run when the
implementation indicates proper support for List.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Refactor Watch() to use context properly

A function that can spawn a goroutine should use the passed in context
for that and not return a cancel function itself. That is considered an
anti-pattern, so let's change this all to more idiomatic Go patterns.

This also ensures we can now set a proper timeout on the initial data
retrieval for a watch using the configured topo timeout.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add recursive watcher for topo

This adds a new function to the topo interface which allows for
recursive watching on the topo.

Recursive watching can be used to significantly improve performance in a
few cases where we want to monitor changes across a prefix.

There's no immediate usage added yet, that will be done in follow up
changes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Don't allow expired contexts to retrieve topo connection

Once a context is expired, we shouldn't return a valid cell anymore,
even if it's the global cell.

Local cells would fail in this case as well, so this ensures we match
that behavior.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add WaitForNewLeader to election API

This allows for more efficiently waiting for a new leader to be elected.
It also allows for a continued stream of changes during the election
lifetime.

This is very useful for example when something else needs to be
triggered on an election change and we don't need to use a busy wait
loop for it in that case.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Update entries also when receiving direct errors

Also when we get an error directly from the watcher on the initial
entry, we want to update the tracking value with that error. This is
needed to ensure we handle errors like a file not existing and appearing
later properly.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix broken k8stopo List implementation

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Add back still used code

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix import

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
dbussink added a commit to dbussink/vitess that referenced this pull request Sep 2, 2022
Right now we pass in the context when starting a Watch that is also
used as the request context. This means that the Watch ends up being
cancelled when the original request that started it as a side effect
completes and cancels the context to clean up.

This is of course not as intended. Before the refactor in
vitessio#10906 this wasn't causing a
practical issue yet. We'd still have the expired context internally in
the watcher and it would be passed through with updating entries, but
there were no calls that ended up validating the context expiry,
avoiding any immediate issue.

This is bound to fail at some point though, if something is added
that does care about the context. What is needed is for the watcher
we start to set up its own context based on the background context,
since it is detached from the original request that might trigger
starting the watcher as a side effect.

Additionally, it means that the tracked context for an error isn't
really useful. It would often be an already cancelled context from a
mostly unrelated request, which doesn't provide useful information.
Worse, it would keep a reference to that context, so the context might
never be garbage collected and more request data would be kept alive
than necessary.

With the fix, the context is always derived from the background context
with a cancel on top for that watcher, so tracking it isn't very useful
either. We also don't use this context tracking for any error messaging
or reporting anywhere, so I believe it's better to clean up this tracking.

By cleaning up that tracking, we also avoid the need to pass down the
context in entry updates and that is all cleaned up here as well.

Lastly, a failing test is introduced that verifies the original issue.
It retrieves serving keyspace information, cancels the original request
that triggered that and then validates the watcher is still running by
updating the value again within the timeout window. This failed before
this fix as the watcher would be cancelled and the cached old value was
returned before the TTL expired.

The main problem with this bug is not one of correctness, but of a
serious performance degradation in the vtgate. If we ever had a failure
on the path triggered by regular queries, we'd restart the context setup
every second; the system would not recover from this situation and would
heavily query the topo server, making things very expensive.
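The essence of the fix, as a minimal sketch (startWatcher and process are placeholders; only the context handling is the point):

```go
// Placeholder sketch: detach the watcher's lifetime from the request that
// happened to trigger it.
func startWatcher(requestCtx context.Context, conn topo.Conn, filePath string) error {
	// Do NOT reuse requestCtx: it is cancelled once the triggering request
	// completes. The watcher gets its own context rooted in the background.
	watchCtx, watchCancel := context.WithCancel(context.Background())

	current, changes, err := conn.Watch(watchCtx, filePath)
	if err != nil {
		watchCancel()
		return err
	}
	go func() {
		defer watchCancel()
		process(current)
		for wd := range changes {
			process(wd)
		}
	}()
	return nil
}
```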

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
mattlord pushed a commit that referenced this pull request Sep 6, 2022
* Fix problematic watch cancellation due to context cancellation

Right now we pass in the context when starting a Watch that is also
used as the request context. This means that the Watch ends up being
cancelled when the original request that started it as a side effect
completes and cancels the context to clean up.

This is of course not as intended. Before the refactor in
#10906 this wasn't causing a
practical issue yet. We'd still have the expired context internally in
the watcher and it would be passed through with updating entries, but
there were no calls that ended up validating the context expiry,
avoiding any immediate issue.

This is bound to fail at some point though, if something is added
that does care about the context. What is needed is for the watcher
we start to set up its own context based on the background context,
since it is detached from the original request that might trigger
starting the watcher as a side effect.

Additionally, it means that the tracked context for an error isn't
really useful. It would often be an already cancelled context from a
mostly unrelated request, which doesn't provide useful information.
Worse, it would keep a reference to that context, so the context might
never be garbage collected and more request data would be kept alive
than necessary.

With the fix, the context is always derived from the background context
with a cancel on top for that watcher, so tracking it isn't very useful
either. We also don't use this context tracking for any error messaging
or reporting anywhere, so I believe it's better to clean up this tracking.

By cleaning up that tracking, we also avoid the need to pass down the
context in entry updates and that is all cleaned up here as well.

Lastly, a failing test is introduced that verifies the original issue.
It retrieves serving keyspace information, cancels the original request
that triggered that and then validates the watcher is still running by
updating the value again within the timeout window. This failed before
this fix as the watcher would be cancelled and the cached old value was
returned before the TTL expired.

The main problem with this bug is not one of correctness, but of a
serious performance degradation in the vtgate. If we ever had a failure
on the path triggered by regular queries, we'd restart the context setup
every second; the system would not recover from this situation and would
heavily query the topo server, making things very expensive.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Improve handling of retries and timer wait

The timer here can stay around if other events fire first, so we want to
use an explicit timer, which we stop immediately once we know it completes.

Additionally, because of binding issues, watchCancel() would not rebind
if we start a new inner watcher. Therefore this adds back an outer
context that we can cancel in a defer, so we know for sure we cancel
things properly when stopping the watcher.
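The underlying Go pattern in isolation (illustrative, not the actual watcher code): an explicit timer can be stopped as soon as another event wins the select, whereas time.After would leave its timer running until it fires:

```go
func waitForEventOrRetry(ctx context.Context, events <-chan string, retryDelay time.Duration) (string, error) {
	timer := time.NewTimer(retryDelay)
	defer timer.Stop() // stop the timer as soon as the wait is over

	select {
	case <-ctx.Done():
		return "", ctx.Err()
	case ev := <-events:
		return ev, nil
	case <-timer.C:
		return "", nil // empty event means "retry now"
	}
}
```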

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Fix leak in etcd2topo tests

We never closed the `cli` instance here so it would linger until the
process completes.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* Remove unused context

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
notfelineit pushed a commit to planetscale/vitess that referenced this pull request Sep 21, 2022
* Revert "Add explicit close state to memory topo connection (vitessio#11110) (vitessio#1016)"

This reverts commit eb1e9c2.

* Revert "Fix races in memory topo and watcher (vitessio#11065) (vitessio#995)"

This reverts commit 6bc0171.

* Revert "Avoid race condition in watch shutdown (vitessio#10954) (vitessio#936)"

This reverts commit 23d4e34.

* Revert "Remove potential double close of channel (vitessio#10929) (vitessio#921)"

This reverts commit 0121e5d.

* Revert "Cherry pick topo improvements (vitessio#10906) (vitessio#916)"

This reverts commit 8c9f56d.