-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moves sockets into the advanced worker #7760
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shell/plugins/steve/subscribe.js
- some docs/comments at the top is needed to explain that subscribe handles resource socket things directly with sockets, or via worker (basic and advanced)
'ws.resource.change'(ctx, msg)
contains an if block that has been removed in master - Allow for partial count updates via websocket #7647 (which removed special handling of COUNTS given changes in the backend). This PR needs aligning with that one'ws.resource.change'(ctx, msg)
&'ws.resource.remove'(ctx, msg)
have additional code to handle resource aliases which is skipped if the advanced worker is used- It should be a little clearer which fns apply to which mode (socket, basic worker, advanced worker). I think this pretty much means the following t
- Affect following types - rehydrateSubscribe, reconnectWatches, opened, closed, error, send, sendImmediate,
ws.resource.stop
,ws.resource.start
,ws.resource.error
- Could do this by doc, or throw exception if mode is advanced worker?
- Affect following types - rehydrateSubscribe, reconnectWatches, opened, closed, error, send, sendImmediate,
- To confirm, we don't need to do anything special for resyncWatch?
- How is
ws.ping
handled in the advanced worker? (update - doesn't look like this is passed through from advanced worker)
- I'm still not sure I understand why
resourceWatcher
needs to update state on an interval instead of directly when needed. it introduces a lot of complexity (and delay) and the benefit isn't clear. - I've only given this a brief test but hit two issue.
- issue 1
- In browser 1 bring up the Deployments list
- In browser 2 go to same list, edit a deployment and scale it up
- In browser 1 the deployment stats should change, however remain with the previous value
- issue 2
- switching between clusters should close and open the socket
- (probably due to unsubscribe not being wired in)
- issue 1
My TODO's
- All advanced worker testing
- Test normal socket behaviour plus basic worker
- Fixes for switching cluster - includes using common getPerformanceSetting - avoid new code to unsub before socket disconnect - handle `watch` `stop` requests - lots of TODO's (questions, work, checks, test, etc) - use common
- isAdvancedWorker should only be true for cluster store - advancedWorker to be wired in
- sockets use an incremented local var for id - when we nuke the socket file within the worker this resets, so they all ahve id of 1 - work around this by applying the unix time
- seen in dex cluster explorer dashboard - count cards would be removed when partial counts response received
- getters canWatch, watchStarted now are worked around (they look at state in the UI thread) - we now don't call resource.stop or restart.start in subscription - tidied up `forgetType` - moved clearFromQueue from steve mutations into subscription mutations (better location) - added and removed some TODOs - fixed watch (stop handler should be higher up, include force watch handling)
- This change mutates input in a function, which is bad... - but ensures the reference isn't broken, which is needed to maintain similar functionality as before
- Seen when creating or viewing clusters
- the probably would have been a problem if the worker wasn't nuked - however as the codes there lets make it safe Also added `trace` feature in advanced worker, will probably bring out to other places as well
- Ensure that we handle the case where the advanced worker was created but the resource watcher wasn't - ... but fix case where this was happening (aka ensure that a blank cluster context is ignored)
- This will help test normal flow (when advanced worker is disabled) - Note - setting is now in a bag. This may help us better support further settings (enable client side pagination, etc) ``` advancedWorker: { enabled: false }, ```
…oard and events not re-subbed - Ensure we block default handling of resource.start (keep state in resource watcher)
…ew file - this avoids bringing class files into the worker
- Remove `syncWatch` (do the watch/unwatch straight away) - Test/Fix re-sub on reconnect - Test/Fix growls on disconnect
- including clean of workerQueue on resource.stop (this is SUPER defensive)
- ensure podsByNamespace is updated on batchChange TODO - the final update to the pod is ignored - removing a namespace cleans the cache correctly - disabling advanced worker still works
- ensure podsByNamespace is updated on batchChange Tested / Fixed - the final update to the pod is ignored - removing a namespace cleans the cache correctly - disabling advanced worker still works
Summary of Dev TestingTo DoDisconnect/Reconnect
CompleteStress Testing
Flood Testing
Resources update
Batch updates work as expected
Settings/Old Way
Specific types of events
Disconnect/Reconnect
Regression
Unit tests
|
- batchChanges fixes - fix index is 0 issues (!/!!index) - only `set` if we have to - ensure we set the correct index after pushing to list - ensure map is updated after reducing list size with limit - podsByNamespace fixes - ensure when ew replace... we don't use the same referenced object - general service resource fixes - ensure service's pods list stays up to date with store
- resourceCache - store the hash instead of the whole object. This means longer load time be reduces memory footprint - resourceWatcher - don't re-sub on socket reconnect if watcher is in error - don't sub if watcher is in error - don't unwatch for 'failed to find schema' and 'too old' errors - this clears the error, we won't to keep it to ensure we don't watch - Remove #5997 comments, follow on work #7917
Much more scope for some crazy content
- disable logging by default - initWorker comes in too late to affect initial trace, so just rely on the `debug` to toggle at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the below is done this is good to squash merge
- Remaining tests from Moves sockets into the advanced worker #7760 (comment)
- @Sean-McQ Can you give the code a brief self review before merging, just to catch any bad logging, settings, etc?
Work for next release in #7917
This reverts commit ca1b810.
Hello @richard-cox, I've been checking the code in the advanced worker and noticed that there seems to be a missing piece for the COUNT entity (
I suppose this works fine for standard resources, but not for counts. We need to merge previously saved counts; otherwise, only the last counts will be included in the batch changes to send. (Perhaps something similar to this could work: #7647) |
Summary
Fixes #7894
Occurred changes and/or fixed issues
Technical notes summary
Areas or cases that should be tested
Areas which could experience regressions
Screenshot/Video