-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
⚠️ RSDK-7903 Include unhealthy remotes in resource list with status #4273
base: main
Are you sure you want to change the base?
⚠️ RSDK-7903 Include unhealthy remotes in resource list with status #4273
Conversation
aeeba0d
to
36dbd45
Compare
73cae61
to
4153c0c
Compare
fb91e32
to
c30937d
Compare
c30937d
to
e2243e1
Compare
bbc5e4c
to
e2243e1
Compare
e2243e1
to
fd0784b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative approach to the changes in this file is to define a new state (e.g. NodeStateDisconnected
) - see the "Open questions" section in the PR description for more details and see the revert of this commit for a general idea of how this would look.
robot/client/client.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the high-level change here is that a robot client now caches the results of MachineStatus
rather than just ResourceNames
. As a result, multiple tests that result on injecting a mock function for ResourceNames
now need to mock MachineStatus
.
This reverts commit 34bc7f4.
|
||
// NodeStateDisconnected denotes a resource is disconnected. | ||
NodeStateDisconnected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to defining a new state would be to mark disconnected resource nodes as "unhealthy" with an appropriate error message. See "Open questions" in the description of this PR for more details and revert this commit to see how this alternative approach would look.
I hate to be that person, but is there any chance to split this PR up into 2 (or 3?) commits?
Or maybe 2 and 3 are the same and its only two commits. I feel confident reviewing "commit 1", but not useful for the MachineStatus changes. And I know commit 1 really just changes 1/2 files + test files. I'm also happy to step back if Cheuk/Benji are confident being the stakeholder for (1) |
This is a reasonable request, I was considering making the client cache change as it's own no-op PR anyway. |
|
@@ -361,6 +364,13 @@ func (w *GraphNode) SetNeedsUpdate() { | |||
w.setNeedsReconfigure(w.Config(), false, w.UnresolvedDependencies()) | |||
} | |||
|
|||
// SetDisconnected is used to mark a remote node as disconnected. | |||
func (w *GraphNode) SetDisconnected() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, in the resource graph we have:
- A node for each remote robot. Where the underlying resource is the robot client
- A node for each resource on a robot robot. Where the underlying resource is the component client (e.g: camera client).
Does this SetDisconnected
state only apply to nodes representing the remote robot client in the resource graph? Or does it also apply to the remote resource objects in the resource graph?
Second:
And when a remote goes from working -> disconnected -> working, what state transition will that take? Is it ready -> disconnected -> ready? Or do we jump through any of the unconfigured/configuring states?
I think the answers would work well in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this SetDisconnected state only apply to nodes representing the remote robot client in the resource graph? Or does it also apply to the remote resource objects in the resource graph?
it applies to the remote resource objects. also note that resources representing full remote machine clients are excluded from the MachineStatus endpoint.
And when a remote goes from working -> disconnected -> working, what state transition will that take? Is it ready -> disconnected -> ready? Or do we jump through any of the unconfigured/configuring states?
thanks for drawing attention to this. the flow I want is for disconnected nodes to go directly back into the ready state (unless they are unhealthy for some other reason). however, I don't think the current logic actually transitions disconnected nodes back to ready, so let me address that.
I think the answers would work well in the documentation.
will do
@@ -620,22 +621,29 @@ func (rc *RobotClient) resources(ctx context.Context) ([]resource.Name, []resour | |||
defer cancel() | |||
} | |||
|
|||
resp, err := rc.client.ResourceNames(ctx, &pb.ResourceNamesRequest{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, are you changing robots to use machine status instead of resource names to populate remote resources?
Summary
Keep remote resources in the
ResourceNames
andMachineStatus
APIs when their remote gets disconnected. In the latter API, mark such resources with a state indicate that they are disconnected.Prerequisites
To make reviews a bit easier, this PR is going to broken out into smaller chunks:
Details
Robot clients now fetch and cache resource statuses via the
GetMachineStatus
API instead of resource names viaResourceNames
API.Open questions
NodeStateDisconnected
), or is it sufficient to reuse the unhealthy state with a corresponding error (e.g.NodeStateUnhealthy[reason:disconnected]
)?MachineStatus
on the local or remote machine, respectively. Having a specialized state that can only appear for remote resources feels like the less confusing option in this case.Depends on