-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
⚠️ RSDK-7903 Cache machine status in robot client #4399
⚠️ RSDK-7903 Cache machine status in robot client #4399
Conversation
@@ -28,6 +28,7 @@ var exemptFromSession = map[string]bool{ | |||
"/proto.rpc.webrtc.v1.SignalingService/OptionalWebRTCConfig": true, | |||
"/proto.rpc.v1.AuthService/Authenticate": true, | |||
"/proto.rpc.v1.ExternalAuthService/AuthenticateTo": true, | |||
"/viam.robot.v1.RobotService/GetMachineStatus": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are now using MachineStatus
for inter-robot communication alongside ResourceNames
, we need to exempt it from session safety monitoring like we do for ResourceNames
ResourceNamesFunc: func() []resource.Name { return injectResources }, | ||
MachineStatusFunc: func(ctx context.Context) (robot.MachineStatus, error) { | ||
return rdktestutils.ResourcesToMachineStatus(injectResources), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the bulk of test updates in this PR involve injecting MachineStatus
function to test robot clients, since this API replaces ResourceNames
as the means of sourcing resources
if name.API == client.RemoteAPI { | ||
continue | ||
} | ||
if name.API.Type.Namespace == resource.APINamespaceRDKInternal { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where I make the slight change to the MachineStatus
API that I note in the PR description:
MachineStatus changes slightly - it will no longer include RDK internal resources and RemoteAPIs (e.g. the resource that represents the remote itself). This mirrors the behavior of ResourceNames.
{ | ||
Name: resource.Name{ | ||
API: resource.APINamespaceRDKInternal.WithServiceType("framesystem"), | ||
Name: "builtin", | ||
}, | ||
State: resource.NodeStateReady, | ||
}, | ||
{ | ||
Name: resource.Name{ | ||
API: resource.APINamespaceRDKInternal.WithServiceType("cloud_connection"), | ||
Name: "builtin", | ||
}, | ||
State: resource.NodeStateReady, | ||
}, | ||
{ | ||
Name: resource.Name{ | ||
API: resource.APINamespaceRDKInternal.WithServiceType("packagemanager"), | ||
Name: "builtin", | ||
}, | ||
State: resource.NodeStateReady, | ||
}, | ||
{ | ||
Name: resource.Name{ | ||
API: resource.APINamespaceRDKInternal.WithServiceType("web"), | ||
Name: "builtin", | ||
}, | ||
State: resource.NodeStateReady, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing myself since I probably won't have time to review, but generally seems good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much LGTM! Just some questions.
@@ -71,7 +71,7 @@ type RobotClient struct { | |||
dialOptions []rpc.DialOption | |||
|
|||
mu sync.RWMutex | |||
resourceNames []resource.Name | |||
cachedMachineStatus robot.MachineStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be *robot.MachineStatus
to indicate a possible state where we haven't cached anything? Would mean you didn't have to return robot.MachineStatus{}
in some of your error cases and could just return nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be *robot.MachineStatus to indicate a possible state where we haven't cached anything?
agreed - it would be helpful to at least make the internal cachedMachineStatus
field a pointer to better express the case where nothing was cached yet.
however...
Would mean you didn't have to return robot.MachineStatus{} in some of your error cases and could just return nil.
the MachineStatus interface method specifies a struct return value instead of a struct pointer, so that method would still have to return MachineStatus{}
values. we can revisit this interface typing decision if folks are strongly against it. I wanted to avoid having to do worry about excessive nil
checking on MachineStatus
, but depending on how large that structure grows it might be useful to make it a pointer for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it a pointer for performance reasons
discussed offline with @dgottlieb - the resource.Status
objects, whose fields are most likely to expand in the short-term, are in slice and therefore stored on the heap and don't create extra work when copying.
there's no immediate plans to add any fields to the top-level struct - perhaps we can revisit this interface type if/when that changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair; thanks for the response here! Leaving as-is seems fine to me given your points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, as-is for both the internal cachedMachineStatus
field as well as the public method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel too strongly. I guess I noticed everything was robot.MachineStatus
and not *robot.MachineStatus
, and felt it to be odd. Given your reasoning for keeping the public method as robot.MachineStatus
, I think either value or pointer is fine for the cached value.
injectRobot.MachineStatusFunc = func(ctx context.Context) (robot.MachineStatus, error) { | ||
mu.Lock() | ||
defer mu.Unlock() | ||
callCountNames++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if you could use a different variable here. Is it right to assume that the ResourceNamesFunc
above is not going to get called at all anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh - sorry missed this due to large-file folding. yes good call, will make separate counters for each and test accordingly just count machine status and remove the resource names stub altogether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ended up dropping a lot of useless code - thanks for catching!
@@ -620,22 +620,29 @@ func (rc *RobotClient) resources(ctx context.Context) ([]resource.Name, []resour | |||
defer cancel() | |||
} | |||
|
|||
resp, err := rc.client.ResourceNames(ctx, &pb.ResourceNamesRequest{}) | |||
resp, err := rc.machineStatus(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heads up @benjirewis @cheukt
spoke IRL with @dgottlieb - we can't move forward with this change, since it's backwards breaking. specifically, this will break comms between two machines if one is running a version has this this endpoint but the other is running a version that does not.
making this transition will require a different strategy to account for this risk, if we still decide to make it.
revert this PR to a draft until this issue is resolved: https://github.com/viamrobotics/rdk/pull/4399/files#r1783017466 |
superseded by #4421 |
Update robot client to source and cache resources from the
MachineStatus
API instead of theResourceNames
API.This change is mostly a no-op for public APIs:
ResourceNames
does not change - it will return the exact same resources.MachineStatus
changes slightly - it will no longer include RDK internal resources and RemoteAPIs (e.g. the resource that represents the remote itself). This mirrors the behavior ofResourceNames
.This change is a prerequisite for keeping remote resources in
ResourceNames
when they get disconnected - after making that change, we will wantMachineStatus
to show that those disconnected resources are in a non-Ready
state, which is why the robot client needs to cache more information than justResourceNames
.See this thread for details on our strategy break this change up.