Service interoperability with FastRTPS #184
Comments
The issue also exists in Foxy (osrf/ros2:nightly docker image).
@Stapelzeiger this is basically expected: see #8. There was no clear specification (or I overlooked it) of how requests should be represented on the wire, so I did "something", which turned out to be prepending a header to the request/response content. Cyclone likely simply uses a shorter header. So the first step is to find out what it is supposed to be. One possibility seems to be that it is
I see, I wasn't aware that this wasn't standardized. It would be great to have different RMWs either interoperate correctly or not at all (drop the message and maybe log an error).
@Stapelzeiger, please don't take my word for what is standardized in ROS 2: that's something you can do for DDS, but not for ROS 2. I also agree with your sentiment, and in fact I'd go further and say that it ought to be interoperable. Perhaps @wjwwood or @ivanpauno could give a quick response on whether that is in the plans or not.

For completeness: I quickly tried what I suggested a couple of days ago, but it doesn't solve the incompatibility. The reason is that Fast-RTPS is not prepending a request identifier to the request/response payload. Rather, it relies on information provided by the underlying transport (GUID and sequence number) to identify the request, and it echoes this in the response using a vendor-specific extension to the protocol. Making this interoperate is a tricky affair: the sequence number used by the reliable protocol is (correctly) not exposed to applications via the DDS API. Fast-RTPS happens to make it available in some way, and the Fast-RTPS RMW layer then uses it, but this isn't portable to other DDS implementations.

When a Fast-RTPS client sends a request to a Cyclone DDS service, the request will typically be dropped, because the lack of a request identifier means it can't be deserialized. With a Cyclone DDS client and a Fast-RTPS service, the prepended request identifier is interpreted as part of the payload, and depending on the request type this may or may not lead to deserialisation failures (a string would probably cause a problem, an integer would not). In the case of the minimal client/service example, the Fast-RTPS service interprets the request header from Cyclone as the numbers to add. The response is then typically discarded for the same reason a Cyclone-based service drops the requests from a Fast-RTPS client.

Wireshark captured interactions: Cyclone DDS request (after my modification to serialize rmw_request_id, which happens to make it match the GUID and sequence number of the writer; header indicated in the payload with
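The failure mode described above can be illustrated with a small sketch. The exact header layout here is an assumption for illustration only (a 16-byte writer GUID followed by an 8-byte sequence number, which is roughly what the prepended request identifier amounts to); the point is what happens when one side prepends a header and the other side deserializes from offset zero:

```python
import struct

# Hypothetical header layout, for illustration only: a 16-byte writer
# GUID followed by an 8-byte sequence number, prepended to the
# serialized request body (here: two little-endian int64 fields, as in
# the minimal add-two-ints client/service example).
guid = bytes(range(16))      # placeholder GUID
seq = struct.pack(">q", 1)   # sequence number 1
header = guid + seq

body = struct.pack("<qq", 41, 1)  # request: a=41, b=1
wire = header + body

# A service that expects the header skips it and sees the real request:
a, b = struct.unpack_from("<qq", wire, offset=len(header))
print(a, b)  # -> 41 1

# A service that does NOT expect the header deserializes from offset 0
# and reads header bytes as the first int64 field: a large number that
# varies with the GUID/sequence number, not the value the client sent.
a_bad, b_bad = struct.unpack_from("<qq", wire, offset=0)
print(a_bad != 41)  # -> True
```

This matches the symptom in the bug report: an integer request field "deserializes" into a huge, run-varying value, while a string field in the same position would likely fail deserialization outright.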
And a Fast-RTPS request
We do run interoperability tests between vendors in the
The |
Hi. What is the status of this problem? We have had some major issues related to service communication in Galactic that turn out to be related to this interoperability issue. If the default RMW implementation for ROS will change between releases, I think they must be interoperable to make migration easier.
In short, there hasn't been any movement here. It would be great to have the various DDS implementations interoperable, but this would take a significant investment of time and effort to make it happen. It would start with a design document describing how services have to be implemented in ROS 2 to be RMW compatible, and then follow on with changing the implementation of all three of the in-core DDS vendors (Fast DDS, Cyclone DDS, and RTI Connext) to conform to that design.
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-cross-distribution-communication/27335/2
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-cross-distribution-communication/27335/5
This is one of those issues where I've talked to quite a few users who basically reasoned like this (whether it's all true is another discussion):
The conclusion is certainly not true, partly due to the problem described in this issue.
@clalancette: would you agree this is not something that can be expected to be led and coordinated by just any community member? Having run into this myself I have an interest in seeing this resolved, but as an outsider it isn't immediately apparent to me how to get started here.
Sure, but that isn't an interesting question. It's true of many things. The question is: who is going to step up to lead and coordinate it? We can provide some pointers, but this is not currently on the roadmap for things that the ROS 2 core development team is going to look at for Iron.
I believe it is, as this implies it's likely not going to get picked up by random contributors. That significantly reduces the chances of it getting picked up at all, which seems to be sort-of confirmed by the fact this issue is already 2 years old.
My main concern is that this is such a fundamental change I don't expect this to land easily without buy-in from that core development team. The content-based subscription implementation also took quite some time and that was instigated and implemented by a TSC member.
There might be other scenarios being considered here, but I am specifically concerned about the one @kristoferB mentioned about the transition from Galactic to another ROS distribution (Rolling or Humble), and that in that case interoperability should be improved to ease the transition.

That would certainly be nice, but I want to make it clear: there is no guarantee of wire compatibility between two distributions of ROS 2. There can and will be breaks in that. There are no guarantees that two distributions of ROS will communicate at all. Changing the default RMW (and the two RMWs not being compatible on the wire) is just one of many ways things might break between ROS distributions. If you ever use two different ROS distributions together and they work, then that's just a happy coincidence in my opinion. @clalancette or others may disagree (please correct me if I'm wrong), but from my perspective the expectation is wrong from the get-go.

Now there may be valid scenarios where you're using the same ROS distribution but for one reason or another you cannot control which RMW is being used in different parts of the system, and if they're both DDS based you could reasonably assume they should talk to each other. It's something that's worth working on, but until now it has been pretty rare in my, admittedly anecdotal, experience. Also, breaks between ROS distributions should be for reasons that are intentional, so if all things would be equal otherwise and we can maintain backwards compatibility between ROS distributions we should certainly do that, but it doesn't override other needs or feature development in my opinion.
For me at least this is not about compatibility between ROS versions. This is about not being able to use different DDS-based RMWs for different (sub)sets of nodes all "in" the same ROS 2 version. I understand all the caveats about cross-ROS2-distribution communication, but for the underlying problem (i.e. serialisation of payloads has not been (sufficiently) standardised) they don't necessarily seem to be a concern. And besides this use-case, basing ROS 2 (largely) on DDS but then not benefiting from the extensive cross-vendor compatibility support it provides (I'm aware of the caveats here) seems like a lost opportunity.
I also agree with @gavanderhoorn that this is a big problem for deploying ROS in a larger setup. I also think that it is OK that we do not have compatibility between ROS versions, but then the limitations should be well known, and it should be possible to bridge the traffic if necessary. We have a large system running on many computers and cannot change everything at once. But for us it has worked to use Galactic and Humble together as long as we are using the same RMW. So the main challenge for us has been that the different RMWs do not have a standard way for services to communicate. This is a problem since the default RMW changes between each release. If that will be the case in the future, I think it is necessary that the default functionalities (like services and actions) in the RMWs are compatible.
I'd like to point out that having interoperability between RMWs is somewhat incidental right now, and only happens because all the TIER 1 RMWs are DDS based. Setting the requirement for them to be interoperable at all times means that all of them must have wire (and maybe SHM) compatibility, which is a very hard requirement for RMW vendors in general.
I'd say this issue, as it's been posted on a DDS-based RMW's tracker, is specifically about compatibility between DDS-based RMWs. I don't believe there would be interest in standardising payloads across all possible RMWs. That would also not seem possible, as it would essentially reduce the role of any particular RMW (or its underlying communication infrastructure) to providing a conduit for an opaque blob of bytes, which RMW vendors likely would not be interested in either. Edit: also anecdotal, but I've yet to encounter a deployment of ROS 2 in a production setting using a non-DDS-based RMW. They do exist (eCAL comes to mind), but distributed applications tend to use DDS.
I think this is a great discussion; I am really interested in this to expand my imagination. As a user, I would think:
Thinking about the use cases for far edge devices, this discussion extends to interoperability with
Another point I'll make is that it is not even clear that it is possible to make cross-vendor services compatible. The issue is that the core DDS standard does not specify a service-like functionality[1], and leaves it up to the vendors. For Fast-DDS and CycloneDDS, we actually implement this in a similar way via the rmw layer, but Connext, for instance, does it completely differently (I don't know what GurumDDS does). So even if we were able to make Fast-DDS and CycloneDDS talk via services, we could not guarantee compatibility with other vendors. [1] There is an extension to the DDS standard that specifies services, but as of today none of our core DDS implementations (Fast-DDS, CycloneDDS, Connext) implement it.
Is that true? I was pretty sure both Connext and Fast-DDS implemented the DDS-RPC spec. Is that what you're talking about? That spec allows for two different ways to implement it, and those two modes are not compatible. Also, unless something has changed, services between Fast-DDS and Connext used to work. I'm not sure how services are implemented with Cyclone DDS, however.
A quick Google search seems to indicate at least eProsima and RTI support the RPC spec. Edit: this response by @eboasson on ROS Discourse seems to imply he isn't a big fan of it, so I'm guessing Cyclone doesn't / won't support it:
Edit 2: I don't believe there is a need necessarily to implement DDS-RPC. Standardising (de)serialisation of the service payloads should already help address the main issue. There appear to be two options:
which one would be more efficient use of everyone's time I wouldn't know. Edit 3: @fujitatomoya wrote:
that's indeed what users I've talked to expect. It isn't true right now though of course.
If service payload (de)serialisation gets standardised, I would expect
I guess I could be wrong. I will say that we've had all of the inter-vendor service compatibility tests disabled for years: https://github.com/ros2/system_tests/blob/e482f5d4f44a64c4d3470903a60bb1ff229eec8a/test_communication/CMakeLists.txt#L276-L284
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/supported-dds-implementations/29180/2
Bug report
Required Info:
Steps to reproduce issue
machine A:
machine B:
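The detailed reproduction commands were lost in the report above, but they presumably look something like the following sketch. It assumes the standard `add_two_ints` demo from `demo_nodes_cpp` (which matches the expected output below) and selection of the RMW via the `RMW_IMPLEMENTATION` environment variable; the exact RMW assignment per machine is an assumption:

```shell
# machine A: run the service with the Fast-RTPS RMW (assumed)
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
ros2 run demo_nodes_cpp add_two_ints_server

# machine B: run the client with the Cyclone DDS RMW (assumed)
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 run demo_nodes_cpp add_two_ints_client
```

With both machines using the same RMW, the client receives the expected sum; with mismatched RMWs the behaviour below is observed.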
Expected behavior
the example works as usual:
service printing: request: 41 + 1
client printing: Result of add_two_ints: for 41 + 1 = 42
Actual behavior
service prints: request: 3249716421018666984 + 1 (numbers change between runs)
client doesn't return
Additional information