
Service interoperability with FastRTPS #184

Open · Stapelzeiger opened this issue May 12, 2020 · 24 comments

@Stapelzeiger

Bug report

Required Info:

  • Operating System:
    • macOS 10.14
    • Ubuntu 18.04 (32bit ARM)
  • Installation type:
    • binary
  • Version or commit hash:
    • Eloquent patch 1
  • DDS implementation:
    • rmw_cyclonedds_cpp
    • rmw_fastrtps_cpp
  • Client library (if applicable):
    • rclcpp & rclpy

Steps to reproduce issue

machine A:

export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 run examples_rclcpp_minimal_client client_main

machine B:

export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
ros2 run examples_rclcpp_minimal_service service_main

Expected behavior

The example works as usual:
service printing: request: 41 + 1
client printing: Result of add_two_ints: for 41 + 1 = 42

Actual behavior

service prints: request: 3249716421018666984 + 1 (numbers change between runs)
client doesn't return

Additional information

  • cyclonedds <-> cyclonedds service calls work
  • fastrtps <-> fastrtps service calls work
  • I tested on macOS locally, Ubuntu (32bit ARM) locally, and macOS to Ubuntu over wifi, results are consistent
  • publish/subscribe works without issues
  • rclpy versions of the examples have the same behavior
@Stapelzeiger changed the title from "Service interoperability with fastrtps" to "Service interoperability with FastRTPS" on May 12, 2020
@Stapelzeiger (Author)

The issue exists also in foxy (osrf/ros2:nightly docker image).

@eboasson (Collaborator)

@Stapelzeiger this is basically expected: see #8. There was (or I overlooked it) no clear specification of how requests should be represented on the wire, so I did "something", which turns out to be prepending a header to the request/response content. Cyclone likely simply uses a shorter header.

So the first step is to find out what it is supposed to be. One possibility seems to be that it is rmw_request_id_t. If so, I think making it interoperate doesn't require much more work than replacing cdds_request_header_t with rmw_request_id_t everywhere and making some local adjustments. I suppose one might then also want to put a "real" GUID in to be able to correlate it with the information stored in the graph cache, and that's available via get_entity_gid. So it should be pretty easy.
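To make the comparison concrete, here is a minimal sketch of the two layouts being discussed. The cdds_request_header_t fields are an assumption based on the rmw_cyclonedds sources of that era, and rmw_request_id_sketch_t is only an illustrative stand-in for the rmw_request_id_t defined in rmw/types.h, not its authoritative definition:

// Sketch only: field layouts are illustrative assumptions, not authoritative definitions.
#include <cstdint>
#include <cstdio>

// What rmw_cyclonedds prepended to request/response payloads at the time:
// a compact 16-byte header (hashed writer GUID + per-client sequence number).
struct cdds_request_header_t
{
  uint64_t guid;  // hashed GUID of the requesting client's writer
  int64_t seq;    // request sequence number assigned by the client
};

// Illustrative stand-in for rmw_request_id_t: a full 16-byte GUID plus the
// sequence number, i.e. a 24-byte header that can be correlated with the graph cache.
struct rmw_request_id_sketch_t
{
  uint8_t writer_guid[16];
  int64_t sequence_number;
};

int main()
{
  // The two layouts differ in size, so the payload boundary shifts by 8 bytes.
  std::printf("cdds header: %zu bytes, rmw request id: %zu bytes\n",
              sizeof(cdds_request_header_t), sizeof(rmw_request_id_sketch_t));
  return 0;
}

Even between two header-prepending implementations, a 16-byte versus a 24-byte prefix shifts where the payload starts, so the request fields end up being read from the wrong offsets.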

@Stapelzeiger (Author)

I see; I wasn't aware that this wasn't standardized.

It would be great to have different RMWs either interoperate correctly or not at all (drop the message and maybe log an error).
The current behavior of silently returning corrupt data to the application is quite dangerous in my opinion. Is there something that can be done to detect data from a different middleware and drop it?

@eboasson (Collaborator)

@Stapelzeiger, please don't take my word for what is standardized in ROS2 — that's something you can do for DDS, but not for ROS2. I also do agree with your sentiment, and in fact I'd go further and say that it ought to be interoperable. Perhaps @wjwwood or @ivanpauno could give a quick response on whether that is in the plans or not.

For completeness: I quickly tried what I suggested a couple of days ago, but it doesn't solve the incompatibility. The reason is that Fast-RTPS does not prepend a request identifier to the request/response payload. Rather, it relies on information provided by the underlying transport (GUID and sequence number) to identify the request, and it echoes this in the response using a vendor-specific extension to the protocol.

Making this interoperate is a tricky affair: the sequence number used by the reliable protocol is (correctly) not exposed to applications via the DDS API. Fast-RTPS happens to make it available in some way and the Fast-RTPS RMW layer then uses it — but this isn't portable to other DDS implementations.

When a Fast-RTPS client sends a request to a Cyclone DDS service, typically the request will be dropped because the lack of a request identifier means it can't be deserialized. With a Cyclone DDS client and a Fast-RTPS service, the prepended request identifier is interpreted as part of the payload, and depending on the request type this may or may not lead to deserialisation failures (a string would probably cause a problem, an integer would not). In the case of the minimal client/service example, the Fast-RTPS service interprets the request header from Cyclone as the numbers to add. The response is then typically discarded for the same reason a Cyclone-based service drops the requests from a Fast-RTPS client.

Wireshark captured interactions:

Cyclone DDS request (after my modification to serialize rmw_request_id_t, which happens to make it match the GUID and sequence number of the writer; the header is indicated in the payload with [...:...])

Frame 93: 164 bytes on wire (1312 bits), 164 bytes captured (1312 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 192.168.2.3
User Datagram Protocol, Src Port: 55915, Dst Port: 7413
Real-Time Publish-Subscribe Wire Protocol
    Magic: RTPS
    Protocol version: 2.1
    vendorId: 01.16 (Unknown)
    guidPrefix: d3531001e9f879f8255da183
    Default port mapping: domainId=0, participantIdx=1, nature=UNICAST_USERTRAFFIC
    submessageId: INFO_TS (0x09)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 8
        Timestamp: May 19, 2020 08:01:19.919455000 UTC
    submessageId: DATA (0x15)
        Flags: 0x05, Data present, Endianness bit
        octetsToNextHeader: 64
        0000 0000 0000 0000 = Extra flags: 0x0000
        Octets to inline QoS: 16
        readerEntityId: ENTITYID_UNKNOWN (0x00000000)
        writerEntityId: 0x00001503 (Application-defined writer (no key): 0x000015)
        writerSeqNumber: 1
        serializedData
            encapsulation kind: CDR_LE (0x0001)
            encapsulation options: 0x0000
            serializedData: d3531001e9f879f8255da183000015030100000000000000…
    submessageId: HEARTBEAT (0x07)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 28
        readerEntityId: ENTITYID_UNKNOWN (0x00000000)
        writerEntityId: 0x00001503 (Application-defined writer (no key): 0x000015)
        firstAvailableSeqNumber: 1
        lastSeqNumber: 1
        count: 1

0000   02 00 00 00 45 00 00 a0 ae 52 00 00 40 11 00 00   ....E....R..@...
0010   7f 00 00 01 c0 a8 02 03 da 6b 1c f5 00 8c 42 4a   .........k....BJ
0020   52 54 50 53 02 01 01 10 d3 53 10 01 e9 f8 79 f8   RTPS.....S....y.
0030   25 5d a1 83 09 01 08 00 4f 92 c3 5e 24 67 61 eb   %]......O..^$ga.
0040   15 05 40 00 00 00 10 00 00 00 00 00 00 00 15 03   ..@.............
0050   00 00 00 00 01 00 00 00 00 01 00 00[d3 53 10 01   .............S..
0060   e9 f8 79 f8 25 5d a1 83 00 00 15 03:01 00 00 00   ..y.%]..........
0070   00 00 00 00]29 00 00 00 00 00 00 00 01 00 00 00   ....)...........
0080   00 00 00 00 07 01 1c 00 00 00 00 00 00 00 15 03   ................
0090   00 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00   ................
00a0   01 00 00 00                                       ....

Fast-RTPS response

Frame 94: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
User Datagram Protocol, Src Port: 56320, Dst Port: 56430
Real-Time Publish-Subscribe Wire Protocol
    Magic: RTPS
    Protocol version: 2.3
    vendorId: 01.15 (eProsima - Fast-RTPS)
    guidPrefix: 010ff71ba6b7000001000000
    Default port mapping: domainId=196, participantIdx=10, nature=UNICAST_METATRAFFIC
    submessageId: INFO_DST (0x0e)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 12
        guidPrefix: d3531001e9f879f8255da183
    submessageId: INFO_TS (0x09)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 8
        Timestamp: May 19, 2020 08:01:19.920803070 UTC
    submessageId: DATA (0x15)
        Flags: 0x07, Data present, Inline QoS, Endianness bit
        octetsToNextHeader: 64
        0000 0000 0000 0000 = Extra flags: 0x0000
        Octets to inline QoS: 16
        readerEntityId: 0x00001604 (Application-defined reader (no key): 0x000016)
        writerEntityId: 0x00001303 (Application-defined writer (no key): 0x000013)
        writerSeqNumber: 1
        inlineQos:
            Unknown (0x800f) <-- vendor-specific extension
                parameterId: Unknown (0x800f)
                parameterLength: 24
                parameterData: d3531001e9f879f8255da183000015030000000001000000
            PID_SENTINEL
                parameterId: PID_SENTINEL (0x0001)
        serializedData
            encapsulation kind: CDR_LE (0x0001)
            encapsulation options: 0x0000
            serializedData: f8b0b184e9f88efb

0000   02 00 00 00 45 00 00 90 a4 fb 00 00 40 11 00 00   ....E.......@...
0010   7f 00 00 01 7f 00 00 01 dc 00 dc 6e 00 7c fe 8f   ...........n.|..
0020   52 54 50 53 02 03 01 0f 01 0f f7 1b a6 b7 00 00   RTPS............
0030   01 00 00 00 0e 01 0c 00 d3 53 10 01 e9 f8 79 f8   .........S....y.
0040   25 5d a1 83 09 01 08 00 4f 92 c3 5e 00 c0 b9 eb   %]......O..^....
0050   15 07 40 00 00 00 10 00 00 00 16 04 00 00 13 03   ..@.............
0060   00 00 00 00 01 00 00 00 0f 80 18 00 d3 53 10 01   .............S..
0070   e9 f8 79 f8 25 5d a1 83 00 00 15 03 00 00 00 00   ..y.%]..........
0080   01 00 00 00 01 00 00 00 00 01 00 00 f8 b0 b1 84   ................
0090   e9 f8 8e fb                                       ....

And a Fast-RTPS request

Frame 93: 124 bytes on wire (992 bits), 124 bytes captured (992 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
User Datagram Protocol, Src Port: 50375, Dst Port: 60556
Real-Time Publish-Subscribe Wire Protocol
    Magic: RTPS
    Protocol version: 2.3
    vendorId: 01.15 (eProsima - Fast-RTPS)
    guidPrefix: 010ff71bb1bb000001000000
    Default port mapping: domainId=212, participantIdx=73, nature=UNICAST_METATRAFFIC
    submessageId: INFO_DST (0x0e)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 12
        guidPrefix: 9cf81001066604e3f7dc2420
    submessageId: INFO_TS (0x09)
        Flags: 0x01, Endianness bit
        octetsToNextHeader: 8
        Timestamp: May 19, 2020 08:20:30.607256174 UTC
    submessageId: DATA (0x15)
        Flags: 0x05, Data present, Endianness bit
        octetsToNextHeader: 40
        0000 0000 0000 0000 = Extra flags: 0x0000
        Octets to inline QoS: 16
        readerEntityId: 0x00001604 (Application-defined reader (no key): 0x000016)
        writerEntityId: 0x00001303 (Application-defined writer (no key): 0x000013)
        writerSeqNumber: 1
        serializedData
            encapsulation kind: CDR_LE (0x0001)
            encapsulation options: 0x0000
            serializedData: 29000000000000000100000000000000

0000   02 00 00 00 45 00 00 78 97 bf 00 00 40 11 00 00   ....E..x....@...
0010   7f 00 00 01 7f 00 00 01 c4 c7 ec 8c 00 64 fe 77   .............d.w
0020   52 54 50 53 02 03 01 0f 01 0f f7 1b b1 bb 00 00   RTPS............
0030   01 00 00 00 0e 01 0c 00 9c f8 10 01 06 66 04 e3   .............f..
0040   f7 dc 24 20 09 01 08 00 ce 96 c3 5e 00 24 75 9b   ..$ .......^.$u.
0050   15 05 28 00 00 00 10 00 00 00 16 04 00 00 13 03   ..(.............
0060   00 00 00 00 01 00 00 00 00 01 00 00 29 00 00 00   ............)...
0070   00 00 00 00 01 00 00 00 00 00 00 00               ............
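To tie the captures back to the numbers in the bug report, here is a small sketch (not part of the original comment) that hand-decodes the Fast-RTPS request payload; it assumes a little-endian host and the AddTwoInts request layout of two int64 fields:

// Sketch: hand-decode the 16-byte Fast-RTPS request payload shown above.
#include <cstdint>
#include <cstdio>
#include <cstring>

static int64_t read_i64_le(const uint8_t * p)
{
  int64_t v;
  std::memcpy(&v, p, sizeof v);  // CDR_LE int64; no byte swap needed on a little-endian host
  return v;
}

int main()
{
  // serializedData: 29000000000000000100000000000000 -- no header, just the two request fields
  const uint8_t payload[16] = {
    0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,   // a
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};  // b
  std::printf("request: %lld + %lld\n",
              (long long) read_i64_le(payload),
              (long long) read_i64_le(payload + 8));  // prints "request: 41 + 1"
  return 0;
}

A Fast-RTPS service applies the same interpretation to a request coming from Cyclone, so with Cyclone's original 16-byte header the hashed GUID lands in a and the sequence number (1 for a first request) lands in b, which would explain the "request: <large, run-dependent number> + 1" output reported above. The vendor-specific inline QoS parameter 0x800f in the Fast-RTPS response also appears to carry exactly the correlation information that isn't visible through the standard DDS API: the request writer's GUID (guidPrefix d3531001e9f879f8255da183 plus entityId 0x00001503) followed by sequence number 1.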

@clalancette (Contributor)

@Stapelzeiger, please don't take my word for what is standardized in ROS2 — that's something you can do for DDS, but not for ROS2. I also do agree with your sentiment, and in fact I'd go further and say that it ought to be interoperable. Perhaps @wjwwood or @ivanpauno could give a quick response on whether that is in the plans or not.

We do run interoperability tests between vendors in the test_communication package. These tests are currently only for pub/sub, and have some other limitations, but it is a use-case we'd like to have working. That being said, we haven't heard a huge amount of user demand for it, so it hasn't been our highest priority either.

@dirk-thomas (Member)

We do run interoperability tests between vendors in the test_communication package. These tests are currently only for pub/sub, and have some other limitations, but it is a use-case we'd like to have working.

The test_communication package does include service tests: https://github.com/ros2/system_tests/blob/b9bebaf21fd17f45bb29f3af9b619481d2d886b9/test_communication/CMakeLists.txt#L265-L315 Various RMW combinations are explicitly skipped at the moment, though.

@kristoferB

Hi. What is the status of this problem? We have had some major issues related to service communication in Galactic that turn out to be related to this interoperability issue. If the default RMW implementation for ROS changes between releases, I think they must be interoperable to make migration easier.

@clalancette (Contributor)

Hi. What is the status of this problem?

In short, there hasn't been any movement here. It would be great to have the various DDS implementations interoperable, but this would take a significant investment of time and effort to make it happen. It would start with a design document describing how services have to be implemented in ROS 2 to be RMW-compatible, and then follow on by changing the implementations of all three of the in-core DDS vendors (Fast DDS, Cyclone DDS, and RTI Connext) to conform to that design.

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-cross-distribution-communication/27335/2

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-cross-distribution-communication/27335/5

@gavanderhoorn commented Oct 4, 2022

This is one of those issues many users encounter, but then quickly work around by "simply" using the same RMW everywhere -- everywhere they can. But there are plenty of cases where the RMW used is not completely under the control of that user, and then things break down.

I've talked to quite a few users who basically reasoned like this (whether it's all true is another discussion):

  1. ROS 2 uses DDS
  2. DDS is standardised
  3. DDS vendors invest in, check and guarantee cross-vendor compatibility

Ergo: ROS 2 RMWs should/are/must be compatible as well.

The conclusion is certainly not true, partly due to the problem described in this issue.

It would start with a design document describing how services have to be implemented in ROS 2 to be RMW compatible, and then follow-on with changing the implementation of all three of the in-core DDS vendors (Fast DDS, Cyclone DDS, and RTI Connext) to conform to that design.

@clalancette: would you agree this is not something that can be expected to be led and coordinated by just any community member?

Having run into this myself, I have an interest in seeing this resolved, but as an outsider it isn't immediately apparent to me how to get started here.

@clalancette (Contributor)

@clalancette: would you agree this is not something that can be expected to be led and coordinated by just any community member?

Sure, but that isn't an interesting question. It's true of many things. The question is: who is going to step up to lead and coordinate it? We can provide some pointers, but this is not currently on the roadmap for things that the ROS 2 core development team is going to look at for Iron.

@gavanderhoorn

Sure, but that isn't an interesting question

I believe it is, as this implies it's likely not going to get picked up by random contributors.

That significantly reduces the chances of it getting picked up at all, which seems to be sort of confirmed by the fact that this issue is already 2 years old.

this is not currently on the roadmap for things that the ROS 2 core development team [..]

My main concern is that this is such a fundamental change that I don't expect it to land easily without buy-in from that core development team. The content-based subscription implementation also took quite some time, and that was instigated and implemented by a TSC member.

@wjwwood (Member) commented Oct 4, 2022

There might be other scenarios being considered here, but I am specifically concerned with the one @kristoferB mentioned: transitioning from Galactic to another ROS distribution (Rolling or Humble), and the suggestion that in that case interoperability should be improved to ease the transition.

That would certainly be nice, but I want to make it clear that there is no guarantee of wire compatibility between two distributions of ROS 2. There can and will be breaks in that. There are no guarantees that two distributions of ROS will communicate at all. Changing the default RMW (and the two RMWs not being compatible on the wire) is just one of many ways things might break between ROS distributions. If you ever use two different ROS distributions together and they work, then that's just a happy coincidence in my opinion. @clalancette or others may disagree (please correct me if I'm wrong), but from my perspective the expectation is wrong from the get-go.

Now there may be valid scenarios where you're using the same ROS distribution but for one reason or another you cannot control which RMW is being used in different parts of the system, and if they're both DDS-based you could reasonably assume they should talk to each other. It's something that's worth working on, but until now it has been pretty rare in my, admittedly anecdotal, experience.

Also, breaks between ROS distributions should be for reasons that are intentional, so if all other things were equal and we could maintain backwards compatibility between ROS distributions, we should certainly do that, but it doesn't override other needs or feature development in my opinion.

@gavanderhoorn

For me at least this is not about compatibility between ROS versions. This is about not being able to use different DDS-based RMWs for different (sub)sets of nodes all "in" the same ROS 2 version.

I understand all the caveats about cross-ROS2-distribution communication, but for the underlying problem (ie: serialisation of payloads has not been (sufficiently) standardised) they don't necessarily seem to be a concern.

And besides this use-case, basing ROS 2 (largely) on DDS but then not benefiting from the extensive cross-vendor compatibility support it provides (I'm aware of the caveats here) seems like a lost opportunity.

@kristoferB

I also agree with @gavanderhoorn that this is a big problem for deploying ROS in a larger setup. I also think that it is OK that we do not have compatibility between ROS versions, but then the limitations should be well known, and it should be possible to bridge the traffic if necessary. We have a large system running on many computers and cannot change everything at once. But for us, using Galactic and Humble together has been working as long as we are using the same RMW. So the fact that the different RMWs do not have a standard way for services to communicate has been the main challenge for us. This is a problem since the default RMW changes between each release. If that will still be the case in the future, I think it is necessary that the default functionalities (like services and actions) in the RMWs are compatible.

@EduPonz commented Oct 5, 2022

I'd like to point out that the interoperability we have between RMWs right now is somewhat coincidental, and only happens because all the Tier 1 RMWs are DDS-based. Setting the requirement for them to be interoperable at all times means that all of them must have wire (and maybe shared-memory) compatibility, which is a very hard requirement for RMW vendors in general.

@gavanderhoorn commented Oct 5, 2022

I'd say this issue, as it's been posted on a DDS-based RMW's tracker, is specifically about compatibility between DDS-based RMWs.

I don't believe there would be interest in standardising payloads across all possible RMWs. That would also not seem possible, as it would essentially reduce the role of any particular RMW (or its underlying communication infrastructure) to providing a conduit for an opaque blob of bytes, which RMW vendors likely would not be interested in either.


Edit: and also anecdotal, but I've yet to encounter a deployment of ROS 2 in a production setting using a non-DDS-based RMW. They do exist (eCal comes to mind), but distributed applications tend to use DDS.

@fujitatomoya (Contributor)

I think this is a great discussion; I am really interested in it, and it broadens my thinking.

As a user, I would think:

  • Cross-distribution compatibility cannot be supported or guaranteed with the current policy, since APIs might break between ROS versions. I think we can avoid this by using a container / sandbox, even if the BSP system does not support the ROS version we are looking for.
  • Cross-vendor RMW interoperability:
    • I would expect DDS-based RMW implementations to be interoperable with each other. They use the same protocol underneath, so why should DDS compatibility be lost at the ROS 2 level? (something like an RMW interoperability group)
    • Should implementation-specific features such as zero copy be enabled between nodes using the same vendor implementation? (And should this be hidden from the user application to provide an out-of-the-box user experience?)
    • A particular RMW implementation that has no interoperability with anything else can also exist. That is fine too; we have developed a shared-memory-only RMW implementation internally for a specific use case.

Thinking about use cases for far-edge devices, does this discussion extend to interoperability with micro-ROS as well?

@clalancette (Contributor)

Another point I'll make is that it is not even clear that it is possible to make cross-vendor services compatible.

The issue is that the core DDS standard does not specify service-like functionality [1] and leaves it up to the vendors. For Fast-DDS and CycloneDDS we actually implement this in a similar way via the rmw layer, but Connext, for instance, does it completely differently (I don't know what GurumDDS does). So even if we were able to make Fast-DDS and CycloneDDS talk via services, we could not guarantee compatibility with other vendors.

[1] There is an extension to the DDS standard that specifies services, but as of today none of our core DDS implementations (Fast-DDS, CycloneDDS, Connext) implement it.

@wjwwood (Member) commented Oct 5, 2022

Is that true? I was pretty sure both Connext and Fast-DDS implemented the dds-rpc spec. Is that what you're talking about?

That spec allows for two different ways to implement it, and those two modes are not compatible. Also, unless something has changed, services between Fast-DDS and Connext used to work. I'm not sure how services are implemented with Cyclone DDS, however.

@gavanderhoorn commented Oct 5, 2022

A quick Google search seems to indicate that at least eProsima and RTI support the DDS-RPC spec.


Edit: this response by @eboasson on ROS Discourse seems to imply he isn't a big fan of it, so I'm guessing Cyclone doesn't / won't support it:

Any plans to support the DDS-RPC spec?

Hmm … not really. It is an abomination that takes a pub/sub-based shared data space and then abuses it to do point-to-point request-reply, and in typical DDS specification-style it also implies a lot of work. There are more valuable things to be done, I’d say. In my opinion, it makes far more sense to use DDS to publish server locators, then use a standard RPC mechanism to invoke those services.

But if someone were to contribute an implementation (it’s mostly preprocessing anyway) of course I would be grateful!


Edit 2: I don't believe there is necessarily a need to implement DDS-RPC. Standardising (de)serialisation of the service payloads should already help address the main issue.

There appear to be two options:

  1. add DDS-RPC to Cyclone, and update the RMWs to all use it for ROS 2 services
  2. standardise (de)serialisation of service payloads, and update all RMWs to use that for ROS 2 services

Which one would be a more efficient use of everyone's time, I wouldn't know.
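Purely as an illustration of what option 2 might amount to (a hypothetical layout invented for this sketch, not a proposal from this thread and not an existing ROS 2 wire format), every RMW would serialize an agreed, fixed-size prefix ahead of the CDR-encoded request/response body, for example:

// Hypothetical illustration only -- not an actual or proposed ROS 2 service wire format.
#include <cstdint>

// A fixed 24-byte prefix each RMW would serialize (little-endian CDR) ahead of the
// request/response body, so any implementation can match a response to its request.
struct service_header_sketch_t
{
  uint8_t client_guid[16];   // full GUID identifying the requesting client
  int64_t sequence_number;   // per-client request counter, echoed in the response
};

static_assert(sizeof(service_header_sketch_t) == 24, "fixed 24-byte prefix");

int main()
{
  // Nothing to run; the static_assert above documents the intended size.
  return 0;
}

Anything along these lines would still need the design document and cross-vendor buy-in discussed earlier in the thread.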


Edit 3:

@fujitatomoya wrote:

I would expect DDS-based RMW implementations to be interoperable with each other. They use the same protocol underneath, so why should DDS compatibility be lost at the ROS 2 level? (something like an RMW interoperability group)

That's indeed what users I've talked to expect. It isn't true right now, though, of course.

Thinking about use cases for far-edge devices, does this discussion extend to interoperability with micro-ROS as well?

If service payload (de)serialisation gets standardised, I would expect rmw_microxrcedds to be updated to also be compatible with it. That would make it immediately compatible with all other implementations.

@clalancette (Contributor)

Is that true? I was pretty sure both Connext and Fast-DDS implemented the dds-rpc spec. Is that what you're talking about?

That spec allows for two different ways to implement it, and those two modes are not compatible. Also, unless something has changed, services between Fast-DDS and Connext used to work.

I guess I could be wrong. I will say that we've had all of the inter-vendor service compatibility tests disabled for years: https://github.com/ros2/system_tests/blob/e482f5d4f44a64c4d3470903a60bb1ff229eec8a/test_communication/CMakeLists.txt#L276-L284

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/supported-dds-implementations/29180/2
