Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross-Distro/Vendor communication in ROS 2 #3288

Closed
fujitatomoya opened this issue Jan 25, 2023 · 12 comments
Closed

Cross-Distro/Vendor communication in ROS 2 #3288

fujitatomoya opened this issue Jan 25, 2023 · 12 comments

Comments

@fujitatomoya
Copy link
Collaborator

There has been some questions and discussion related to cross-vendor and cross-distro communication support in ROS 2.

And the answer is NO for both cases. currently ROS 2 does not support any cross-distro or cross-vendor communication officially. (which is my understanding, if i am mistaken, let me know) some of these cases are implemented in test code, but that does not mean that we support these compatibility.

No matter this is being current limitation or specification, it would be probably nice to describe clearly that is not supported in ROS 2 documentation officially? So that we can avoid potential unexpected problems and questions in the future.

Adding documentation would not be good enough to guarantee to avoid these cases, maybe we can add warning if the communication takes place with different distro or vendor implementation at runtime.

This topic was from today's MW WG meeting. (CC: @wjwwood @alsora @asorbini @ros2/middleware_working_group )

Related but out of scope from this issue:

@fujitatomoya
Copy link
Collaborator Author

@clalancette
Copy link
Contributor

And the answer is NO for both cases. currently ROS 2 does not support any cross-distro or cross-vendor communication officially. (which is my understanding, if i am mistaken, let me know) some of these cases are implemented in test code, but that does not mean that we support these compatibility.

So I guess it depends on what we mean by "support". We know (and test) that we can publish topics between all 3 of our in-core DDS vendors (Fast-DDS, Connext, and CycloneDDS). I also happened to test today that we can call services between Fast-DDS and Connext (but not with CycloneDDS) (this also implies that actions should work between Fast-DDS and Connext, though I did not test that).

So with the two exceptions of services Fast-DDS <-> CycloneDDS and services Connext <-> CycloneDDS, all of the rest of it should work. The question is whether we want to officially call that supported or not.

@alsora
Copy link
Contributor

alsora commented Jan 26, 2023

FYI, as of ROS 2 Galactic sending a service request using CycloneDDS to a Fast-DDS application results in a crash during deserialization of the packet.
I think that @MiguelCompany is working on making it reject the data, rather than crash.

IMO unless we guarantee that everything works between all Tier 1 middlewares, we should discourage users to not even try to mix middlewares.
The same applies for communication between distributions.
For example, by default we could print very loud warnings if someone is mixing stuff.

An environment variable could be set by advanced users who are explicitly looking for cross-communication.

Trying to use different versions of ROS or different middlewares (by mistake!) are some of the most common causes of the issues reported by Create 3 (and Turtlebot 4) users.

@eboasson
Copy link

So for the service interoperability, there is another issue open. I've understood from @gavanderhoorn (who BTW has rightfully been bugging me about it) that it really is necessary because of some devices only supporting middleware 1 and the other only middleware 2, and yet they have to be combined into a single system. The only way forward is therefore to agree on a portable way of doing service invocations.

I promised him that I'd try to come up with a proposal that might be agreeable to both @MiguelCompany and myself — the idea being that open source tier 1 parties ought to band together, and not let the commercial people try to bully us into something we don't want.

To me, the only sensible path forward is to do something that builds on top of the (spec'd) DDS API and respects the layering at the DDS level. Cyclone's RMW implementation currently meets this requirement, the only trickery it requires is encapsulating the ROS service payload in a slightly larger message. I believe the separate service type support exists precisely to make this possible — and doing it in a cleaner way than what the Cyclone RMW layer does.

The downside on the higher levels of the stack of doing the layering properly is (or might be) that the transformation between application data and DDS data becomes more involved and that zero-copy becomes impossible. I don't think that additional overhead would be a big deal for service invocations.

On the lower level of the stack, it adds some wire overhead in that it means you can't re-use the sequence number used by the reliable protocol for the request, and that you may decide to also add a unique client identifier to the header. For the reply, I don't think there is a material wire overhead.

About the identifier: you can basically chose to reuse the identifier of, e.g., the client's writer, or you can create a new unique id for that. Reusing means you already know the GUID from the low-level protocol machinery and it is accessible from the API, so it meets my layering constraint while avoiding a few bytes of overhead. Nonetheless, adding a new unique id is, in my view, a solution that more cleanly separates the concerns and avoids some order dependencies on creating the reader/writer pair for a client.

Since I have failed to do as I promised @gavanderhoorn ... perhaps we can just use this latest trigger to finally do something about it. Of course I'd be happiest if @MiguelCompany would be happy to rework the rmw_fastrtps_cpp implementation to match Cyclone's, but that's probably a stretch and the way forward is to share the pain a bit.

I don't think this ticket is necessarily the place for that discussion. I did feel it unacceptable to not at least do something to make good on that promise of mine ...

@fujitatomoya
Copy link
Collaborator Author

it really is necessary because of some devices only supporting middleware 1 and the other only middleware 2, and yet they have to be combined into a single system.

@eboasson totally agree.

For example, by default we could print very loud warnings if someone is mixing stuff.

@alsora yeah, that was the idea. and i think documentation and warning would do some work, we user still can try with warning message if we want to.

So I guess it depends on what we mean by "support".

@clalancette i take support is if issue happens that is the bug we must fix? i think that it still makes sense to add doc like cross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.

I do not have detailed implementation idea how to add the warning though...

@clalancette
Copy link
Contributor

@clalancette i take support is if issue happens that is the bug we must fix? i think that it still makes sense to add doc like cross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.

See, I think that is the wrong way to go. We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy. I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported. If we don't allow this, then we are making it difficult to produce a robot running ROS 2 that can be communicated from outside the robot.

With that in mind, my proposal for this would be the following:

  1. Re-enable tests in https://github.com/ros2/system_tests/tree/rolling/test_communication that test cross-vendor service calls between Fast-DDS and Connext.
  2. In the special case of CycloneDDS, add a warning in rmw_cyclonedds_cpp when connecting to it from a non-CycloneDDS peer. That warning would say something like "services don't currently interoperate, this may or may not work".
  3. Update our documentation to point out that the special case of CycloneDDS doesn't work.

@fujitatomoya
Copy link
Collaborator Author

We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy.

ah, that is true, there is already issue and discussion taking place.

I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported.

totally good to go for cross-vendor communication. and your proposal makes sense to me to support cross-vender communication.

what about cross-distribution communication? according to current API/ABI management, i do not think this can be supported?

@eboasson
Copy link

@clalancette i take support is if issue happens that is the bug we must fix? i think that it still makes sense to add doc like cross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.

See, I think that is the wrong way to go. We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy. I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported. If we don't allow this, then we are making it difficult to produce a robot running ROS 2 that can be communicated from outside the robot.

With that in mind, my proposal for this would be the following:

  1. Re-enable tests in https://github.com/ros2/system_tests/tree/rolling/test_communication that test cross-vendor service calls between Fast-DDS and Connext.
  2. In the special case of CycloneDDS, add a warning in rmw_cyclonedds_cpp when connecting to it from a non-CycloneDDS peer. That warning would say something like "services don't currently interoperate, this may or may not work".
  3. Update our documentation to point out that the special case of CycloneDDS doesn't work.

I do need to point out here that we (the Eclipse Cyclone DDS project) do not consider it a bug. There never was any expectation of service interoperability in the past, there never was a concrete proposal on how to implement it in an interoperable manner (other than some handwaving in some call that I recall), and we are the only ones who do it without requiring support for yet another bad specification (RPC over DDS). When all the pros and cons are considered of the various options, you'll find that that bad specification adds exactly nothing to the end result.

What we do consider a bug is that service interoperability was ignored in ROS 2 from day one. A proper path to addressing that is something we're happy to be involved in, but simply expecting us to abide by the technical choices of others where we strongly believe those choices to be bad is not the way.

If you wish to pin it on Cyclone DDS and state that it doesn’t work, i.e., “is broken”, that is your good right. It is not an attitude that would motivate us to “fix” things.

@clalancette
Copy link
Contributor

If you wish to pin it on Cyclone DDS and state that it doesn’t work, i.e., “is broken”, that is your good right. It is not an attitude that would motivate us to “fix” things.

Apologies, my intention was not to call out Cyclone DDS. From my perspective, I didn't really know that DDS didn't have a standard for a long time, and that there was contention about the standard that exists.

I'm honestly not sure of a way forward (which I guess is why we've been in this situation for a long time). It seems like Fast-DDS and Connext both implement the standard that exists (as flawed as it may be). Cyclone implements something potentially better, but which is not standardized. Some ways forward I see are:

  1. Have ROS 2 define the standard (as you suggest), and then implement it in the all of the RMWs.
  2. Have Cyclone implement the (contentious) standard as an option, either in the RMW layer or in Cyclone itself.

I take it that Erik is in favor of 1. @eboasson just for my information, is there any potential for 2? Also do you have any other ideas on a way forward here?

@fujitatomoya
Copy link
Collaborator Author

we are the only ones who do it without requiring support for yet another bad specification (RPC over DDS). When all the pros and cons are considered of the various options, you'll find that that bad specification adds exactly nothing to the end result.

@eboasson I am trying to catch up with this. this has been explained in ros2/rmw_cyclonedds#184? (which is almost 3 years old...) specifically on this comment ros2/rmw_cyclonedds#184 (comment)?

@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-02-16/29927/1

@fujitatomoya
Copy link
Collaborator Author

#4736 is merged, i will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants